Perform real backoff when contending for writes from the router #18686

Merged: 4 commits into openshift:master from the router_push branch on Mar 17, 2018

Conversation

smarterclayton (Contributor) commented Feb 20, 2018

The current route backoff mechanism works by tracking the last touch
time from other routers, but this is prone to failure and will not scale
to very large sets of routers competing to update status.

Instead, treat the ability to write status as a lease renewal, and have
failure to write status as a cue to backoff. Each new write further
increases the lease confidence up to an interval. Treat observed writes
from other processes as a signal that the lease holder is maintaining
their lease.

This should allow route status updates to be scale-free.

Still needs an e2e test.

@openshift/sig-networking @ramr

openshift-ci-robot added the sig/networking and approved labels on Feb 20, 2018
openshift-ci-robot added the size/XL (500-999 lines, ignoring generated files) label on Feb 20, 2018
smarterclayton force-pushed the router_push branch 2 times, most recently from 4fdc6cc to b1f1f97, on February 22, 2018 at 07:08
glog.V(4).Infof("[%s] Lease owner or electing, running %s", l.name, key)
}

isLeader, retry := fn()
Reviewer comment (Contributor):

Not sure I grok this part. So we call the work queue function (in this case to update status) - https://github.com/smarterclayton/origin/blob/b1f1f97f0eee4ae5325a05731288ece94e864f38/pkg/router/controller/status.go#L252 - and its response says it's the leader? Aren't we already the leader here if we got to this part?

smarterclayton (Contributor, Author) replied:

This is a lease that is driven by work renewal. If a client observes that no work has been done within the lease window, it can compete to acquire the lease by doing work.

If we succeed (return true for the work), then we have "acquired the lease" by virtue of doing the work. The route object status itself is acting as the lease.
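As a rough illustration of that idea (not this PR's code; the type and function names below are invented), a lease that is acquired simply by doing the work could be sketched in Go like this:

```go
package lease

import "time"

// workLease illustrates a lease that is renewed by successfully performing
// work (here, writing route status) rather than by a separate election.
// All names in this sketch are hypothetical, not taken from this PR.
type workLease struct {
	window  time.Duration // how long a successful write is treated as holding the lease
	expires time.Time     // when the current lease window runs out
}

var nowFn = time.Now // indirection so tests can control the clock

// tryWork calls fn only when no work has been observed inside the lease
// window; a successful piece of work "acquires" the lease for another window.
func (l *workLease) tryWork(fn func() (worked bool, retry bool)) {
	if nowFn().Before(l.expires) {
		// Someone (possibly us) did work recently; do not compete yet.
		return
	}
	if worked, _ := fn(); worked {
		// Writing the status succeeded, so we now hold the lease.
		l.expires = nowFn().Add(l.window)
	}
}
```

The key point is that there is no separate election call: a successful status write is itself the renewal.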

l.tick++
}
l.expires = nowFn().Add(l.nextBackoff())
}
Reviewer comment (Contributor):

I am going to have to look at this again ... where is the actual election done here? Are we setting this (aka doing the election) based on the status update for the route going through? That level of indirection might make it tough to understand this without some comments here.

smarterclayton (Contributor, Author) replied:

If you are able to write to the object (because you have a delta to the current state) you take ownership of the lease. If you get a contention, or if you observe another client write to the object, you go into random exponential backoff until one of your writes succeeds (resetting you to zero backoff) or you hit the max backoff (the lease interval).

If you're in follower mode, every time you observe another client doing real work you extend the lease.

For three router processes, the first route to be admitted would create a status attempt from each router (all in election). Whichever one won would consider itself the leader; the others would go into backoff. If the lease interval expires without the followers observing any work getting done, they'll start trying to do work. If an entire backoff interval goes by and there is no work, the lease is released and they'll start competing again. 1m is completely unscientific but is roughly equivalent to our current mechanism.
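A hedged sketch of the backoff side of this, again with invented names rather than the types added here: a successful write resets backoff to zero, while a conflict or an observed write from another client grows a randomized exponential backoff capped at the lease interval.

```go
package backoff

import (
	"math/rand"
	"time"
)

// state tracks whether we are leading (zero backoff) or following
// (randomized exponential backoff). The names are illustrative only.
type state struct {
	base    time.Duration // first backoff step after a conflict
	max     time.Duration // cap, roughly the lease interval
	current time.Duration // current backoff; 0 means our last write succeeded
}

// succeeded is called after our own status write went through: we hold the
// lease, so reset to zero backoff.
func (b *state) succeeded() { b.current = 0 }

// contended is called when our write conflicted or we observed another
// client writing status; it returns how long to wait before trying again.
func (b *state) contended() time.Duration {
	if b.current == 0 {
		b.current = b.base
	} else {
		b.current *= 2
	}
	if b.current > b.max {
		b.current = b.max
	}
	// Jitter so competing routers do not retry in lockstep.
	jitter := time.Duration(rand.Int63n(int64(b.current)/2 + 1))
	return b.current + jitter
}
```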

smarterclayton (Contributor, Author):

Note the goal is to prevent contended writes (instead of N replicas all attempting to write every status, we want to ensure that write load doesn't increase even as N increases).

smarterclayton (Contributor, Author):

/retest

smarterclayton (Contributor, Author):

/hold

As I was testing the various scenarios (conflicting config routers, multiple different router names, parallel) I realized that this causes a big slowdown when different routers are exposing the same route.

I'm going to turn this into a general cleanup of the existing code so that the logic is more obvious and the code is shared between reject and admit (today we don't update the route ingress field when a route is rejected, because we don't set canonical hostname or wildcard policy), and then come back to this at a later point. I'll also add better tests, because it's clear that we need a harness that makes this easy to understand.

openshift-ci-robot added the do-not-merge/hold label on Mar 4, 2018
openshift-ci-robot added the size/XXL (1000+ lines, ignoring generated files) label and removed the size/XL label on Mar 4, 2018
openshift-merge-robot added the vendor-update label on Mar 5, 2018
knobunc self-assigned this on Mar 5, 2018
knobunc self-requested a review on March 5, 2018 at 20:00
smarterclayton (Contributor, Author) commented Mar 6, 2018

Ok, this is all over but the screaming.

The previous update conflict algorithm was:

  1. If we know we stored the correct value and then receive a new, incorrect value, we know we're conflicting and we should stop attempting to write.
  2. In any other case, go ahead and attempt to write.

I've clarified that flow and added some bells and whistles:

  1. Instead of using an LRU with size 1024, we simply remember all routes and then have a periodic flush function
  2. In the periodic flush function, if we've detected contention, write a glog Warning to the logs so the customer knows what to do, or at least so we can debug it
  3. If we detect enough contention, just stop (until the next expiration window). I chose 10 contentions arbitrarily, and 1/10th the resync interval (3m is the default), but there shouldn't be any way for a human to trigger this accidentally (they'd have to manually update status), and so we detect contention much faster than before (before we would do one write for every route, now we do up to 10 writes). A rough sketch of this tracker appears after this list.
  4. Remove some of the unclear magic values we placed into the cache (timestamp zero vs. real timestamp)
  5. Stop depending on timestamps - those were a hack to work around the LRU. Now that we have a flush function, we don't really need to check timestamps at all. We could also go to an epoch model in the future if we wanted to.
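As referenced in item 3, here is a rough sketch of what such a contention tracker could look like; the names (tracker, observe, shouldWrite, flush) and the exact bookkeeping are illustrative assumptions, not the interface this PR introduces:

```go
package contention

import "github.com/golang/glog"

// tracker is an illustrative stand-in for the contention tracker described
// above: it remembers the status we believe we wrote last per route, counts
// writes observed from other processes, and is flushed on a timer instead of
// aging entries out of an LRU.
type tracker struct {
	lastWritten    map[string]string // route key -> status value we wrote last
	contentions    int               // conflicting writes seen since the last flush
	maxContentions int               // e.g. 10: stop attempting writes past this
}

func newTracker() *tracker {
	return &tracker{lastWritten: map[string]string{}, maxContentions: 10}
}

// recordWrite remembers the value we just wrote for a route.
func (t *tracker) recordWrite(key, value string) {
	t.lastWritten[key] = value
}

// observe notes the value currently on the route; if it differs from what we
// last wrote, another process is also writing status for this router name.
func (t *tracker) observe(key, value string) {
	if prev, ok := t.lastWritten[key]; ok && prev != value {
		t.contentions++
	}
}

// shouldWrite reports whether we are still allowed to attempt status writes
// in the current window.
func (t *tracker) shouldWrite() bool {
	return t.contentions <= t.maxContentions
}

// flush is called periodically (e.g. 1/10th of the resync interval): warn if
// contention was detected, then drop all state wholesale.
func (t *tracker) flush() {
	if t.contentions > 0 {
		glog.Warningf("detected %d conflicting route status writes; another process may be updating status for the same router name", t.contentions)
	}
	t.lastWritten = map[string]string{}
	t.contentions = 0
}
```

Because state is dropped wholesale on a timer, no per-entry timestamps or LRU aging are needed.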

Other cleanup

  1. I went through the existing status update code and made it consistent between admits and rejects - there is now exactly one code path
  2. During rejections we were not resetting wildcard policy or canonical hostname - that is now fixed
  3. I split the deeply confusing status conflict recording into its own data structure - there are much better comments in place and it should be easier to understand what is going on in the code
  4. Fixed up logging messages to be clearer about what is happening

The original impetus for this PR is still valid, but I'll come back to it in the future (deferring writes so that if we run a router at scale 10, we don't do 10 writes for every status) once this gets cleaned up. It will be much easier to do with this in place.

I still need to add the e2e test scenarios:

  1. create multiple routers attempting to update status on the same router name to different values, verify we perform no more than N routers * 10 writes
  2. create multiple routers attempting to update status on different router names, verify that they are all created and updated
  3. run a scaled-out router and verify that all routers accept writes, even with conflicts

Some initial review would be helpful. With this in place, future changes to this code are at least sane, and I'm more confident the code can be extended to support backoff.

Take the previous direct map access and place it behind a contention
tracker interface with much better comments. Add a better heuristic for
detecting mass conflicts (instead of processing all N routes before
giving up, stop much earlier). Remove the LRU behavior and use a simple
flushed cache. Unify the code for admission and rejection and fix a bug
where wildcard policy and canonical hostname weren't written in status.
openshift-merge-robot removed the vendor-update label on Mar 6, 2018
smarterclayton (Contributor, Author):

I had a breakthrough on the write leasing. I needed to extend the lease only when getting a "modified" route event where the current router's ingress status was the most recent (had the most recent Admitted condition last transition time). Also, I needed to enforce a minimum time for follower steps so that the last leader always gets an edge (this prevents conflicts after we have a long quiet period).
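For illustration, the "is our ingress the most recent admission" check could look roughly like the following. This sketch assumes the routev1 types from github.com/openshift/api/route/v1 (the PR itself works against origin's internal route API), and the helper names are hypothetical:

```go
package status

import (
	routev1 "github.com/openshift/api/route/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// admittedTime returns the LastTransitionTime of the Admitted condition on a
// single ingress entry, or nil if the condition is missing.
func admittedTime(ingress *routev1.RouteIngress) *metav1.Time {
	for i := range ingress.Conditions {
		if ingress.Conditions[i].Type == routev1.RouteAdmitted {
			return ingress.Conditions[i].LastTransitionTime
		}
	}
	return nil
}

// ownsMostRecentAdmission reports whether routerName's ingress entry carries
// the most recent Admitted transition time on the route; only then would an
// observed "modified" event be treated as extending our lease.
func ownsMostRecentAdmission(route *routev1.Route, routerName string) bool {
	var ours, newest *metav1.Time
	for i := range route.Status.Ingress {
		t := admittedTime(&route.Status.Ingress[i])
		if t == nil {
			continue
		}
		if newest == nil || t.After(newest.Time) {
			newest = t
		}
		if route.Status.Ingress[i].RouterName == routerName {
			ours = t
		}
	}
	return ours != nil && newest != nil && !ours.Before(newest)
}
```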

Barring tests, I think this addresses the two issues I originally wanted to solve:

  1. allow the router to be run with more than a few replicas (currently we have M*N writes, this gets us to O(N) writes)
  2. ensure that conflict detection is solid so that a rollout of a lot of replicas (>5) doesn't blow up the router

I need to get tests in but this is ready for eyeballs. Local testing confirmed it has the desired behavior.

smarterclayton (Contributor, Author):

/retest

Each router process now uses a rough "write leasing" scheme to avoid
sending conflicting writes. When a router starts, it tries to write
status for its ingress. If it succeeds in writing status, it considers
itself to hold the lease, and if it fails it considers itself a follower
and goes into exponential backoff. A single leader quickly emerges, and
all other routers observe the writes and consider the leader to be
extending her lease. In this fashion a large number of routers can use
the route status itself as a coordination mechanism and avoid generating
large numbers of meaningless writes.
smarterclayton (Contributor, Author):

/retest

smarterclayton (Contributor, Author):

/retest

smarterclayton removed the do-not-merge/hold label on Mar 10, 2018
smarterclayton (Contributor, Author):

/test install

smarterclayton (Contributor, Author):

I want to add one more stress test (testing random changes to routes over time), but I think this PR is ready for review as-is. The GCP test fails because the image doesn't have the latest code (we don't build images on GCP); I'll disable the test before merging and re-enable it after.

smarterclayton (Contributor, Author):

/test gcp

7 similar comments

knobunc (Contributor) left a review:

Clever code. But I think it matches the description, and the description seems to solve the problem.

/lgtm

openshift-ci-robot added the lgtm label on Mar 15, 2018
openshift-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: knobunc, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

smarterclayton (Contributor, Author):

/retest

smarterclayton (Contributor, Author):

/retest

ssh flake

smarterclayton (Contributor, Author):

/retest

openshift-merge-robot (Contributor):

/test all [submit-queue is verifying that this PR is safe to merge]

openshift-merge-robot (Contributor):

Automatic merge from submit-queue (batch tested with PRs 18686, 18998).

openshift-merge-robot merged commit c6d8a92 into openshift:master on Mar 17, 2018
openshift-ci-robot:

@smarterclayton: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/openshift-jenkins/gcp | 73905e4 | link | /test gcp |
| ci/openshift-jenkins/extended_conformance_install | 73905e4 | link | /test extended_conformance_install |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels: approved, lgtm, sig/networking, size/XXL