Ensure router reload on initial sync #12199

marun · 2016-12-09T03:23:59Z

Previously, the router wouldn't reload HAProxy after the initial sync if the last item of the initial list of any of the watched resources didn't reach the router to trigger the commit. This could be caused by a route being rejected for any reason (e.g. specifying a host claimed by another namespace). The router could be left in its initial state until another commit-triggering event occurred (e.g. a watch event).

This PR ensures that the router will always reload after the initial sync.

Reference: bz1383663

cc: @openshift/networking, @smarterclayton

marun · 2016-12-09T03:42:14Z

[test]

marun · 2016-12-09T04:26:15Z

This needs more work. There need to be checks for whether the sync state was processed both before and after the Handle* methods are called.

smarterclayton · 2016-12-09T04:42:54Z

Does this need to go into 1.4?

knobunc · 2016-12-09T13:57:22Z

@smarterclayton I would argue it should go in 1.4 (or perhaps 1.4.1).

knobunc · 2016-12-09T13:57:51Z

@ramr @rajatchopra PTAL

marun · 2016-12-09T22:11:19Z

This still isn't done, but I think separating state changes from commit initiation is easier to reason about.

marun · 2016-12-11T08:46:21Z

I think this is ready to review.

While testing I noticed that the initial commit was triggered via the rate limiter which could delay the commit from the initial sync, so there's a fix included for that.

I also stopped explicitly setting the router's sync status in favor of having it be set on the first call to Commit. This ensures that sync status will always be accurate when commit is called.

I also noticed that status changes on route admission trigger watch events, which can result in reloads. The reload rate limiter mitigates this problem to some extent, but after the reload triggered by an initial sync there will be at least one subsequent reload triggered by the watch events, even if route state in the API is otherwise unchanged. There may be more than one, depending on how many status changes are made and how long it takes for the watch events to propagate. I think this highlights the need to prevent HAProxy reloads unless state has actually changed.

marun · 2016-12-11T19:07:15Z

Some test changes required.

The router stress test was previously assuming that the router had seen a route if it had a corresponding service unit. Since endpoints can result in service units, the fact that all routes were using the same host was not failing the test. This change ensures that the test uses unique hosts and properly detects whether a router has seen a route.

Previously the router would trigger the initial commit via the rate limiter. This change ensures the initial commit is triggered directly so the commit of the first sync can occur immediately.
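To illustrate the rate limiter bypass described above, here is a minimal sketch and not the router's actual code. The `committer` type and its `direct`/`queued` counters are hypothetical stand-ins: the first call to `Commit` runs immediately, and every later call is handed to a simulated rate limiter.

```go
package main

import "fmt"

// committer is a hypothetical stand-in for the router's commit path.
type committer struct {
	synced bool // true once the first commit has run
	direct int  // commits executed immediately
	queued int  // commits handed to the simulated rate limiter
}

// Commit runs the first commit directly so the initial sync is not
// delayed, and routes all later commits through the rate limiter.
func (c *committer) Commit() {
	if !c.synced {
		c.synced = true
		c.direct++
		return
	}
	c.queued++
}

func main() {
	c := &committer{}
	for i := 0; i < 3; i++ {
		c.Commit()
	}
	fmt.Printf("direct=%d queued=%d\n", c.direct, c.queued) // direct=1 queued=2
}
```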

ramr

I think you pushed whilst I was reviewing and I did notice you fixed one of the issues I saw.
We need to set r.stateChanged = true in CreateServiceUnit, DeleteServiceUnit and DeleteEndpoints.

ramr · 2016-12-12T21:33:55Z

pkg/router/controller/controller.go

-	// event handlers have the same view of sync state.
-	c.routesListConsumed = c.RoutesListConsumed()
-	c.updateLastSyncProcessed()
-
 	glog.V(4).Infof("Processing Route: %s -> %s", route.Name, route.Spec.To.Name)
 	glog.V(4).Infof("           Alias: %s", route.Spec.Host)
 	glog.V(4).Infof("           Event: %s", eventType)


Maybe move the [defer] c.lock.{Lock,Unlock} to after the Info log messages.

~~I don't think that's in scope for this PR.~~

Assuming it was in scope, why would that be a good idea? It would likely result in logs from the different handlers being interleaved rather than separated.

Why hold a lock for longer than you need to? You are blocked on IO on the log messages. Seems wasteful to do so.

The majority of the router's logging occurs in the chain of plugins whose methods are also called inside the lock. I don't know why sacrificing log coherency - an important aid to debugging - would be worth the relatively insignificant penalty involved in having this method also log inside the lock.

But I neither wrote the logging nor modified it in the PR. If you want this changed I think it should be done separately.

I'll look at cleaning up.

ramr · 2016-12-12T21:44:51Z

pkg/router/controller/controller.go

+	if c.syncing {
+		return
+	}
+	if err := c.Plugin.Commit(); err != nil {
 		utilruntime.HandleError(err)
 	}


Ooh, this is hard to read with the double negatives + state check.
Would it be simpler to say:

	needsCommit := c.endpointsListConsumed && c.routesListConsumed && (c.Namespaces == nil || c.filteredByNamespace)
	if !needsCommit {
		return
	}

	// Ok, so we need a commit but check if there is a sync in progress ...
	if c.syncing {
		glog.V(4).Infof("Router sync in progress")
		return
	}

	c.syncing = true
	if err := c.Plugin.Commit(); err != nil {
		utilruntime.HandleError(err)
	}
	c.syncing = false
	glog.V(4).Infof("Router sync complete")

Also do we even need the c.syncing flag - since all commit is called/done under a lock anyway?
Seems like you can simplify it by removing the c.syncing checks (and setting/resetting it).

Sure, I can simplify the boolean statement. c.syncing enables the start/end logging of the sync. Is that not useful?

Actually, unless you want to get rid of the sync-related logging, I don't think there's much room for simplification. The code you've suggested is not logically equivalent to what I've proposed.

imho, if we are doing all this for just sync related logging it is just not worth the complexity. Change what messages we log rather than make the code more complex to suit those log messages. It makes it difficult to understand not to mention a wee bit too convoluted.

Just add the log messages around c.Plugin.Commit() - sync in progress and sync complete. This method is checking if we want to commit the route state, so either do it or don't ... there is no try!! ;^)

One of the goals of this PR was limiting knowledge about whether a sync was in progress to the controller rather than spreading it across the controller and the plugin, which I think is easier to reason about. This means the plugin no longer knows whether a sync is in progress and wouldn't be able to log about it.

That said, I agree that the logic around sync logging doesn't have to be in the commit method and I've factored it out into a separate method. Does that address your concerns?
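One way the separation discussed in this thread might look is sketched below. This is an assumption-laden illustration, not the code that was merged: the helper name `updateSyncState`, the `logs` slice, and the `commits` counter are all hypothetical, and the namespace filtering condition is left out for brevity. The point is only that the commit method stays a plain "commit or don't" check while the sync bookkeeping and logging live in their own method.

```go
package main

import "fmt"

// controller is a simplified, hypothetical model of the router controller.
type controller struct {
	routesListConsumed    bool
	endpointsListConsumed bool
	syncing               bool     // true until the initial lists are consumed
	commits               int      // stands in for calls to Plugin.Commit()
	logs                  []string // stands in for glog output
}

// updateSyncState (hypothetical name) owns the sync bookkeeping and the
// sync-related logging, keeping both out of commit.
func (c *controller) updateSyncState() {
	if !c.syncing {
		return
	}
	if c.routesListConsumed && c.endpointsListConsumed {
		c.syncing = false
		c.logs = append(c.logs, "router sync complete")
	}
}

// commit only decides whether to commit; it does no sync logging itself.
func (c *controller) commit() {
	c.updateSyncState()
	if c.syncing {
		return
	}
	c.commits++
}

func main() {
	c := &controller{syncing: true}
	c.routesListConsumed = true
	c.commit() // endpoints list not yet consumed: no commit
	c.endpointsListConsumed = true
	c.commit() // sync completes, first commit happens
	c.commit() // normal commit, no further sync logging
	fmt.Println(c.commits, c.logs)
}
```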

marun · 2016-12-12T23:20:43Z

@ramr I've made the requested changes to CreateServiceUnit and DeleteServiceUnit. There was no change in behavior without those changes, but it does pay to be consistent.

Previously, the router wouldn't reload HAProxy after the initial sync if the last item of the initial list of any of the watched resources didn't reach the router to trigger a commit. One possible trigger for this condition was a route specifying a host already claimed by another namespace. The router could be left in its initial state until another commit-triggering event occurred (e.g. a watch event).

This change refactors the commit handling so that the plugin no longer triggers a commit directly. Instead:

- the plugin tracks whether state has changed
- a commit is only attempted if state has changed
- assuming a sync is not in progress, the controller will trigger a commit attempt after calling one of the relevant plugin Handle methods
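The division of responsibility described in this commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the router's implementation: the `plugin` and `controller` types, the `routes` map, and the `reloads` counter are hypothetical, and only `stateChanged`, `Commit`, and the idea of Handle methods come from the discussion.

```go
package main

import "fmt"

// plugin is a hypothetical model of the router plugin: it records
// whether its state has changed but never triggers a commit itself.
type plugin struct {
	routes       map[string]string
	stateChanged bool
	reloads      int // stands in for HAProxy reloads
}

func (p *plugin) HandleRoute(name, host string) {
	p.routes[name] = host
	p.stateChanged = true
}

// Commit reloads only if state has changed since the last commit.
func (p *plugin) Commit() {
	if !p.stateChanged {
		return
	}
	p.stateChanged = false
	p.reloads++
}

// controller alone knows whether a sync is in progress.
type controller struct {
	plugin  *plugin
	syncing bool
}

// HandleRoute forwards to the plugin, then attempts a commit.
func (c *controller) HandleRoute(name, host string) {
	c.plugin.HandleRoute(name, host)
	c.commit()
}

func (c *controller) commit() {
	if c.syncing {
		return
	}
	c.plugin.Commit()
}

func main() {
	p := &plugin{routes: map[string]string{}}
	c := &controller{plugin: p, syncing: true}
	c.HandleRoute("a", "a.example.com") // syncing: no reload
	c.HandleRoute("b", "b.example.com") // syncing: no reload
	c.syncing = false
	c.commit() // one reload covers both routes
	c.commit() // nothing changed: no reload
	fmt.Println("reloads:", p.reloads) // reloads: 1
}
```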

Previously, if both resource queues were populated on initial sync, commit would be called before handleFirstSync could set the sync state and the initial reload might not bind ports if bindPortsOnSync was true. This change relies on the first call to Commit setting the sync state instead, relegating the handleFirstSync method to ensuring a commit on sync if either or both of the resource queues are empty after the initial sync.
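A rough sketch of why setting the sync state inside the first `Commit` closes this gap is shown below. The `router` type here is hypothetical, and the semantics of `bindPortsOnSync` (ports are bound only once the router considers itself synced) are an assumption drawn from the description above rather than from the actual code.

```go
package main

import "fmt"

// router is a hypothetical, simplified model of the template router.
type router struct {
	bindPortsOnSync bool // assumed meaning: defer port binding until synced
	synced          bool
	portsBound      bool
}

// Commit marks the router as synced before reloading, so the sync
// state is always accurate by the time the reload runs.
func (r *router) Commit() {
	r.synced = true
	r.reload()
}

// reload binds ports unless binding is deferred and the router is
// not yet synced.
func (r *router) reload() {
	if !r.bindPortsOnSync || r.synced {
		r.portsBound = true
	}
}

func main() {
	r := &router{bindPortsOnSync: true}
	r.reload() // a reload before any commit leaves ports unbound
	fmt.Println("bound before commit:", r.portsBound)
	r.Commit() // the first commit sets synced, so this reload binds ports
	fmt.Println("bound after commit:", r.portsBound)
}
```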

ramr

LGTM

This change ensures that stateChanged is only set to true for endpoint and route removal and namespace filtering if state was changed.
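As an illustration of the idea for the removal case, the sketch below flags a change only when a delete actually removes something. The `router` type and its `endpoints` map are hypothetical simplifications; `DeleteEndpoints` and `stateChanged` are names taken from the review comments, but the body here is not the merged code.

```go
package main

import "fmt"

// router is a hypothetical model holding endpoints keyed by service unit ID.
type router struct {
	endpoints    map[string][]string
	stateChanged bool
}

// DeleteEndpoints marks state as changed only if there was something
// to delete, so a no-op removal cannot cause a reload.
func (r *router) DeleteEndpoints(id string) {
	if _, ok := r.endpoints[id]; !ok {
		return
	}
	delete(r.endpoints, id)
	r.stateChanged = true
}

func main() {
	r := &router{endpoints: map[string][]string{"ns/svc": {"10.0.0.1:8080"}}}
	r.DeleteEndpoints("ns/missing")
	fmt.Println("after no-op delete:", r.stateChanged)
	r.DeleteEndpoints("ns/svc")
	fmt.Println("after real delete:", r.stateChanged)
}
```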

marun · 2016-12-13T06:10:29Z

@ramr Thank you for the reviews!

openshift-bot · 2016-12-13T08:53:22Z

Evaluated for origin test up to e51b3ed

openshift-bot · 2016-12-13T10:18:34Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12317/) (Base Commit: eb304fd)

marun · 2016-12-13T17:23:42Z

@knobunc @rajatchopra Second review when you get a chance?

knobunc

LGTM @rajatchopra PTAL

knobunc · 2016-12-19T16:00:59Z

[merge]

marun · 2016-12-19T17:50:01Z

flake #12184 re-[merge]

openshift-bot · 2016-12-19T17:53:23Z

Evaluated for origin merge up to e51b3ed

openshift-bot · 2016-12-19T19:21:33Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12495/) (Base Commit: 3d9cf71) (Image: devenv-rhel7_5570)

marun added the component/networking label Dec 9, 2016

marun requested a review from ramr December 9, 2016 03:24

marun force-pushed the ensure-router-reload branch 2 times, most recently from 97d6958 to bfa8d87 Compare December 9, 2016 03:41

marun changed the title ~~Ensure router reload on initial sync~~ WIP Ensure router reload on initial sync Dec 9, 2016

marun force-pushed the ensure-router-reload branch from bfa8d87 to 75c6c3c Compare December 9, 2016 21:12

marun changed the title ~~WIP Ensure router reload on initial sync~~ Ensure router reload on initial sync Dec 9, 2016

marun changed the title ~~Ensure router reload on initial sync~~ WIP Ensure router reload on initial sync Dec 9, 2016

marun force-pushed the ensure-router-reload branch 3 times, most recently from 635e2e0 to 955d8b6 Compare December 9, 2016 22:10

marun force-pushed the ensure-router-reload branch 3 times, most recently from 6b062ec to c5a29a8 Compare December 11, 2016 08:23

marun changed the title ~~WIP Ensure router reload on initial sync~~ Ensure router reload on initial sync Dec 11, 2016

marun force-pushed the ensure-router-reload branch from c5a29a8 to a42f3eb Compare December 11, 2016 08:45

marun changed the title ~~Ensure router reload on initial sync~~ WIP Ensure router reload on initial sync Dec 11, 2016

marun force-pushed the ensure-router-reload branch from a42f3eb to 3ff49b0 Compare December 12, 2016 21:36

marun added 2 commits December 12, 2016 13:38

router: bypass the rate limiter for the initial commit

8da3664

Previously the router would trigger the initial commit via the rate limiter. This change ensures the initial commit is triggered directly so the commit of the first sync can occur immediately.

ramr suggested changes Dec 12, 2016

marun force-pushed the ensure-router-reload branch from 3ff49b0 to c4dfa3b Compare December 12, 2016 23:16

marun changed the title ~~WIP Ensure router reload on initial sync~~ Ensure router reload on initial sync Dec 13, 2016

marun force-pushed the ensure-router-reload branch from c182913 to 0fb7032 Compare December 13, 2016 02:05

marun added 2 commits December 12, 2016 18:06

ramr reviewed Dec 13, 2016

ramr approved these changes Dec 13, 2016

marun force-pushed the ensure-router-reload branch from 0fb7032 to 61edc96 Compare December 13, 2016 05:31

router: Minimize reloads for removal and filtering

e51b3ed

This change ensures that stateChanged is only set to true for endpoint and route removal and namespace filtering if state was changed.

marun mentioned this pull request Dec 13, 2016

Avoid reloads when route configuration hasn't changed #12242

Merged

knobunc approved these changes Dec 13, 2016

openshift-bot merged commit c68a3f8 into openshift:master Dec 19, 2016

marun deleted the ensure-router-reload branch December 19, 2016 22:47
