-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure router reload on initial sync #12199
Conversation
97d6958
to
bfa8d87
Compare
[test] |
This needs more work. There needs to be checks for whether the sync state was processed both before and after the |
Does this need to go into 1.4?
On Dec 8, 2016, at 11:26 PM, Maru Newby <[email protected]> wrote:
This needs more work. There needs to be checks for whether the sync state
was processed both before and after the Handle* methods are called.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#12199 (comment)>,
or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p3wqzqeGoQqHyhkOI8-TLTP9jhPNks5rGNhogaJpZM4LIivV>
.
|
@smarterclayton I would argue it should go in 1.4 (or perhaps 1.4.1). |
@ramr @rajatchopra PTAL |
bfa8d87
to
75c6c3c
Compare
635e2e0
to
955d8b6
Compare
This still isn't done, but I think separating state changes from commit initiation is easier to reason about. |
6b062ec
to
c5a29a8
Compare
c5a29a8
to
a42f3eb
Compare
I think this is ready to review. While testing I noticed that the initial commit was triggered via the rate limiter which could delay the commit from the initial sync, so there's a fix included for that. I also stopped explicitly setting the router's sync status in favor of having it be set on the first call to Commit. This ensures that sync status will always be accurate when commit is called. I also noticed that status changes on route admission trigger watch events, which can result in reloads. The reload rate limiter mediates this problem to some extent, but after the reload triggered by an initial sync there will be at least one subsequent reload triggered by the watch events even if route state in the api is otherwise unchanged. Maybe more than one, depending on how many status changes are made and how long it takes for the watch events to propagate. I think this highlights the need to prevent haproxy reloads unless state has actually changed. |
Some test changes required. |
a42f3eb
to
3ff49b0
Compare
The router stress test was previously assuming that the router had seen a route if it had a corresponding service unit. Since endpoints can result in service units, the fact that all routes were using the same host was not failing the test. This change ensures that the test uses unique hosts and properly detects whether a router has seen a route.
Previously the router would trigger the initial commit via the rate limiter. This change ensures the initial commit is triggered directly so the commit of the first sync can occur immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you pushed whilst I was reviewing and I did notice you fixed one of the issues I saw.
We need to set r.stateChanged = true
in CreateServiceUnit
, DeleteServiceUnit
and DeleteEndpoints
.
// event handlers have the same view of sync state. | ||
c.routesListConsumed = c.RoutesListConsumed() | ||
c.updateLastSyncProcessed() | ||
|
||
glog.V(4).Infof("Processing Route: %s -> %s", route.Name, route.Spec.To.Name) | ||
glog.V(4).Infof(" Alias: %s", route.Spec.Host) | ||
glog.V(4).Infof(" Event: %s", eventType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move the [defer] c.lock.{Lock,UnLock}
to after the Info log messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's in scope for this PR.
Assuming it was in scope, why would that be a good idea? It would likely result in logs from the different handlers being interleaved rather than separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why hold a lock for longer than you need to? You are blocked on IO on the log messages. Seems wasteful to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The majority of the router's logging occurs in the chain of plugins whose methods are also called inside the lock. I don't know why sacrificing log coherency - an important aid to debugging - would be worth the relatively insignificant penalty involved in having this method also log inside the lock.
But I neither wrote the logging nor modified it in the PR. If you want this changed I think it should be done separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look at cleaning up.
if c.syncing { | ||
return | ||
} | ||
if err := c.Plugin.Commit(); err != nil { | ||
utilruntime.HandleError(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, this is hard to read with the double negatives + state check.
Would it be simpler to say:
needsCommit := c.endpointsListConsumed && c.routesListConsumed &&
(c.Namespaces == nil || c.filteredByNamespace)
if !needsCommit {
return
}
// Ok, so we need a commit but check if there is a sync in progress ...
if c.syncing {
glog.V(4).Infof("Router sync in progress")
return
}
c.syncing = true
if err := c.Plugin.Commit(); err != nil {
utilruntime.HandleError(err)
}
c.syncing = false
glog.V(4).Infof("Router sync complete")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also do we even need the c.syncing
flag - since all commit
is called/done under a lock anyway?
Seems like you can simplify it by removing the c.syncing
checks (and setting/resetting it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can simplify the boolean statement. c.syncing enables the start/end logging of the sync. Is that not useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, unless you want to get rid of the sync-related logging, I don't think there's much room for simplification. The code you've suggested is not logically equivalent to what I've proposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho, if we are doing all this for just sync related logging it is just not worth the complexity. Change what messages we log rather than make the code more complex to suit those log messages. It makes it difficult to understand not to mention a wee bit too convoluted.
Just add the log messages around c.Plugin.Commit()
- sync in progress and sync complete. This method is checking if we want to commit the route state, so either do it or don't ... there is no try!! ;^)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the goals of this PR was limiting knowledge about whether a sync was in progress to the controller rather than spreading it across the controller and the plugin, which I think is easier to reason about. This means the plugin no longer knows whether a sync is in progress and wouldn't be able to log about it.
That said, I agree that the logic around sync logging doesn't have to be in the commit
method and I've factored it out into a separate method. Does that address your concerns?
3ff49b0
to
c4dfa3b
Compare
@ramr I've made the requested changes to |
c182913
to
0fb7032
Compare
Previously, the router wouldn't reload HAProxy after the initial sync if the last item of the initial list of any of the watched resources didn't reach the router to trigger a commit. One possible trigger for this condition was a route specifying a host already claimed by another namespace. The router could be left in its initial state until another commit-triggering event occurred (e.g. a watch event). This change refactors the commit handling so that the plugin no longer triggers a commit directly. Instead: - the plugin tracks whether state has changed - a commit is only attempted if state has changed - assuming a sync is not in progress, the controller will trigger a commit attempt after calling one of the relevant plugin Handle methods
Previously, if both resource queues were populated on initial sync, commit would be called before handleFirstSync could set the sync state and the initial reload might not bind ports if bindPortsOnSync was true. This change relies on the first call to Commit setting the sync state instead, relegating the handleFirstSync method to ensuring a commit on sync if either or both of the resource queues are empty after the initial sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
0fb7032
to
61edc96
Compare
This change ensures that stateChanged is only set to true for endpoint and route removal and namespace filtering if state was changed.
@ramr Thank you for the reviews! |
Evaluated for origin test up to e51b3ed |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12317/) (Base Commit: eb304fd) |
@knobunc @rajatchopra Second review when you get a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @rajatchopra PTAL
[merge] |
flake #12184 re-[merge] |
Evaluated for origin merge up to e51b3ed |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12495/) (Base Commit: 3d9cf71) (Image: devenv-rhel7_5570) |
Previously, the router wouldn't reload HAProxy after the initial sync if the last item of the initial list of any of the watched resources didn't reach the router to trigger the commit. This could be caused by a route being rejected for any reason (e.g. specifying a host claimed by another namespace). The router could be left in its initial state until another commit-triggering event occurred (e.g. a watch event).
This PR ensures that the router will always reload after the initial sync.
Reference: bz1383663
cc: @openshift/networking, @smarterclayton