Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure router reload on initial sync #12199

Merged
merged 5 commits into from
Dec 19, 2016

Conversation

marun
Copy link
Contributor

@marun marun commented Dec 9, 2016

Previously, the router wouldn't reload HAProxy after the initial sync if the last item of the initial list of any of the watched resources didn't reach the router to trigger the commit. This could be caused by a route being rejected for any reason (e.g. specifying a host claimed by another namespace). The router could be left in its initial state until another commit-triggering event occurred (e.g. a watch event).

This PR ensures that the router will always reload after the initial sync.

Reference: bz1383663

cc: @openshift/networking, @smarterclayton

@marun marun requested a review from ramr December 9, 2016 03:24
@marun marun force-pushed the ensure-router-reload branch 2 times, most recently from 97d6958 to bfa8d87 Compare December 9, 2016 03:41
@marun
Copy link
Contributor Author

marun commented Dec 9, 2016

[test]

@marun marun changed the title Ensure router reload on initial sync WIP Ensure router reload on initial sync Dec 9, 2016
@marun
Copy link
Contributor Author

marun commented Dec 9, 2016

This needs more work. There needs to be checks for whether the sync state was processed both before and after the Handle* methods are called.

@smarterclayton
Copy link
Contributor

smarterclayton commented Dec 9, 2016 via email

@knobunc
Copy link
Contributor

knobunc commented Dec 9, 2016

@smarterclayton I would argue it should go in 1.4 (or perhaps 1.4.1).

@knobunc
Copy link
Contributor

knobunc commented Dec 9, 2016

@ramr @rajatchopra PTAL

@marun marun force-pushed the ensure-router-reload branch from bfa8d87 to 75c6c3c Compare December 9, 2016 21:12
@marun marun changed the title WIP Ensure router reload on initial sync Ensure router reload on initial sync Dec 9, 2016
@marun marun changed the title Ensure router reload on initial sync WIP Ensure router reload on initial sync Dec 9, 2016
@marun marun force-pushed the ensure-router-reload branch 3 times, most recently from 635e2e0 to 955d8b6 Compare December 9, 2016 22:10
@marun
Copy link
Contributor Author

marun commented Dec 9, 2016

This still isn't done, but I think separating state changes from commit initiation is easier to reason about.

@marun marun force-pushed the ensure-router-reload branch 3 times, most recently from 6b062ec to c5a29a8 Compare December 11, 2016 08:23
@marun marun changed the title WIP Ensure router reload on initial sync Ensure router reload on initial sync Dec 11, 2016
@marun marun force-pushed the ensure-router-reload branch from c5a29a8 to a42f3eb Compare December 11, 2016 08:45
@marun
Copy link
Contributor Author

marun commented Dec 11, 2016

I think this is ready to review.

While testing I noticed that the initial commit was triggered via the rate limiter which could delay the commit from the initial sync, so there's a fix included for that.

I also stopped explicitly setting the router's sync status in favor of having it be set on the first call to Commit. This ensures that sync status will always be accurate when commit is called.

I also noticed that status changes on route admission trigger watch events, which can result in reloads. The reload rate limiter mediates this problem to some extent, but after the reload triggered by an initial sync there will be at least one subsequent reload triggered by the watch events even if route state in the api is otherwise unchanged. Maybe more than one, depending on how many status changes are made and how long it takes for the watch events to propagate. I think this highlights the need to prevent haproxy reloads unless state has actually changed.

@marun marun changed the title Ensure router reload on initial sync WIP Ensure router reload on initial sync Dec 11, 2016
@marun
Copy link
Contributor Author

marun commented Dec 11, 2016

Some test changes required.

@marun marun force-pushed the ensure-router-reload branch from a42f3eb to 3ff49b0 Compare December 12, 2016 21:36
The router stress test was previously assuming that the router had
seen a route if it had a corresponding service unit.  Since endpoints
can result in service units, the fact that all routes were using the
same host was not failing the test. This change ensures that the test
uses unique hosts and properly detects whether a router has seen a route.
Previously the router would trigger the initial commit via the rate
limiter.  This change ensures the initial commit is triggered directly
so the commit of the first sync can occur immediately.
Copy link
Contributor

@ramr ramr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you pushed whilst I was reviewing and I did notice you fixed one of the issues I saw.
We need to set r.stateChanged = true in CreateServiceUnit, DeleteServiceUnit and DeleteEndpoints.

// event handlers have the same view of sync state.
c.routesListConsumed = c.RoutesListConsumed()
c.updateLastSyncProcessed()

glog.V(4).Infof("Processing Route: %s -> %s", route.Name, route.Spec.To.Name)
glog.V(4).Infof(" Alias: %s", route.Spec.Host)
glog.V(4).Infof(" Event: %s", eventType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move the [defer] c.lock.{Lock,UnLock} to after the Info log messages.

Copy link
Contributor Author

@marun marun Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's in scope for this PR.

Assuming it was in scope, why would that be a good idea? It would likely result in logs from the different handlers being interleaved rather than separated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hold a lock for longer than you need to? You are blocked on IO on the log messages. Seems wasteful to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The majority of the router's logging occurs in the chain of plugins whose methods are also called inside the lock. I don't know why sacrificing log coherency - an important aid to debugging - would be worth the relatively insignificant penalty involved in having this method also log inside the lock.

But I neither wrote the logging nor modified it in the PR. If you want this changed I think it should be done separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look at cleaning up.

if c.syncing {
return
}
if err := c.Plugin.Commit(); err != nil {
utilruntime.HandleError(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, this is hard to read with the double negatives + state check.
Would it be simpler to say:

    needsCommit := c.endpointsListConsumed && c.routesListConsumed &&  
                                 (c.Namespaces == nil || c.filteredByNamespace)
    if !needsCommit {
       return
    }

    //  Ok, so we need a commit but check if there is a sync in progress ... 
    if c.syncing {
        glog.V(4).Infof("Router sync in progress")
        return
    }

    c.syncing = true
    if err := c.Plugin.Commit(); err != nil {
       utilruntime.HandleError(err)
    }
    c.syncing = false
    glog.V(4).Infof("Router sync complete")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do we even need the c.syncing flag - since all commit is called/done under a lock anyway?
Seems like you can simplify it by removing the c.syncing checks (and setting/resetting it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can simplify the boolean statement. c.syncing enables the start/end logging of the sync. Is that not useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, unless you want to get rid of the sync-related logging, I don't think there's much room for simplification. The code you've suggested is not logically equivalent to what I've proposed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho, if we are doing all this for just sync related logging it is just not worth the complexity. Change what messages we log rather than make the code more complex to suit those log messages. It makes it difficult to understand not to mention a wee bit too convoluted.

Just add the log messages around c.Plugin.Commit() - sync in progress and sync complete. This method is checking if we want to commit the route state, so either do it or don't ... there is no try!! ;^)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the goals of this PR was limiting knowledge about whether a sync was in progress to the controller rather than spreading it across the controller and the plugin, which I think is easier to reason about. This means the plugin no longer knows whether a sync is in progress and wouldn't be able to log about it.

That said, I agree that the logic around sync logging doesn't have to be in the commit method and I've factored it out into a separate method. Does that address your concerns?

@marun marun force-pushed the ensure-router-reload branch from 3ff49b0 to c4dfa3b Compare December 12, 2016 23:16
@marun
Copy link
Contributor Author

marun commented Dec 12, 2016

@ramr I've made the requested changes to CreateServiceUnit and DeleteServiceUnit. There was no change in behavior without those changes, but it does pay to be consistent.

@marun marun changed the title WIP Ensure router reload on initial sync Ensure router reload on initial sync Dec 13, 2016
@marun marun force-pushed the ensure-router-reload branch from c182913 to 0fb7032 Compare December 13, 2016 02:05
Previously, the router wouldn't reload HAProxy after the initial sync
if the last item of the initial list of any of the watched resources
didn't reach the router to trigger a commit.  One possible trigger for
this condition was a route specifying a host already claimed by
another namespace. The router could be left in its initial state until
another commit-triggering event occurred (e.g. a watch event).

This change refactors the commit handling so that the plugin no longer
triggers a commit directly.  Instead:

 - the plugin tracks whether state has changed
 - a commit is only attempted if state has changed
 - assuming a sync is not in progress, the controller will trigger a
   commit attempt after calling one of the relevant plugin Handle methods
 Previously, if both resource queues were populated on initial sync,
 commit would be called before handleFirstSync could set the sync
 state and the initial reload might not bind ports if bindPortsOnSync
 was true.  This change relies on the first call to Commit setting the
 sync state instead, relegating the handleFirstSync method to ensuring
 a commit on sync if either or both of the resource queues are empty
 after the initial sync.
Copy link
Contributor

@ramr ramr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marun marun force-pushed the ensure-router-reload branch from 0fb7032 to 61edc96 Compare December 13, 2016 05:31
This change ensures that stateChanged is only set to true for endpoint
and route removal and namespace filtering if state was changed.
@marun
Copy link
Contributor Author

marun commented Dec 13, 2016

@ramr Thank you for the reviews!

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to e51b3ed

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12317/) (Base Commit: eb304fd)

@marun
Copy link
Contributor Author

marun commented Dec 13, 2016

@knobunc @rajatchopra Second review when you get a chance?

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @rajatchopra PTAL

@knobunc
Copy link
Contributor

knobunc commented Dec 19, 2016

[merge]

@marun
Copy link
Contributor Author

marun commented Dec 19, 2016

flake #12184 re-[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to e51b3ed

@openshift-bot
Copy link
Contributor

openshift-bot commented Dec 19, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12495/) (Base Commit: 3d9cf71) (Image: devenv-rhel7_5570)

@openshift-bot openshift-bot merged commit c68a3f8 into openshift:master Dec 19, 2016
@marun marun deleted the ensure-router-reload branch December 19, 2016 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants