Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: Add option to bind ports only when ready #11768

Merged
merged 1 commit into from
Dec 1, 2016

Conversation

marun
Copy link
Contributor

@marun marun commented Nov 4, 2016

By default the router binds ports 80 and 443 even if no routing configuration is available. This may be desirable when haproxy is serving traffic directly to clients. However, the f5 loadbalancer will treat bound ports as indicating that a backend is ready to receive traffic. This means that router pods that are starting or restarting may be put into rotation before they are ready to serve the current route state and clients may see 503s for perfectly healthy endpoints. To avoid this possibility, this change adds an option (ROUTER_BIND_PORTS_AFTER_SYNC/--bind-ports-after-sync) to the router that ensures that ports are bound only when a router instance has synced with the api.

Reference: bug 1383663 link

cc: @openshift/networking

@marun
Copy link
Contributor Author

marun commented Nov 4, 2016

[test]

@marun marun force-pushed the bz1383663 branch 2 times, most recently from ee87525 to 64ac08b Compare November 4, 2016 06:23
@@ -85,6 +85,10 @@ listen stats :1936
mode http
# Health check monitoring uri.
monitor-uri /healthz
{{ if (and .State .ServiceUnits) }}
# This uri being configured indicates that routing configuration has been synchronized.
monitor-uri /readyz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the z's? Should we really use that? I was being somewhat facetious on the call when I said it.

@@ -86,6 +87,7 @@ func (o *TemplateRouter) Bind(flag *pflag.FlagSet) {
flag.StringVar(&o.ReloadScript, "reload", util.Env("RELOAD_SCRIPT", ""), "The path to the reload script to use")
flag.DurationVar(&o.ReloadInterval, "interval", reloadInterval(), "Controls how often router reloads are invoked. Mutiple router reload requests are coalesced for the duration of this interval since the last reload time.")
flag.BoolVar(&o.ExtendedValidation, "extended-validation", util.Env("EXTENDED_VALIDATION", "true") == "true", "If set, then an additional extended validation step is performed on all routes admitted in by this router. Defaults to true and enables the extended validation checks.")
flag.BoolVar(&o.BindOnlyWhenReady, "bind-only-when-ready", util.Env("BIND_ONLY_WHEN_READY", "") == "true", "Only bind ports when route state has been synchronized")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more recent trend has been to put ROUTER_ in the name (see openshift/openshift-docs#3090 for the complete list)

Should we do that here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming of the flag - I wouldn't say "ready" because that has overlap with our existing readiness concepts. I would prefer "--bind-ports-after-sync"

@@ -85,6 +85,10 @@ listen stats :1936
mode http
# Health check monitoring uri.
monitor-uri /healthz
{{ if (and .State .ServiceUnits) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no services in the system (or that match our shard params) will this still bind?

@knobunc
Copy link
Contributor

knobunc commented Nov 4, 2016

@smarterclayton PTAL

@knobunc
Copy link
Contributor

knobunc commented Nov 4, 2016

Flake #11775 [test]

@knobunc
Copy link
Contributor

knobunc commented Nov 4, 2016

flake #10773 [test]

@marun
Copy link
Contributor Author

marun commented Nov 4, 2016

As per an irc conversation with @knobunc readiness should denote that route state has been synchronized, even if there is no state available.

return
}
}
time.Sleep(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hot loop, don't do this. Sleep longer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimum of 50ms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I keep forgetting that time.Sleep() takes a duration not an int. My intention was to sleep for 1s.

(c.Namespaces == nil || c.filteredByNamespace)
if syncedAtLeastOnce {
if err := c.Plugin.SetSyncedAtLeastOnce(); err != nil {
utilruntime.HandleError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invert this condition and return early.

@marun
Copy link
Contributor Author

marun commented Nov 7, 2016

I'm still working on this, coordinating readiness and reload is more complicated than I originally thought.

@marun
Copy link
Contributor Author

marun commented Nov 8, 2016

A router now becomes ready even when no items are queued for either of the watchers. Some of the integration tests are broken, though. Will investigate tomorrow.

@marun
Copy link
Contributor Author

marun commented Nov 9, 2016

It's not possible to configure multiple monitor-uri paths on a given stats port, so adding a readiness check would require opening multiple stats ports. I'm assuming that is out of scope here, so this PR no longer includes the addition of a readiness check.

@knobunc
Copy link
Contributor

knobunc commented Nov 9, 2016

@ramr @openshift/networking PTAL

By default the router binds ports 80 and 443 even if no routing
configuration is available.  This may be desirable when haproxy is
serving traffic directly to clients.  However, the f5 loadbalancer will
treat bound ports as indicating that a backend is ready to receive
traffic.  This means that router pods that are starting or restarting
may be put into rotation before they are ready to serve the current
route state and clients may see 503s for perfectly healthy backend
endpoints.  To avoid this possibility, this change adds an
option (BIND_ONLY_WHEN_READY/--bind-only-when-ready) to the router that
ensures that ports are bound only when a router instance has route and
endpoint state available.

Reference: bz1383663
@marun
Copy link
Contributor Author

marun commented Nov 9, 2016

flake #11685 [test]

@marun
Copy link
Contributor Author

marun commented Nov 11, 2016

flake #11887 re-[test]

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 7b3e0d3

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11402/) (Base Commit: 74a2188)


synced := c.RoutesListSuccessfulAtLeastOnce() &&
c.EndpointsListSuccessfulAtLeastOnce() &&
(c.Namespaces == nil || c.filteredByNamespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the c.Namespaces == nil check needed? Wouldn't c.filteredByNamespace suffice? I see c.filteredByNamespace set in handle namespaces on no errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is needed. If c.Namespaces == nil then c.filteredByNamespace will never be set to true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If c.Namespaces == nil, then c.HandleNamespaces() will never be called:

https://github.com/openshift/origin/blob/master/pkg/router/controller/controller.go#L51

@ramr
Copy link
Contributor

ramr commented Nov 30, 2016

Changes LGTM - just a small question.

@knobunc
Copy link
Contributor

knobunc commented Dec 1, 2016

[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 7b3e0d3

@openshift-bot
Copy link
Contributor

openshift-bot commented Dec 1, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11402/) (Base Commit: 3bab31d) (Image: devenv-rhel7_5458)

@openshift-bot openshift-bot merged commit e43ec15 into openshift:master Dec 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants