A new liveness probe for router pod #12846

Closed

louyihua
Contributor

@louyihua louyihua commented Feb 7, 2017

To ultimately prevent bug 1405440, this PR introduces a new, implementation-independent HTTP GET health check. The new health check is served by the openshift-router process itself. For the HAProxy-based router, the health check uses HAProxy's CLI on the stats socket for the liveness probe, and /healthz for the readiness probe.

@louyihua louyihua changed the title A new liveness probe for router pod [DO NOT MERGE] A new liveness probe for router pod Feb 7, 2017
@louyihua
Contributor Author

louyihua commented Feb 8, 2017

@jeremyeder I think we can continue discussing the liveness probe of the router pod here.
As you said in #12716 , you can set lower values of maxconn in the frontend sections while keeping a high value of maxconn in the global section, so that the /healthz endpoint always has connection resources available. This is correct in theory, but I think it is not easy to achieve with the current configuration.

As you can see, HAProxy currently handles FIVE frontends:

  • two public frontends (HTTP & HTTPS)
  • two intermediate HTTPS frontends
  • the stats frontend

If we use Mg to denote the global maxconn, Mt to denote the stats maxconn, and Mh and Ms to denote the maxconn of the public HTTP & HTTPS frontends respectively, we can easily see that the inequality Mt + Mh + Ms * 2 < Mg must be satisfied. Here, Ms * 2 is an upper bound, corresponding to a router with no passthrough routes configured, where each HTTPS connection is counted twice because it also passes through an intermediate frontend. Although the inequality is simple, the really hard part is finding the correct values of Mh and Ms, as the volume of incoming traffic varies over time. A misconfiguration of Mh and Ms causes either wasted resources or service interruption.
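
To make the arithmetic concrete, here is a minimal sketch of the budgeting check described above (the numbers are made up purely for illustration and are not taken from any real router configuration):

```go
package main

import "fmt"

// healthzBudgetOK reports whether the stats frontend still has headroom
// under the global limit, given the per-frontend maxconn values.
func healthzBudgetOK(mg, mt, mh, ms int) bool {
	// Worst case (no passthrough routes): every public HTTPS connection
	// also traverses an intermediate frontend, so it is counted twice.
	return mt+mh+ms*2 < mg
}

func main() {
	// Illustrative numbers only.
	fmt.Println(healthzBudgetOK(20000, 100, 7000, 6000)) // true:  19100 < 20000
	fmt.Println(healthzBudgetOK(20000, 100, 9000, 6000)) // false: 21100 >= 20000
}
```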

Furthermore, I think a liveness probe's responsibility is to check whether a process is alive or dead, and it is the readiness probe's responsibility to check whether a living process works correctly or not -- surely the /healthz endpoint belongs to the latter. And in the current configuration, the /healthz endpoint always returns 200, as there is no failure condition configured, which means that even if the other frontends reach their connection limits, the pod's status cannot reflect that condition.

So, rather than relying on these hard-to-choose values to keep the stats endpoint always available, it may be much better to find another mechanism, unrelated to this limit, to serve as the pod's liveness probe, while keeping the /healthz endpoint as the readiness probe. This approach also has a nice side effect: by configuring only the global maxconn and the default maxconn, the pod's readiness state reflects whether the router has reached its connection limit or not!

@smarterclayton
Contributor

This probe seems way more complex. /healthz is almost always liveness, not readiness (we specifically expose other readiness probes for that).

@smarterclayton
Contributor

Exec is also much heavier than an HTTP endpoint. Also, other platforms that call into the router can't easily use that check and might rely on healthz. I would much prefer a solution that makes healthz obey different connection limits.

@louyihua
Contributor Author

louyihua commented Feb 10, 2017

@smarterclayton
As the OpenShift platform cannot distinguish whether a connection timeout on /healthz is caused by HAProxy not having started or by it being overloaded, using this endpoint for both liveness and readiness is not really suitable.
Maybe it should be the openshift-router, or another agent inside the pod, that provides the health check endpoints rather than haproxy. In that case, the agent could check through other means whether the haproxy process is in a normal state or has not started, and then combine that information with the timeout result from haproxy's monitor-uri to figure out what caused the timeout (overload? not started? some other error?).
In this way, the router can still provide its health checks through HTTP endpoints without being affected by the connection limits.
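
To illustrate the idea (this is only a rough sketch, not code from this PR; the pid-file path, monitor URL, and port below are invented for the example), such an agent could combine the two signals roughly like this:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strconv"
	"strings"
	"syscall"
	"time"
)

// Hypothetical paths/URLs, for illustration only.
const (
	haproxyPidFile = "/var/lib/haproxy/run/haproxy.pid"
	monitorURL     = "http://localhost:1936/healthz"
)

// haproxyRunning checks that the haproxy process exists (signal 0).
func haproxyRunning() bool {
	data, err := os.ReadFile(haproxyPidFile)
	if err != nil {
		return false
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return false
	}
	return syscall.Kill(pid, 0) == nil
}

// monitorReachable hits the monitor-uri with a short timeout.
func monitorReachable() bool {
	client := &http.Client{Timeout: 1 * time.Second}
	resp, err := client.Get(monitorURL)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// liveness combines both signals: if the process is up but /healthz times
// out, the router is most likely overloaded rather than dead, so we still
// report it as alive.
func liveness(w http.ResponseWriter, r *http.Request) {
	switch {
	case !haproxyRunning():
		http.Error(w, "haproxy process not running", http.StatusServiceUnavailable)
	case !monitorReachable():
		fmt.Fprintln(w, "haproxy running but busy (probably at its connection limit)")
	default:
		fmt.Fprintln(w, "ok")
	}
}

func main() {
	http.HandleFunc("/alive", liveness)
	http.ListenAndServe(":1935", nil)
}
```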

@louyihua louyihua changed the title [DO NOT MERGE] A new liveness probe for router pod A new liveness probe for router pod Feb 16, 2017
@louyihua louyihua force-pushed the router-probe-fix branch 3 times, most recently from 734ccfb to be23bb9 Compare February 16, 2017 01:46
@louyihua
Contributor Author

louyihua commented Feb 16, 2017

@smarterclayton It may be difficult to make /healthz obey different connection limit rules, as it resides in a normal HAProxy frontend rather than a special one.
I've proposed a health check that is provided by the openshift-router process. This gives more flexibility, as it provides a consistent health check endpoint for the platform and other applications, no matter which underlying implementation is used. For the current HAProxy-based router, the liveness probe uses HAProxy's stats socket, while the readiness probe uses the existing /healthz endpoint.
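
For reference, the liveness side of this reduces to talking to HAProxy's stats socket. A minimal sketch (the socket path here is an assumption; "show info" is a standard command of HAProxy's CLI):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"time"
)

// checkHAProxyAlive sends "show info" over the stats unix socket. The
// socket is not subject to the frontend maxconn limits, so it can answer
// even when the public frontends are saturated.
func checkHAProxyAlive(socketPath string, timeout time.Duration) error {
	conn, err := net.DialTimeout("unix", socketPath, timeout)
	if err != nil {
		return fmt.Errorf("cannot connect to stats socket: %v", err)
	}
	defer conn.Close()
	conn.SetDeadline(time.Now().Add(timeout))

	if _, err := fmt.Fprint(conn, "show info\n"); err != nil {
		return fmt.Errorf("cannot send command: %v", err)
	}
	// Any parsable reply is good enough to consider the process alive.
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return fmt.Errorf("no reply from haproxy: %v", err)
	}
	fmt.Printf("haproxy replied: %q\n", line)
	return nil
}

func main() {
	// Example socket path; the real path depends on the router image.
	if err := checkHAProxyAlive("/var/lib/haproxy/run/haproxy.sock", time.Second); err != nil {
		fmt.Println("liveness check failed:", err)
	}
}
```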

@louyihua louyihua force-pushed the router-probe-fix branch 4 times, most recently from 21a766a to 4f49dce Compare February 16, 2017 07:34
@jeremyeder
Contributor

Perhaps the right thing to do is to experiment with patching haproxy to special-case /healthz and use that as a starting point for deeper discussions with haproxy upstream.

@louyihua
Contributor Author

@jeremyeder
I've experimented with HAProxy's monitor-uri for a while, and here is what I found:

  1. The monitor-uri (which defines /healthz) works on a per-frontend basis, which means each frontend can have its own failure condition, so there is NO single global health check for HAProxy.
  2. The maxconn limit is enforced at a VERY EARLY STAGE (actually, just after a socket is accepted/created), while the monitor-uri works at the HTTP level.
    This gap makes it very difficult to special-case whatever monitor-uri defines (like /healthz): if we want /healthz to bypass the limit, the check must be duplicated in many places, as /healthz is supported only in the HTTP frontend while the connection limit applies to every frontend. Such a change may also break what maxconn promises: HAProxy handles no more sockets than the number derived from maxconn.
    There may also be a deliberate design choice by HAProxy's author: a successful /healthz indicates the frontend is healthy, a failed /healthz indicates the frontend has failed, and a failed connection to /healthz indicates the frontend has exhausted its connection limit.
    There is a further consideration: if /healthz does not obey the default connection limit rule, should there be a special rule for it? If not, it becomes a weak point, as malicious clients can easily exhaust the server's resources by opening a large number of connections to /healthz. If so, it introduces more configuration complexity.
    Based on the above, I really doubt whether upstream will accept such a change (making /healthz bypass the connection limit rule). And I maintain that we should not use /healthz as the liveness probe, not only because it is difficult to make it bypass the connection limit, but also because it is only a per-frontend check rather than a global one, while a liveness probe SHOULD be a global check of the entire pod rather than of a single frontend.

@smarterclayton
Contributor

smarterclayton commented Feb 17, 2017 via email

@louyihua
Contributor Author

My preliminary proposal contains the following changes in the updated PR:

  1. Open an HTTP endpoint (configurable through a command line parameter or an environment variable) that listens for incoming health checks using http.ListenAndServe in the router controller, so it is inherited by all types of router.
  2. Add a HandleProbe method to the router plugin interface, so that each plugin (not just the underlying router implementation) can report its health state if necessary.
  3. For the current HAProxy router implementation, the liveness probe uses the stats socket (a unix socket not affected by the maxconn limit) and the readiness probe uses the /healthz endpoint. For the F5 router, it currently just returns OK for all probes, but more checks can be added if necessary. A rough sketch of this shape follows below.
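
To make the shape of points 1 and 2 concrete, here is a much-simplified sketch (the ProbePlugin name and the wiring are illustrative only, not the actual code in this PR):

```go
package main

import (
	"fmt"
	"net/http"
)

// ProbePlugin is an illustrative version of the proposed HandleProbe
// extension: any plugin in the chain may veto liveness or readiness.
type ProbePlugin interface {
	// HandleProbe returns nil if the plugin considers the given probe
	// path ("/alive" or "/healthz") to be passing.
	HandleProbe(path string) error
}

// probeHandler fans a probe request out to every registered plugin.
func probeHandler(plugins []ProbePlugin) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		for _, p := range plugins {
			if err := p.HandleProbe(r.URL.Path); err != nil {
				http.Error(w, err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		fmt.Fprintln(w, "ok")
	}
}

// alwaysOK mimics the F5 case in point 3: report OK for every probe.
type alwaysOK struct{}

func (alwaysOK) HandleProbe(string) error { return nil }

func main() {
	plugins := []ProbePlugin{alwaysOK{}}
	mux := http.NewServeMux()
	mux.HandleFunc("/alive", probeHandler(plugins))
	mux.HandleFunc("/healthz", probeHandler(plugins))
	// The address matches the proposed --probe-endpoint default.
	http.ListenAndServe("0.0.0.0:1935", mux)
}
```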

@jeremyeder
Contributor

jeremyeder commented Feb 20, 2017

@jmencak @openshift/networking PTAL?

Contributor

@pecameron pecameron left a comment


Is there an openshift-docs PR that corresponds with these changes?


.PP
\fB\-\-probe\-timeout\fP="1s"
The timeout that router waits for underlying implementation to reply a probe
Contributor


reply to a probe (add the "to")

Contributor Author


Thank you for the correction.

if cfg.HostNetwork {
probe.Handler.HTTPGet.Host = "localhost"
}
// Workaround for misconfigured environments where the Node's InternalIP is
Contributor


Is this information in the openshift-docs documentation? If it's new, what is the docs PR?

Contributor Author


You mean the comment here? It has been there for a while; this PR only changes its indentation.

probe.InitialDelaySeconds = 10
}
return probe
return generateProbeConfigForRouter(cfg, ports, "/alive", 10)
Contributor


Is a 10 sec delay sufficient? Should this be configurable?

Contributor Author


This is just the default value that has been used for a while.
If it does not suit some users, further customization can be done by editing the router's DC.

@@ -73,6 +77,8 @@ func (o *RouterSelection) Bind(flag *pflag.FlagSet) {
flag.BoolVar(&o.AllowWildcardRoutes, "allow-wildcard-routes", cmdutil.Env("ROUTER_ALLOW_WILDCARD_ROUTES", "") == "true", "Allow wildcard host names for routes")
flag.BoolVar(&o.DisableNamespaceOwnershipCheck, "disable-namespace-ownership-check", cmdutil.Env("ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK", "") == "true", "Disables the namespace ownership checks for a route host with different paths or for overlapping host names in the case of wildcard routes. Please be aware that if namespace ownership checks are disabled, routes in a different namespace can use this mechanism to 'steal' sub-paths for existing domains. This is only safe if route creation privileges are restricted, or if all the users can be trusted.")
flag.BoolVar(&o.EnableIngress, "enable-ingress", cmdutil.Env("ROUTER_ENABLE_INGRESS", "") == "true", "Enable configuration via ingress resources")
flag.StringVar(&o.ProbeEndpoint, "probe-endpoint", cmdutil.Env("ROUTER_PROBE_ENDPOINT", "0.0.0.0:1935"), "The http endpoint that router listens on for accepting incoming probes")
flag.StringVar(&o.ProbeTimeoutStr, "probe-timeout", cmdutil.Env("ROUTER_PROBE_TIMEOUT", "1s"), "The timeout that router waits for underlying implementation to reply a probe")
Contributor


reply "to" a probe

@eparis
Member

eparis commented Feb 21, 2017

I'm still not sure what I think about the whole idea. @smarterclayton can you give a minute of thought here to tell me I'm wrong?

We are going to have a probe which can pass even when the router is not doing its job. Its job, its one and only job, is to serve on port 80. The bug is that when the router is overworked and IS NOT SERVING on port 80, the probe fails.

Is the problem really the probe? If so, this PR makes sense to me.

Or is the problem how we REACT to the probe?

Is the real problem that when the router is overworked it gets killed and restarted, which may only compound the problem? Should we be looking for a better way to react?

Maybe the solution is somewhere in the middle... Vertical autoscaling under pressure? A probe which only checks the stats if port 80 is failing, and only then if the stats tell us that it is working?

This whole issue (not just this PR) really rubs me the wrong way, but I still haven't figured out 'the right way'. I just feel sure we are looking at it wrong.

@rajatchopra
Contributor

@eparis The problem is with the reaction to the probe. But the current probe does not tell us much - has the pod failed? No. Is the pod so overworked that it cannot even handle the probe? Yes.
How do we distinguish?

This PR does not do much to solve the reaction to the old probe response, but re-organizes the code such that we can write specific probe responses when we know what we want to do. The liveness probe is certainly better even if the readiness probe is the same old answer.

@pecameron
Contributor

@eparis what problem are we trying to fix here? It sounds like when the system gets resource-constrained, things slow down, so what is the real bottleneck?

At the least we should have a test that demonstrates that the fix works. Load haproxy until response times are long and verify that the health probe still returns quickly enough. Changes in the name of performance need very careful testing.

@louyihua
Contributor Author

@eparis The motivation for this fix comes from BUG 1405440, which points out that when the number of active connections reaches HAProxy's global connection limit, HAProxy also refuses to answer the /healthz endpoint, which causes the router pod to be repeatedly restarted because OpenShift uses this endpoint as the liveness probe. (For a liveness probe, if it fails, what else can we do but restart the pod?) However, if we just want to solve this problem itself, we have several options:

  1. We can raise HAProxy's global connection limit so that it won't be reached easily, or
  2. We can find a way to make this endpoint not obey the connection limit, or
  3. We can use different probes.

Option 1 is easy, but far from good. Although we can raise the limit to a very high number that seems unreachable even in extreme situations, the router cannot afford the side effect this brings: before the connection limit is exhausted, other system resource limits (memory, CPU, ...) are hit. When that happens, not only can new connections not come in, even existing connections may be affected (slow responses or even packet loss). I do not think this is what we want to see.
Option 2 seems good, but is very hard to implement. I've investigated HAProxy's code and found that making an HTTP endpoint bypass the connection limit rule requires breaking HAProxy's current code structure. And using /healthz as both the liveness and the readiness probe is not a good option, since, as @rajatchopra said, /healthz just returns true or false and cannot tell us what actually happened.
Since the first two options both have serious limitations, only option 3 is left. I first tried to propose a shell probe, but such a probe seemed no better than /healthz. Then I thought: why not let openshift-router provide the probe endpoint? In this way, we get a general probing mechanism that not only solves the above BUG, but also lets us do much more: for example, we can answer liveness and readiness probes using different criteria, fully adjustable according to our requirements.

And, @pecameron
I don't think we need to put so much load on HAProxy that it slows down. If HAProxy holds too much load, not only are new incoming connections affected, but existing connections may be affected as well, which is not what we want. What we need to do is decide a reasonable connection limit for HAProxy, such as the limit we would set in a real production environment, then make the number of active connections reach that limit, so that new incoming requests cannot be served while existing connections are not much affected. Then, in that situation, we test whether our probes (liveness & readiness) respond quickly and correctly.
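
Such a test could be scripted along these lines (a rough sketch; the addresses, probe port, and connection count are placeholders):

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

const (
	routerAddr = "10.0.0.1:80"                // placeholder: the router's public HTTP frontend
	probeURL   = "http://10.0.0.1:1935/alive" // placeholder: the proposed probe endpoint
	connLimit  = 2000                         // placeholder: the configured connection limit
)

func main() {
	// Hold open enough idle connections to pin HAProxy at its limit.
	conns := make([]net.Conn, 0, connLimit)
	for i := 0; i < connLimit; i++ {
		c, err := net.DialTimeout("tcp", routerAddr, time.Second)
		if err != nil {
			fmt.Printf("stopped at %d connections: %v\n", i, err)
			break
		}
		conns = append(conns, c)
	}
	defer func() {
		for _, c := range conns {
			c.Close()
		}
	}()

	// With the frontends saturated, the probe should still answer quickly.
	client := &http.Client{Timeout: 2 * time.Second}
	start := time.Now()
	resp, err := client.Get(probeURL)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Printf("probe answered %d in %v\n", resp.StatusCode, time.Since(start))
}
```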

@ramr
Contributor

ramr commented Feb 22, 2017

Just adding my 2 cents here. The main aim of the liveness check is to verify that the haproxy process (or really the router pod) is alive. So from a certain perspective, connectivity to the haproxy process also signals liveness. Maybe the simpler approach is to have the liveness probe use TCPSocketAction and just verify connectivity to the stats port rather than try to send an HTTP request. We could still use the HTTP action/request for the readiness probe, but that is at startup (or close to it) time. At steady state, the tcp check is probably enough and a wee bit less invasive.
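
For comparison, the TCPSocketAction variant is just a different probe definition on the router pod. A minimal sketch using today's k8s.io/api/core/v1 types (the vendored API at the time used the older Handler field name, as the snippet earlier in this PR shows, and the stats port value here is an assumption):

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// tcpLivenessProbe builds a liveness probe that only verifies TCP
// connectivity to the given port instead of issuing an HTTP request.
func tcpLivenessProbe(statsPort int) *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			TCPSocketAction: &corev1.TCPSocketAction{
				Port: intstr.FromInt(statsPort),
			},
		},
		InitialDelaySeconds: 10, // same default delay the existing router probes use
	}
}

func main() {
	_ = tcpLivenessProbe(1936) // 1936: the router's usual stats port
}
```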

@louyihua
Contributor Author

For HAProxy, since the connection limit is checked after the socket is accepted, a TCPSocketAction should succeed even when the connection limit is reached. So @ramr is right: using TCPSocketAction as the liveness probe is the easiest way for now.
But if we want to support other types of software router (like nginx) in the future, a more general, flexible, and implementation-independent probe may still be worth considering.

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2017
@louyihua louyihua mentioned this pull request Feb 27, 2017
@openshift-bot openshift-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 20, 2017
@openshift-bot
Contributor

Origin Action Required: Pull request cannot be automatically merged, please rebase your branch from latest HEAD and push again

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 25, 2017
@knobunc
Contributor

knobunc commented May 31, 2017

Closing this based on the previous conversation.

@knobunc knobunc closed this May 31, 2017
@louyihua louyihua deleted the router-probe-fix branch June 19, 2017 01:19
Labels: needs-rebase, priority/P2