Fix of BUG 1405440 (release-1.5 cherry-pick) #13273
Conversation
Using TCPSocketAction as the liveness probe, which will not be affected by the connection limit set in HAProxy's config file. This is a TRUE fix for BUG 1405440.

@pweil- @rajatchopra

Cherry-picked from PR #13121.
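A minimal sketch of the probe being described, written against the current k8s.io/api types rather than the release-1.5 era packages; the stats port 1936 and the delay/timeout values are illustrative assumptions, not the PR's exact settings:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Liveness as a bare TCP connect: haproxy accepting the socket is
	// enough to pass, so reaching maxconn does not restart the router.
	// Port 1936 (the haproxy stats port) is an assumption here.
	liveness := &corev1.Probe{
		InitialDelaySeconds: 10,
		TimeoutSeconds:      1,
	}
	liveness.TCPSocket = &corev1.TCPSocketAction{
		Port: intstr.FromInt(1936),
	}
	fmt.Printf("liveness probe: %+v\n", liveness)
}
```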
@pweil- Hmm. This merged while I was away, and I might have disputed the original merge. We fixed the original problem by upping the number of connections that were allowed (so I think the high-severity part is addressed). This PR changes the semantics of the health check to just see whether the connection is accepted, rather than exercising anything more in haproxy. So, if haproxy is wedged elsewhere, you can't know. @eparis, any thoughts?
@knobunc Some discussion about the health check was in #12846. And upping maxconn just REDUCES/DELAYS the bug, it does not FIX it. In short, the currently used health check cannot actually tell you any more about haproxy than this new check does, but the old check would fail the pod whenever the connection limit is reached.

Considering that reaching the connection limit does NOT mean the pod has failed, while it DOES mean the pod is NOT READY for new incoming connections, keeping the old check as the readiness check while changing the liveness check should be a good way to really SOLVE the bug. Maybe in the future we can use a better health check as the liveness probe, one that both keeps succeeding under high load and exercises more of haproxy. But for now, the TCP-connection-based check should be a simple but effective way to solve the bug.
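A sketch of that split on the router container, under the same illustrative assumptions as above (port 1936, a /healthz health endpoint, plus a hypothetical image name):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// routerContainer keeps the HTTP health check as the readiness probe
// (a pod at its connection limit is taken out of rotation) and uses a
// TCP connect as the liveness probe (the pod is not restarted for
// merely being at the limit).
func routerContainer() corev1.Container {
	readiness := &corev1.Probe{InitialDelaySeconds: 10}
	readiness.HTTPGet = &corev1.HTTPGetAction{
		Path: "/healthz", // assumed health endpoint
		Port: intstr.FromInt(1936),
	}

	liveness := &corev1.Probe{InitialDelaySeconds: 10}
	liveness.TCPSocket = &corev1.TCPSocketAction{
		Port: intstr.FromInt(1936),
	}

	return corev1.Container{
		Name:           "router",
		Image:          "openshift/origin-haproxy-router", // hypothetical
		ReadinessProbe: readiness,
		LivenessProbe:  liveness,
	}
}

func main() {
	fmt.Printf("%+v\n", routerContainer())
}
```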
See also @eparis' concerns at https://bugzilla.redhat.com/show_bug.cgi?id=1405440#c6 What is the urgency for this particular backport? The change to maxconns is in, and there was some question on the original PR this cherry-picks about how the new test would perform under load. Have we performed that testing?
For @eparis's concern, I've checked haproxy's code, and the 'not responding' is not true. In fact, when the connection limit is reached, haproxy actively disconnects new incoming connections just after it accepts the socket, so the HTTP health check definitely fails, as no HTTP response is generated, but the TCP check should succeed because the socket connection really is accepted. When concurrent connections drop below the limit, haproxy goes back to serving new connections normally.

So, for the testing problem, what we need to do is set maxconn to a small value, drive enough concurrent connections to reach the limit, and check that the TCP probe keeps succeeding while the old HTTP probe fails.

And if there really are too many connections, exceeding the capability of haproxy itself, the TCP check will also fail. But that is the right semantic, as only in such a situation should we consider haproxy unusable. This is also the case we should prevent, and it is why I disputed the commit that ups the connection limit. Consider: if we just raise the limit to keep the liveness probe from failing, rather than limiting the concurrent connections to a single router pod, we will eventually end up with more connections than haproxy can handle. However, if we keep the limit, the concurrent connections to a single router pod stay bounded within haproxy's capability.
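A self-contained way to see the behavior described above. The toy listener below mimics what the comment says haproxy does at its connection limit (accept the socket, then disconnect without responding); it is a stand-in, not haproxy itself. Against it, a TCP-connect style check passes while an HTTP check fails:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

func main() {
	// Listener that mimics haproxy at maxconn: accept every TCP
	// connection, then close it immediately without writing a byte.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			c.Close() // accepted, then dropped: no HTTP response
		}
	}()
	addr := ln.Addr().String()

	// TCPSocketAction-style check: passes if the connect completes.
	c, err := net.DialTimeout("tcp", addr, time.Second)
	fmt.Println("tcp check ok:", err == nil)
	if c != nil {
		c.Close()
	}

	// HTTP-style check: fails because the peer closes the connection
	// before any response is written (typically an EOF error).
	_, err = http.Get("http://" + addr + "/healthz")
	fmt.Println("http check error:", err)
}
```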
@louyihua Ok, thanks! In that case my concerns are assuaged. [merge]
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/909/) (Base Commit: 6fa29de) (Image: devenv-rhel7_6052)
Evaluated for origin merge up to 612dc51
[Test]ing while waiting on the merge queue
Evaluated for origin test up to 612dc51
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/909/) (Base Commit: 6fa29de)