Optimizations for HAProxy reloads #6744
xref:../architecture/topics/router_environment_variables.adoc#[Router environment variables] affecting this
behaviour are `ROUTER_DEFAULT_TUNNEL_TIMEOUT`, `ROUTER_DEFAULT_CLIENT_TIMEOUT`,
`ROUTER_DEFAULT_SERVER_TIMEOUT` and `RELOAD_INTERVAL` in particular. It is currently recommended to
set `RELOAD_INTERVAL` to 15s.
Hi @jmencak, thanks for writing this.
Just for clarification in my own mind:
- What is the default reload interval?
- When you say "it is currently recommended", could you please be more specific about what to look for when considering this change? What is the "fingerprint" to look for on the system (slow route propagation?)
- You may want to state a specific number of routes. For example, I think we saw it as low as 3000, but in some other environments it might have been a bit higher?
- Default reload interval is 5s.
- 15s comes from what Ben recommended in https://bugzilla.redhat.com/show_bug.cgi?id=1471899, but that was to address an immediate BZ issue, which should now be fixed in other ways. Perhaps I should reformulate this or remove it completely, as it is trying to fix another problem. In theory, 15s could make the number of HAProxy processes 3x lower.
- Not sure about this. The docs are for 3.8+, and the ~3000 route issue I was seeing should be fixed by origin#17049 ("Change the router reload suppression so that it doesn't block updates"), which has already merged.
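To make the theoretical 3x reduction concrete, here is a minimal sketch that sweeps over HAProxy process start and exit times under a simplified model, which is an assumption for illustration and not how the router actually schedules reloads: routes change often enough that a reload happens every reload interval, and each replaced process lingers for a fixed drain time while its old connections finish. The function name `peak_haproxy_processes` and the 1h drain time are hypothetical.

```python
def peak_haproxy_processes(reload_interval_s: float, drain_s: float) -> int:
    """Peak number of HAProxy processes alive at the same time.

    Simplified model (assumption): a reload happens every reload_interval_s
    seconds; the process started at reload k serves new connections until
    reload k+1, then lingers for drain_s seconds while old connections finish.
    """
    # Simulate long enough to reach the steady state.
    horizon_s = 2 * (reload_interval_s + drain_s)
    events = []
    k = 0
    while k * reload_interval_s <= horizon_s:
        start = k * reload_interval_s
        end = start + reload_interval_s + drain_s
        events.append((start, 1))   # process k starts
        events.append((end, -1))    # process k exits after draining
        k += 1
    # At equal timestamps, exits (-1) sort before starts (+1).
    events.sort()
    alive = peak = 0
    for _, delta in events:
        alive += delta
        peak = max(peak, alive)
    return peak


if __name__ == "__main__":
    # Hypothetical 1h drain time, e.g. a long-lived, mostly idle WebSocket.
    for interval in (5, 15):
        print(f"RELOAD_INTERVAL={interval}s -> {peak_haproxy_processes(interval, 3600)} processes")
```

Under these assumptions the sketch prints 721 processes for 5s and 241 for 15s, roughly a 3x reduction. Actively used connections are not bounded by idle timeouts, so real processes can linger longer than this model suggests.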
Sounds like we need to re-test after #17049 and then re-approach this particular documentation. Unless you wanted to target this at "3.7 and earlier" versions of the docs only?
This section has nothing to do with #17049 or BZ1471899, other than the fact that increasing RELOAD_INTERVAL can alleviate the problems seen in BZ1471899 for versions 3.6 and earlier. I didn't write the section for that purpose. This section is entirely about the inherent inability of HAProxy to reload its configuration without forking another process while serving (old and new) connections. BZ1471899 and #17049 were retested by QA, but I can retest; not a big deal.
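As a rough illustration of how the environment variables discussed here interact, the sketch below converts their values to seconds and estimates how many HAProxy processes can pile up. It assumes the values use HAProxy's single-unit time format (a bare integer means milliseconds), that a reload happens every `RELOAD_INTERVAL` because routes keep changing, and that an old process exits once its connections reach the longest of the three idle timeouts. The helper names `to_seconds` and `estimate_peak_processes` are hypothetical, and compound values such as `1m30s` are not handled.

```python
import math
import re

# Assumption: HAProxy-style time units; a bare integer means milliseconds.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}
_DURATION = re.compile(r"(\d+)(us|ms|s|m|h|d)?")

TIMEOUT_VARS = (
    "ROUTER_DEFAULT_TUNNEL_TIMEOUT",
    "ROUTER_DEFAULT_CLIENT_TIMEOUT",
    "ROUTER_DEFAULT_SERVER_TIMEOUT",
)


def to_seconds(value: str) -> float:
    """Convert a single-unit duration such as '15s' or '1h' to seconds."""
    match = _DURATION.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit or "ms"]


def estimate_peak_processes(env: dict) -> int:
    """Rough estimate of concurrent HAProxy processes (assumed model).

    Treats the longest idle timeout as the time an old process needs to
    drain. Connections that stay active are not bounded by idle timeouts
    and can keep an old process alive longer than this.
    """
    reload_s = to_seconds(env["RELOAD_INTERVAL"])
    drain_s = max(to_seconds(env[name]) for name in TIMEOUT_VARS)
    return math.ceil(drain_s / reload_s) + 1
```

For example, with hypothetical values of `RELOAD_INTERVAL=15s`, `ROUTER_DEFAULT_TUNNEL_TIMEOUT=1h`, and 30s client and server timeouts, `estimate_peak_processes` returns 241; changing only `RELOAD_INTERVAL` to `5s` gives 721. The tunnel timeout dominates the estimate, which is why long-lived WebSocket connections matter most here.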
@@ -125,3 +125,18 @@ accepted by load balancers provided by many public cloud providers. However,
this affects the total memory use, especially when large numbers of connections
are open. With very large numbers of open connections, the memory usage will be
nearly proportionate to the increase of this tunable parameter.

[[optimizations-for-haproxy-reloads]]
==== Optimizations for HAProxy reloads
Reloads
Long-lasting connections, such as WebSocket connections, combined with
long client/server HAProxy timeouts and short HAProxy
reload intervals can cause instantiation of many HAProxy processes.
intervals,
Long-lasting connections, such as WebSocket connections, combined with
long client/server HAProxy timeouts and short HAProxy
reload intervals can cause instantiation of many HAProxy processes.
These processes need to handle old connections which were started
processes must handle old connections, which
long client/server HAProxy timeouts and short HAProxy
reload intervals can cause instantiation of many HAProxy processes.
These processes need to handle old connections which were started
before the HAProxy configuration reload. Large number of these processes is
extra space before large
A large number of...
These processes need to handle old connections which were started
before the HAProxy configuration reload. Large number of these processes is
undesirable, as it will exert unnecessary load on the system and may
lead to issues, such as OoM conditions.
s/OoM/out of memory
reload intervals can cause instantiation of many HAProxy processes.
These processes need to handle old connections which were started
before the HAProxy configuration reload. Large number of these processes is
undesirable, as it will exert unnecessary load on the system and may
s/may/can
xref:../architecture/topics/router_environment_variables.adoc#[Router environment variables] affecting this
behaviour are `ROUTER_DEFAULT_TUNNEL_TIMEOUT`, `ROUTER_DEFAULT_CLIENT_TIMEOUT`,
`ROUTER_DEFAULT_SERVER_TIMEOUT` and `RELOAD_INTERVAL` in particular.
`ROUTER_DEFAULT_SERVER_TIMEOUT`, and `RELOAD_INTERVAL`.
lead to issues, such as OoM conditions.

xref:../architecture/topics/router_environment_variables.adoc#[Router environment variables] affecting this
behaviour are `ROUTER_DEFAULT_TUNNEL_TIMEOUT`, `ROUTER_DEFAULT_CLIENT_TIMEOUT`,
s/behaviour/behavior
@jmencak Just some nits from me. Thanks for this! Also, can you please confirm that this is targeting 3.9? Is there an associated Trello card? Thanks again!
Yes, targeting 3.9. No Trello card; it's a simple change.
Latest changes LGTM. Thanks!
(cherry picked from commit 253c88d) xref:openshift#6744
[rev_history]
@jeremyeder could you please take a look?
/cc @ahardin-rh