Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations for HAProxy reloads #6744

Merged
merged 1 commit into from
Dec 13, 2017
Merged

Optimizations for HAProxy reloads #6744

merged 1 commit into from
Dec 13, 2017

Conversation

jmencak
Copy link
Contributor

@jmencak jmencak commented Dec 13, 2017

@jeremyeder could you please take a look?

/cc @ahardin-rh

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 13, 2017
xref:../architecture/topics/router_environment_variables.adoc#[Router environment variables] affecting this
behaviour are `ROUTER_DEFAULT_TUNNEL_TIMEOUT`, `ROUTER_DEFAULT_CLIENT_TIMEOUT`,
`ROUTER_DEFAULT_SERVER_TIMEOUT` and `RELOAD_INTERVAL` in particular. It is currently recommended to
set `RELOAD_INTERVAL` to 15s.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jmencak, thanks for writing this.

Just for clarification in my own mind:

  • What is the default reload interval?
  • When you say "it is currently recommended", could you please be more specific about what to look for when considering this change? What is the "fingerprint" to look for on the system (slow route propagation?)
  • You may want to put a specific # of routes. For example, I think we saw it as low as 3000 but some other environments it might have been a bit higher?

Copy link
Contributor Author

@jmencak jmencak Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Default reload interval is 5s.
  2. 15s comes from what Ben recommended in https://bugzilla.redhat.com/show_bug.cgi?id=1471899 , but that was to address an immediate BZ issue, which should now be fixed in other ways. Perhaps I should re-formulate this or remove this completely as this is trying to fix another problem. 15s could make the number of HAProxy processes 3x lower, in theory.
  3. Not sure about this. The docs are for 3.8+ and the the ~3000 route issue I was seeing should be fixed by Change the router reload suppression so that it doesn't block updates origin#17049, which already merged.

Copy link
Contributor

@jeremyeder jeremyeder Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we need to re-test after #17049 and then re-approach this particular documentation. Unless you wanted to target this at "3.7 and earlier" versions of the docs only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section has nothing to do with #17049 or BZ1471899 other than the fact increasing RELOAD_INTERVAL can alleviate the problems seen by BZ1471899 for versions 3.6 and earlier. I didn't write the section for that purpose. This section has everything to do with the inherent incapability of HAProxy to reload configuration without forking another process while serving (old and new) connections. BZ1471899 and #17049 was retested by QA, but I can retest, not a big deal.

@ahardin-rh ahardin-rh self-assigned this Dec 13, 2017
@@ -125,3 +125,18 @@ accepted by load balancers provided by many public cloud providers. However,
this affects the total memory use, especially when large numbers of connections
are open. With very large numbers of open connections, the memory usage will be
nearly proportionate to the increase of this tunable parameter.

[[optimizations-for-haproxy-reloads]]
==== Optimizations for HAProxy reloads
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reloads


Long-lasting connections, such as WebSocket connections, combined with
long client/server HAProxy timeouts and short HAProxy
reload intervals can cause instantiation of many HAProxy processes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intervals,

Long-lasting connections, such as WebSocket connections, combined with
long client/server HAProxy timeouts and short HAProxy
reload intervals can cause instantiation of many HAProxy processes.
These processes need to handle old connections which were started
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processes must handle old connections, which

long client/server HAProxy timeouts and short HAProxy
reload intervals can cause instantiation of many HAProxy processes.
These processes need to handle old connections which were started
before the HAProxy configuration reload. Large number of these processes is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space before large

A large number of...

These processes need to handle old connections which were started
before the HAProxy configuration reload. Large number of these processes is
undesirable, as it will exert unnecessary load on the system and may
lead to issues, such as OoM conditions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/OoM/out of memory

reload intervals can cause instantiation of many HAProxy processes.
These processes need to handle old connections which were started
before the HAProxy configuration reload. Large number of these processes is
undesirable, as it will exert unnecessary load on the system and may
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/may/can


xref:../architecture/topics/router_environment_variables.adoc#[Router environment variables] affecting this
behaviour are `ROUTER_DEFAULT_TUNNEL_TIMEOUT`, `ROUTER_DEFAULT_CLIENT_TIMEOUT`,
`ROUTER_DEFAULT_SERVER_TIMEOUT` and `RELOAD_INTERVAL` in particular.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROUTER_DEFAULT_SERVER_TIMEOUT, and RELOAD_INTERVAL.

lead to issues, such as OoM conditions.

xref:../architecture/topics/router_environment_variables.adoc#[Router environment variables] affecting this
behaviour are `ROUTER_DEFAULT_TUNNEL_TIMEOUT`, `ROUTER_DEFAULT_CLIENT_TIMEOUT`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/behaviour/behavior

@ahardin-rh
Copy link
Contributor

@jmencak Just some nits from me. Thanks for this! Also, can you please confirm that this is targeting 3.9? Is there an associated Trello card? Thanks again!

@jmencak
Copy link
Contributor Author

jmencak commented Dec 13, 2017

Yes, targetting 3.9. No trello card, simple change.

@ahardin-rh
Copy link
Contributor

Latest changes LGTM. Thanks!

@ahardin-rh ahardin-rh merged commit e406a59 into openshift:master Dec 13, 2017
ahardin-rh pushed a commit to ahardin-rh/openshift-docs that referenced this pull request Dec 13, 2017
@ahardin-rh
Copy link
Contributor

[rev_history]
|xref:../scaling_performance/routing_optimization.adoc#scaling-performance-routing-optimization[Routing Optimization]
|Added a new xref:../scaling_performance/routing_optimization.adoc#optimizations-for-haproxy-reloads[Optimizations for HAProxy Reloads] section.
%

@jmencak jmencak deleted the haproxy-reloads branch January 24, 2018 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch/enterprise-3.9 size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants