Backport HPA replica count calculation fixes #18216
Fix #53670: fix a bug where `desiredReplicas` could exceed `maxReplicas` when the originally computed `desiredReplicas > scaleUpLimit` and `scaleUpLimit > maxReplicas`. Previously, in that case, we would scale up to `scaleUpLimit` and then, in the next autoscaling run, scale back down to `maxReplicas`. This change addresses the issue and adds a regression test.
2e90d35 to f68229b
Resubmitted, after testing with resource-consumer to confirm that the replica count did not exceed the limit.
There have been a couple of recent bugs in the "normalizing" part of the `reconcileAutoscaler` method. This part of the code base is responsible for, among other things, taking the desired replica count suggested by the metrics, checking that it conforms to certain conditions, and adjusting it if it does not. Isolate the part that converts the desired replicas according to a given set of rules into its own function. This refactoring makes the logic simpler and easier to cover with unit tests.
f68229b to 58f6c8b
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: DirectXMan12, RobertKrawitz.
/retest
From test log of ci/openshift-jenkins/gcp (https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18216/test_pull_request_origin_extended_conformance_gce/15016/): • [SLOW TEST:28.129 seconds]
The failure in https://openshift-gce-devel.appspot.com/builds/origin-ci-test/pr-logs/directory/test_pull_request_origin_extended_conformance_install appears to have been reliable since 10:30 ET 2017-01-25 and frequent before then, so a retest is not useful until that is fixed.
Case /origin-ci-test/pr-logs/directory/test_pull_request_origin_extended_conformance_crio has been failing consistently for others for at least a day, and build 3177 (following my build 3175) failed in the same way mine did: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18286/test_pull_request_origin_extended_conformance_crio/3177/
Case /origin-ci-test/pr-logs/directory/test_pull_request_origin_extended_conformance_gce has been failing sporadically; builds 15012 and 15019 failed in the same way mine (15016) did: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18224/test_pull_request_origin_extended_conformance_gce/15012/ and https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18273/test_pull_request_origin_extended_conformance_gce/15019/
Case /origin-ci-test/pr-logs/directory/test_pull_request_origin_extended_conformance_install has been failing reliably since 10:30 ET 2017-01-25 with the same error I got, e.g. 6332 vs. my 6333: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18279/test_pull_request_origin_extended_conformance_install/6332/
Case /origin-ci-test/pr-logs/pull/18216/test_pull_request_origin_extended_conformance_install_update has been failing consistently; build 10375 got the same error as my 10376: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18199/test_pull_request_origin_extended_conformance_install_update/10375/
This appears to account for all failures reported on my job.
Blocking test_pull_request_origin_extended_conformance_install #18294
/retest
flake #17901 and fedora 25 (likely transient) mirror failure
/retest
opened new flake #18306 for GCP auth issue
/retest
/retest
/retest
/retest Please review the full test history for this PR and help us cut down flakes.
/retest Please review the full test history for this PR and help us cut down flakes.
Automatic merge from submit-queue.
There were multiple upstream fixes to the HPA logic governing the interaction between the max replica count and the scale-up limit; that interaction could leave the replica count temporarily higher than the max replica count.