Revert "disable ResilientWatchCacheInitialization feature" #2192
base: master
Conversation
…eature" This reverts commit 4772890.
@benluddy: the contents of this pull request could not be automatically validated. The following commits could not be validated and must be approved by a top-level approver:
/hold
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: benluddy

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
@benluddy: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
When this feature is enabled, watch requests that are to be served from the watch cache immediately return 429 if the cache is not yet initialized, and the client retries. When it is disabled, the same watch requests "hang" until they either time out or complete successfully.
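The distinction is easier to see in code. Below is a minimal sketch of the two behaviors in plain Go, not the actual apiserver implementation; `cacheReady` and `handleWatch` are hypothetical names standing in for the watch cache's initialization signal and the watch handler.

```go
package main

import "net/http"

// Assumed feature gate state; in the real apiserver this comes from --feature-gates.
var resilientWatchCacheInitialization = true

// cacheReady is a hypothetical stand-in for the watch cache's "initialized" signal.
var cacheReady = make(chan struct{})

func handleWatch(w http.ResponseWriter, r *http.Request) {
	select {
	case <-cacheReady:
		// Cache is initialized; fall through and serve the watch normally.
	default:
		if resilientWatchCacheInitialization {
			// Feature enabled: fail fast with 429 so the client retries.
			w.Header().Set("Retry-After", "1")
			http.Error(w, "watch cache is not initialized", http.StatusTooManyRequests)
			return
		}
		// Feature disabled: block ("hang") until the cache initializes,
		// the request times out, or the client goes away.
		select {
		case <-cacheReady:
		case <-r.Context().Done():
			return
		}
	}
	// ... stream watch events from the cache ...
}

func main() {
	http.HandleFunc("/watch", handleWatch)
	_ = http.ListenAndServe(":8080", nil)
}
```

Each 429 is a completed (audited) request, so one slow cache initialization can turn a single long-lived watch into many short, retried ones.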
There is an OCP test that counts watch requests per user over the course of a job by scraping audit logs. The test fails if any user exceeds an arbitrary threshold chosen from historical observations. With this feature enabled, any issue that delays watch cache initialization or forces a watch cache to reinitialize increases the number of watch requests appearing in the audit logs (because of the retries), which in turn causes the test's thresholds to be breached.
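For illustration, here is a rough sketch of the kind of per-user counting such a test performs, assuming one JSON-encoded audit event per line in the Kubernetes audit v1 format; the threshold value is invented and this is not the actual OCP test code.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// auditEvent extracts just the fields needed from a Kubernetes audit v1 event.
type auditEvent struct {
	Verb string `json:"verb"`
	User struct {
		Username string `json:"username"`
	} `json:"user"`
}

func main() {
	const threshold = 1000 // arbitrary per-user limit (hypothetical value)

	counts := map[string]int{}
	scanner := bufio.NewScanner(os.Stdin) // e.g. piped from the scraped audit log
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // audit lines can be long
	for scanner.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
			continue // skip malformed lines
		}
		if ev.Verb == "watch" {
			counts[ev.User.Username]++
		}
	}
	for user, n := range counts {
		if n > threshold {
			fmt.Printf("FAIL: user %s issued %d watch requests (limit %d)\n", user, n, threshold)
		}
	}
}
```

A check like this cannot distinguish one hanging watch from many fast 429-and-retry cycles, which is why enabling the feature inflates the counts.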
This was temporarily disabled for kube-apiserver to improve the CI signal-to-noise ratio during the 1.31 rebase. It was not disabled for openshift-apiserver.
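For context, a feature gate like this can be flipped at runtime with the kube-apiserver flag `--feature-gates=ResilientWatchCacheInitialization=false`, or its registered default can be overridden in code. The sketch below shows the latter pattern using the `k8s.io/component-base/featuregate` package; it is illustrative only, not the actual commit 4772890 being reverted.

```go
package main

import (
	"fmt"

	"k8s.io/component-base/featuregate"
)

const ResilientWatchCacheInitialization featuregate.Feature = "ResilientWatchCacheInitialization"

func main() {
	gates := featuregate.NewFeatureGate()
	_ = gates.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		// Registering with Default: false is the effect of "disabling" the
		// feature; reverting such a change restores the upstream default.
		ResilientWatchCacheInitialization: {Default: false, PreRelease: featuregate.Beta},
	})
	fmt.Println("enabled:", gates.Enabled(ResilientWatchCacheInitialization))
}
```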
Sample job from the 1.31 rebase process before the feature was disabled: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-kube-apiserver-operator-1734-openshift-kubernetes-2055-openshift-cluster-kube-apiserver-operator-1734-nightly-4.18-e2e-aws-ovn-single-node-serial/1835775665903767552