[stage] UPSTREAM: 49016: PV controller: resync informers manually #16927
/retest |
@jsafrane backport to stage? |
The resync for other shared informer consumers would be skipped until their requested resync came about, right? Did that feature break somehow? |
31e6220 to 5a6b0b0
Rebased to stage. @deads2k, it's a fix for this bug: kubernetes/kubernetes#49905 (comment). The PV controller may start when the informer's resync period is already fixed and can't be changed.
That sounds like a bug. Who is starting it? We fixed the GC start problem. |
It's started as a usual controller in controller-manager, nothing special about it. And someone complained (@liggitt?) that we should not force all controllers to have a 15 second resync period, that's too often. So I added a manual resync.
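A manual resync of this kind can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code from this PR: `cache`, `resyncOnce`, and `runResync` are hypothetical names standing in for the informer's local store and the controller's work queue. The idea is that the controller re-enqueues everything it already has cached on its own timer, so the shared informer's resync period does not have to be 15 seconds for every consumer.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// cache is a hypothetical stand-in for an informer's local store of object keys.
type cache struct {
	mu   sync.RWMutex
	keys map[string]struct{}
}

func (c *cache) add(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keys[key] = struct{}{}
}

// list returns all cached keys in sorted order.
func (c *cache) list() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]string, 0, len(c.keys))
	for k := range c.keys {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// resyncOnce re-enqueues every cached key and returns how many were enqueued.
// It blocks if the queue is full.
func resyncOnce(c *cache, queue chan<- string) int {
	keys := c.list()
	for _, k := range keys {
		queue <- k
	}
	return len(keys)
}

// runResync calls resyncOnce every period until stop is closed.
func runResync(c *cache, queue chan<- string, period time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			resyncOnce(c, queue)
		case <-stop:
			return
		}
	}
}

func main() {
	c := &cache{keys: map[string]struct{}{}}
	c.add("default/claim-a")
	c.add("pv-1")

	queue := make(chan string, 16)
	stop := make(chan struct{})
	go runResync(c, queue, 10*time.Millisecond, stop)

	// Drain the first resync round, then stop the loop.
	for i := 0; i < 2; i++ {
		fmt.Println("sync", <-queue)
	}
	close(stop)
}
```

Only the controller that owns the timer sees these extra syncs; no event is delivered to other consumers of the shared informer.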
This PR does not touch /test all |
/test cross |
Found the complaint: kubernetes/kubernetes#48941 |
@liggitt was that before we ensured startup order and had the opt-in resync? |
probably
not sure what that is referring to. I still think a 15 second resync, no matter how it is done, is way too short. we have better patterns for retrying failed objects at a shorter interval without resyncing the whole list. |
No other consumers would see a resync. Only this controller that asked would see it. I agree it is too short, but this doesn't look better than actually specifying the resync since it doesn't hurt other consumers. |
would like to point out it's not just for retrying failed cases, it's a fundamental assumption in the pv controller which is apparently "space shuttle" code and hard to change? The reason for this bug is basically there is a "syncUnboundClaim" but no "syncUnboundVolume": if a volume has just been created, there is no reason to assume there exists a claim looking for it. but if a claim has just been created, obviously the controller should find a volume for it. another case from today: a |
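The asymmetry described in this comment can be shown with a toy model. This is a sketch under heavy simplification, not the PV controller's code: the `controller` type and its methods are hypothetical, and real matching considers capacity, access modes, and storage class rather than taking the first free volume.

```go
package main

import "fmt"

// controller is a toy model of claim/volume binding.
type controller struct {
	freeVolumes   []string          // volumes not yet bound, in arrival order
	pendingClaims []string          // claims that found no volume so far
	bound         map[string]string // claim -> volume
}

// syncUnboundClaim looks for a volume for the claim and binds the first free one.
func (c *controller) syncUnboundClaim(claim string) bool {
	if len(c.freeVolumes) == 0 {
		return false
	}
	c.bound[claim] = c.freeVolumes[0]
	c.freeVolumes = c.freeVolumes[1:]
	return true
}

// onClaimAdded actively searches for a volume; the claim stays pending on a miss.
func (c *controller) onClaimAdded(claim string) {
	if !c.syncUnboundClaim(claim) {
		c.pendingClaims = append(c.pendingClaims, claim)
	}
}

// onVolumeAdded only records the volume. There is no "syncUnboundVolume"
// that would go looking for a waiting claim.
func (c *controller) onVolumeAdded(volume string) {
	c.freeVolumes = append(c.freeVolumes, volume)
}

// resync re-runs syncUnboundClaim for every pending claim.
func (c *controller) resync() {
	var still []string
	for _, claim := range c.pendingClaims {
		if !c.syncUnboundClaim(claim) {
			still = append(still, claim)
		}
	}
	c.pendingClaims = still
}

func main() {
	c := &controller{bound: map[string]string{}}
	c.onClaimAdded("claim-a") // no volume exists yet, claim stays pending
	c.onVolumeAdded("pv-1")   // volume arrives, but nothing is bound
	fmt.Println("bound before resync:", len(c.bound))
	c.resync() // only now is the pending claim matched
	fmt.Println("bound after resync:", len(c.bound))
}
```

In this model a claim created before its volume is bound only when something re-runs the claim sync, which is why the controller depends on a periodic resync.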
requiring resync of all objects to stay responsive on subsets of objects that need reprocessing is an assumption that does not scale and should be redesigned. |
While I agree with this, nobody complained so far that PV controller is too slow. Compare with A/D controller that syncs all pods with attachable volumes every 100 ms. Redesign is on long-term TODO list, I added a card to our board. https://trello.com/c/ARmicYxn/577-speed-up-pv-controller |
At any rate, there are now better ways to solve this and you should use them instead of this. However, this is the current state upstream and is better than the bug at the moment. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, jsafrane The full list of commands accepted by this bot can be found here.
@jsafrane can you also open the pull against master so it doesn't regress?
@jsafrane holding this until the openshift/origin:master pull is open and labeled. |
opened PR against master in #16965 |
Automatic merge from submit-queue (batch tested with PRs 16667, 16796, 16960, 16965, 16894).
[master] UPSTREAM: 49016: PV controller: resync informers manually
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1501152
This is the same as #16927, just for master instead of stage.
/assign @deads2k
master counterpart is merged, can we merge this one? |
going to close this one, we'll get it in stage tomorrow night on the next stage rebase. |
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1501152
cc @openshift/sig-storage