Statefulset creates and deletes pod repeatedly, race condition or other error? #17435
I was able to create the StatefulSet and it worked fine for a while, but after updating it, the controller began creating and deleting the pod repeatedly. It might be an error with controller history? No other obvious errors in the logs at v=2. Scenario:
Actual: infinite loop of create and delete. Could it be related to image stream tag resolution? Needs the master team to look at it (nothing explicit about image stream resolution is in place).
Terrifying. Deleting the revisions got me back to a working state.
Happens on restart. Create a StatefulSet, then update it, so there are two revisions. Restart the master process and it goes crazy:
After the restart, the revision count starts growing, up to 12 revisions, all with the same content.
Diff between two random copies of revision 1:
Something about the hash calculation is wrong. Maybe unstable ordering of an underlying map when calculating the hash?
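The suspicion above is plausible in Go, because map iteration order is deliberately randomized between runs. A minimal stand-alone sketch (these helper names are illustrative, not the actual Kubernetes hashing code) of why hashing a map without sorting its keys yields unstable hashes, and how sorting fixes it:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hashUnstable feeds map entries to the hash in Go's iteration order.
// Because Go randomizes map iteration, repeated calls (or calls after
// a process restart) can produce different hashes for the same data.
func hashUnstable(labels map[string]string) uint32 {
	h := fnv.New32a()
	for k, v := range labels {
		h.Write([]byte(k))
		h.Write([]byte(v))
	}
	return h.Sum32()
}

// hashStable sorts the keys first, making the hash deterministic
// regardless of iteration or insertion order.
func hashStable(labels map[string]string) uint32 {
	h := fnv.New32a()
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte(labels[k]))
	}
	return h.Sum32()
}

func main() {
	labels := map[string]string{"app": "web", "tier": "db", "env": "prod"}
	fmt.Println(hashStable(labels) == hashStable(labels)) // always true
	fmt.Println(hashUnstable(labels))                     // value may vary across runs
}
```

If a controller hashed a template this way without sorting, two restarts could compute different hashes for the identical spec, making every existing revision look stale.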
I could reproduce it just by creating the StatefulSet and restarting the master. I suspect this is caused by some rogue retry and the collision-avoidance logic for the hash. Looking...
@smarterclayton this is broken upstream as well. I think newRevision is now not re-entrant because of kubernetes/kubernetes#50490. I think we will need to add expectations (although it seems to be working now) or revert the collision avoidance PR. I suspect it will not work for rollback, but I need to check next week.
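For context on the collision-avoidance scheme being discussed: the revision name is derived from a hash of the pod template combined with a collision counter, so that a genuine hash collision can be escaped by bumping the counter. A hedged sketch of that idea (function and parameter names here are illustrative, not the real Kubernetes API):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// hashWithCollisionCount hashes a serialized template together with a
// collision counter. Bumping the counter yields a new hash, which is
// how a controller can sidestep a name collision with an unrelated
// object that happens to share the same hash.
func hashWithCollisionCount(template []byte, collisionCount uint32) string {
	h := fnv.New32a()
	h.Write(template)
	var buf [4]byte
	binary.LittleEndian.PutUint32(buf[:], collisionCount)
	h.Write(buf[:])
	return fmt.Sprintf("%x", h.Sum32())
}

func main() {
	tmpl := []byte(`{"spec":{"containers":[{"image":"nginx:1.13"}]}}`)
	fmt.Println(hashWithCollisionCount(tmpl, 0))
	fmt.Println(hashWithCollisionCount(tmpl, 1)) // a different name after a collision
}
```

The re-entrancy hazard described in the comment follows directly: if the controller restarts without its cached revisions (or the counter state) fully synced, it can fail to recognize its own existing revision, mint a "new" one for the same template, and loop.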
Can you make sure there is a high-severity Kube issue blocking/impacting the 1.9 release? We need to at least triage it. Thanks.
(The email-quoted version of Tomáš Nožička's comment above adds one detail: "That revealed the fact that we don't wait for the rev informer when starting the StatefulSet controller (which I am fixing now).")
@smarterclayton I have labeled both the issue and the PR; not sure how to make it impact the 1.9 release because I can't set the milestone.
I set the milestone and added approval for the 1.9 milestone.
…mersync Automatic merge from submit-queue. UPSTREAM: 56356: Wait for controllerrevision informer to sync on statefulset controller startup /cc @smarterclayton @mfojtik fixes #17435
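The merged fix waits for the ControllerRevision informer cache to sync before the StatefulSet controller starts its sync loop; in real Kubernetes code that is `cache.WaitForCacheSync` from k8s.io/client-go. A self-contained sketch that mimics the shape of that pattern (this is an assumed stand-in, not the client-go implementation):

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// waitForCacheSync polls the given "synced" predicates until they all
// report true, or until stopCh is closed. A controller that reads an
// informer cache before this returns true risks acting on an empty or
// partial view of the world (e.g. not seeing its existing revisions).
func waitForCacheSync(stopCh <-chan struct{}, synced ...func() bool) bool {
	for {
		all := true
		for _, s := range synced {
			if !s() {
				all = false
			}
		}
		if all {
			return true
		}
		select {
		case <-stopCh:
			return false
		case <-time.After(10 * time.Millisecond):
		}
	}
}

func main() {
	var revSynced atomic.Bool
	// Simulate the ControllerRevision informer finishing its initial list.
	go func() {
		time.Sleep(50 * time.Millisecond)
		revSynced.Store(true)
	}()

	stop := make(chan struct{})
	if !waitForCacheSync(stop, revSynced.Load) {
		fmt.Println("timed out waiting for caches")
		return
	}
	fmt.Println("caches synced; starting workers")
}
```

Without this wait, a freshly restarted controller can list zero revisions for an existing StatefulSet, conclude it needs a new one, and trigger exactly the create/delete churn reported here.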
In master:
Not sure if this is a new post-rebase bug, a known issue already fixed upstream, or something else broken. This could potentially be very serious in a 3.7->3.8->3.9 fast rolling update.
@openshift/sig-master