Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PV tests are flaky #49905

Closed
msau42 opened this issue Jul 31, 2017 · 3 comments
Closed

PV tests are flaky #49905

msau42 opened this issue Jul 31, 2017 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@msau42
Copy link
Member

msau42 commented Jul 31, 2017

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
/sig storage

What happened:
Both GCE and NFS PV tests have been flaking, with this signature:

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/persistent_volumes-gce.go:103
Expected error:
    <*errors.errorString | 0xc42026f5a0>: {
        s: "PVC \"pvc-wc7m6\" did not become Bound: PersistentVolumeClaim pvc-wc7m6 not in phase Bound within 5m0s",
    }
    PVC "pvc-wc7m6" did not become Bound: PersistentVolumeClaim pvc-wc7m6 not in phase Bound within 5m0s
not to have occurred
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/storage/persistent_volumes-gce.go:45

testgrid
logs

What you expected to happen:
Tests should not be flaky

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage. labels Jul 31, 2017
@msau42
Copy link
Member Author

msau42 commented Jul 31, 2017

@jsafrane can you help take a look? It seems like the binding failed because the PV object was not up to date, but it never retried again.

I0731 18:50:59.488716       5 pv_controller_base.go:485] storeObjectUpdate: adding claim "e2e-tests-pv-p6r9p/pvc-wc7m6", version 4459
I0731 18:50:59.488793       5 pv_controller.go:218] synchronizing PersistentVolumeClaim[e2e-tests-pv-p6r9p/pvc-wc7m6]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false
I0731 18:50:59.488867       5 pv_controller.go:280] synchronizing unbound PersistentVolumeClaim[e2e-tests-pv-p6r9p/pvc-wc7m6]: volume "gce-vz753" found: phase: Pending, bound to: "", boundByController: false
I0731 18:50:59.488876       5 pv_controller.go:883] binding volume "gce-vz753" to claim "e2e-tests-pv-p6r9p/pvc-wc7m6"
I0731 18:50:59.488882       5 pv_controller.go:749] updating PersistentVolume[gce-vz753]: binding to "e2e-tests-pv-p6r9p/pvc-wc7m6"
I0731 18:50:59.488908       5 pv_controller.go:792] claim "e2e-tests-pv-p6r9p/pvc-wc7m6" bound to volume "gce-vz753"
I0731 18:50:59.493979       5 pv_controller_base.go:513] storeObjectUpdate updating volume "gce-vz753" with version 4460
I0731 18:50:59.494006       5 pv_controller.go:718] volume "gce-vz753" entered phase "Available"
I0731 18:50:59.494023       5 pv_controller_base.go:513] storeObjectUpdate updating volume "gce-vz753" with version 4460
I0731 18:50:59.494049       5 pv_controller.go:433] synchronizing PersistentVolume[gce-vz753]: phase: Available, bound to: "", boundByController: false
I0731 18:50:59.494059       5 pv_controller.go:438] synchronizing PersistentVolume[gce-vz753]: volume is unused
I0731 18:50:59.494063       5 pv_controller.go:689] updating PersistentVolume[gce-vz753]: set phase Available
I0731 18:50:59.494069       5 pv_controller.go:692] updating PersistentVolume[gce-vz753]: phase Available already set
I0731 18:50:59.500900       5 pv_controller.go:795] updating PersistentVolume[gce-vz753]: binding to "e2e-tests-pv-p6r9p/pvc-wc7m6" failed: Operation cannot be fulfilled on persistentvolumes "gce-vz753": the object has been modified; please apply your changes to the latest version and try again
I0731 18:50:59.500933       5 pv_controller.go:886] error binding volume "gce-vz753" to claim "e2e-tests-pv-p6r9p/pvc-wc7m6": failed saving the volume: Operation cannot be fulfilled on persistentvolumes "gce-vz753": the object has been modified; please apply your changes to the latest version and try again
I0731 18:50:59.500946       5 pv_controller_base.go:244] could not sync claim "e2e-tests-pv-p6r9p/pvc-wc7m6": Operation cannot be fulfilled on persistentvolumes "gce-vz753": the object has been modified; please apply your changes to the latest version and try again

@jsafrane
Copy link
Member

jsafrane commented Aug 1, 2017

Indeed, the controller did not get periodic sync of all PVs/PVCs every 15 seconds as it should.

I can see when PV controller is rejected when it asks for sync every 15 seconds during startup:

I0731 18:44:37.468024       5 controllermanager.go:434] Starting "persistentvolume-binder"

...

W0731 18:44:37.718192       5 shared_informer.go:301] resyncPeriod 15000000000 is smaller than resyncCheckPeriod 356207284003 and the informer has already started. Changing it to 356207284003
W0731 18:44:37.718233       5 shared_informer.go:301] resyncPeriod 15000000000 is smaller than resyncCheckPeriod 356207284003 and the informer has already started. Changing it to 356207284003
I0731 18:44:37.718268       5 controllermanager.go:444] Started "persistentvolume-binder"

So the controller gets updates every ~6 minutes instead of 15 seconds. This must be something new, informers were not started yet when PV controller subscribed for events. Anyway, I have a PR #49016 that fixes it.

@msau42
Copy link
Member Author

msau42 commented Aug 2, 2017

/close

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage.
Projects
None yet
Development

No branches or pull requests

3 participants