Setting node schedulability is flakey #12522

sdodson · 2017-01-17T15:29:31Z

During installation we set schedulability with oc adm manage-node ip-1-2-3-4.ec2.internal --schedulable=false --config=/tmp/openshift-ansible-zx84bD/admin.kubeconfig -n default. On a cluster of 8 nodes this fails often enough that most installations end up with at least one node whose schedulability could not be set.

Version

OCP v3.4.0.39 and v3.3.1.9

Steps To Reproduce

Perform an Ansible-based installation with several nodes (8 or more).

Current Result

Tasks that set node schedulability frequently have at least one node that fails with an error.

    TASK [openshift_manage_node : Set node schedulability] *************************
    changed: [1.2.3.4 -> 1.2.3.10]
    changed: [1.2.3.5 -> 1.2.3.10]
    changed: [1.2.3.6 -> 1.2.3.10]
    changed: [1.2.3.7 -> 1.2.3.10]
    changed: [1.2.3.8 -> 1.2.3.10]
    changed: [1.2.3.9 -> 1.2.3.10]
    fatal: [1.2.3.10 -> 1.2.3.10]: FAILED! => {"changed": true, "cmd": ["oc", "adm", "manage-node", "ip-1-2-3-10.ec2.internal", "--schedulable=false", "--config=/tmp/openshift-ansible-zx84bD/admin.kubeconfig", "-n", "default"], "delta": "0:00:05.617827", "end": "2017-01-17 14:29:02.233557", "failed": true, "rc": 1, "start": "2017-01-17 14:28:56.615730", "stderr": "Error from server: Operation cannot be fulfilled on nodes \"ip-1-2-3-10.ec2.internal\": the object has been modified; please apply your changes to the latest version and try again", "stdout": "", "stdout_lines": [], "warnings": []}
    changed: [1.2.3.11 -> 1.2.3.10]
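
As a stopgap on the installer side, the failing call could be retried only when the server reports this particular conflict. The following is a minimal sketch, not what openshift-ansible actually does: the helper name run_with_conflict_retry, its parameters, and the retry count and delay are all hypothetical, and it matches on the error text shown in the log above.

```python
import subprocess
import time

# Substring of the server error shown in the task output above.
CONFLICT_MARKER = "the object has been modified"


def run_with_conflict_retry(cmd, attempts=5, delay=1.0,
                            runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying only when stderr reports an update conflict.

    Hypothetical helper: `runner` and `sleep` are injectable so the retry
    logic can be exercised without a cluster.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        is_conflict = CONFLICT_MARKER in result.stderr
        if not is_conflict or attempt == attempts:
            # Any other failure, or too many conflicts, is surfaced as-is.
            raise RuntimeError(
                f"command failed (rc={result.returncode}): {result.stderr.strip()}"
            )
        sleep(delay)
```

It would be called with the same argument list the task uses, for example run_with_conflict_retry(["oc", "adm", "manage-node", "ip-1-2-3-10.ec2.internal", "--schedulable=false", "-n", "default"]). Retrying hides the symptom only; it does not remove the underlying race.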

Expected Result

Schedulability set reliably.

I believe this has also been observed when setting labels on nodes. See openshift/openshift-ansible#1934.

sdodson · 2017-01-17T15:29:42Z

CC: @deads2k @mwoodson

deads2k · 2017-01-17T15:49:22Z

@fabianofranz is this just a matter of using patch instead of update to avoid a conflict?

fabianofranz · 2017-01-18T16:48:39Z

> is this just a matter of using patch instead of update to avoid a conflict?

Yes, that's the issue, and I fixed it recently already in #12486. @sdodson are you able to give it a try with latest master?
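
To illustrate why patch avoids the conflict, here is a toy in-memory model of optimistic concurrency. It is a sketch under stated assumptions, not the API server's or the CLI's real code: NodeStore, Conflict, and the flat field names are hypothetical, and the competing writer (a "heartbeat" change) is invented for the example, since the thread does not say what else was modifying the node.

```python
class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class NodeStore:
    """Toy stand-in for server-side node storage (hypothetical)."""

    def __init__(self, node):
        self.node = dict(node, resourceVersion=1)

    def get(self):
        # Return a copy, as a client would receive over the wire.
        return dict(self.node)

    def update(self, obj):
        # Full replace: rejected if the object changed since it was read.
        if obj["resourceVersion"] != self.node["resourceVersion"]:
            raise Conflict("the object has been modified")
        self.node = dict(obj, resourceVersion=obj["resourceVersion"] + 1)

    def patch(self, changes):
        # Merge only the named fields onto the current object;
        # no resourceVersion precondition is involved.
        merged = dict(self.node)
        merged.update(changes)
        merged["resourceVersion"] += 1
        self.node = merged


store = NodeStore({"name": "ip-1-2-3-10.ec2.internal",
                   "unschedulable": False, "heartbeat": 0})

stale = store.get()               # the client reads the node
store.patch({"heartbeat": 1})     # another writer lands first (assumed)
stale["unschedulable"] = True

try:
    store.update(stale)           # read-modify-write with a stale version
    outcome = "updated"
except Conflict:
    outcome = "conflict"

store.patch({"unschedulable": True})  # the patch carries only the changed field
```

In this model the update is rejected because the object moved on between the read and the write, while the patch succeeds and leaves the other writer's change intact.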

sdodson · 2017-01-18T17:45:36Z

@fabianofranz No, I probably won't have time to do that, but the concept makes sense to me and seems to be what @deads2k believes to be the culprit. Can you verify that this isn't a problem for oc label as well and open PRs against enterprise-3.4 and enterprise-3.3?

fabianofranz · 2017-01-23T20:58:06Z

oc label is not affected by this issue.

@sdodson backports for enterprise-3.4 and enterprise-3.3 are here:
https://github.com/openshift/ose/pull/566
https://github.com/openshift/ose/pull/567

pweil- added component/cli, kind/bug, priority/P1 labels Jan 18, 2017

pweil- assigned fabianofranz Jan 18, 2017

fabianofranz closed this as completed Jan 24, 2017
