Setting node schedulability is flakey #12522

Closed
sdodson opened this issue Jan 17, 2017 · 5 comments
Labels: component/cli, kind/bug, priority/P1

Comments

sdodson (Member) commented Jan 17, 2017

During installation we set node schedulability with:

    oc adm manage-node ip-1-2-3-4.ec2.internal --schedulable=false --config=/tmp/openshift-ansible-zx84bD/admin.kubeconfig -n default

On an 8-node cluster this fails often enough that most installations have at least one node on which setting schedulability fails.

Version

OCP v3.4.0.39 and v3.3.1.9

Steps To Reproduce

  1. Perform an Ansible-based installation with several nodes (8 or more).

Current Result

The task that sets node schedulability frequently fails on at least one node:

    TASK [openshift_manage_node : Set node schedulability] *************************
    changed: [1.2.3.4 -> 1.2.3.10]
    changed: [1.2.3.5 -> 1.2.3.10]
    changed: [1.2.3.6 -> 1.2.3.10]
    changed: [1.2.3.7 -> 1.2.3.10]
    changed: [1.2.3.8 -> 1.2.3.10]
    changed: [1.2.3.9 -> 1.2.3.10]
    fatal: [1.2.3.10 -> 1.2.3.10]: FAILED! => {"changed": true, "cmd": ["oc", "adm", "manage-node", "ip-1-2-3-10.ec2.internal", "--schedulable=false", "--config=/tmp/openshift-ansible-zx84bD/admin.kubeconfig", "-n", "default"], "delta": "0:00:05.617827", "end": "2017-01-17 14:29:02.233557", "failed": true, "rc": 1, "start": "2017-01-17 14:28:56.615730", "stderr": "Error from server: Operation cannot be fulfilled on nodes \"ip-1-2-3-10.ec2.internal\": the object has been modified; please apply your changes to the latest version and try again", "stdout": "", "stdout_lines": [], "warnings": []}
    changed: [1.2.3.11 -> 1.2.3.10]
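The failure above is an optimistic-concurrency conflict: a full-object update is rejected when the copy of the node the client read is no longer the latest version on the server. The sketch below is a toy, in-memory model of that check, not the real API server. `Node`, `NodeStore`, and `ConflictError` are hypothetical names, and the competing "status" writer is an assumption, since this thread does not say which client modified the node in between.

```python
from dataclasses import dataclass, replace
from typing import Tuple


class ConflictError(Exception):
    """Raised when a write carries a stale resource_version (hypothetical name)."""


@dataclass(frozen=True)
class Node:
    # Hypothetical, stripped-down node object; real Node objects have many more fields.
    name: str
    resource_version: int
    unschedulable: bool = False
    heartbeat: int = 0


class NodeStore:
    """Hypothetical in-memory stand-in for the server's optimistic concurrency check."""

    def __init__(self, name: str) -> None:
        self._node = Node(name=name, resource_version=1)

    def get(self) -> Node:
        return self._node

    def update(self, node: Node) -> Node:
        # A full-object update is accepted only if the client read the latest version.
        if node.resource_version != self._node.resource_version:
            raise ConflictError(
                f'Operation cannot be fulfilled on nodes "{node.name}": '
                "the object has been modified; please apply your changes "
                "to the latest version and try again"
            )
        self._node = replace(node, resource_version=node.resource_version + 1)
        return self._node


def lost_race() -> Tuple[bool, bool]:
    """Interleave two writers deterministically; return (conflicted, unschedulable)."""
    store = NodeStore("ip-1-2-3-10.ec2.internal")

    # Writer A (the installer) reads the node and prepares its change locally.
    mine = replace(store.get(), unschedulable=True)

    # Writer B (assumed: some other client, e.g. a status update) writes first.
    current = store.get()
    store.update(replace(current, heartbeat=current.heartbeat + 1))

    # Writer A's update still carries resource_version=1, so it is rejected.
    try:
        store.update(mine)
        conflicted = False
    except ConflictError:
        conflicted = True
    return conflicted, store.get().unschedulable


if __name__ == "__main__":
    print(lost_race())  # (True, False)
```

In this model the installer's change is simply lost unless it re-reads the node and tries again, which matches the "please apply your changes to the latest version and try again" wording of the error.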

Expected Result

Schedulability set reliably.

I believe this has also been observed when setting labels on nodes; see openshift/openshift-ansible#1934.

sdodson (Member, Author) commented Jan 17, 2017

CC: @deads2k @mwoodson

deads2k (Contributor) commented Jan 17, 2017

@fabianofranz is this just a matter of using patch instead of update to avoid a conflict?

pweil- added the component/cli, kind/bug, and priority/P1 labels on Jan 18, 2017.
fabianofranz (Member) commented:

> is this just a matter of using patch instead of update to avoid a conflict?

Yes, that's the issue, and I already fixed it recently in #12486. @sdodson, are you able to give it a try with the latest master?
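Why patch avoids the conflict: a patch sends only the field being changed and lets the server merge it into whatever version it currently holds, so a write by another client in between does not invalidate it. The toy model below illustrates that idea with hypothetical `Node`, `NodeStore`, and `patch` names and a flat field merge. Real `oc`/`kubectl` patches are JSON or strategic-merge documents, and a patch that itself sets `metadata.resourceVersion` is still checked. The sketch does not reproduce the actual change made in #12486.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


class ConflictError(Exception):
    """Raised when a full update carries a stale resource_version (hypothetical name)."""


@dataclass(frozen=True)
class Node:
    # Hypothetical, stripped-down node object.
    name: str
    resource_version: int
    unschedulable: bool = False
    heartbeat: int = 0


class NodeStore:
    """Hypothetical in-memory server supporting both full update and patch."""

    def __init__(self, name: str) -> None:
        self._node = Node(name=name, resource_version=1)

    def get(self) -> Node:
        return self._node

    def update(self, node: Node) -> Node:
        # Full replace: rejected unless the client's copy is the latest version.
        if node.resource_version != self._node.resource_version:
            raise ConflictError("the object has been modified")
        self._node = replace(node, resource_version=node.resource_version + 1)
        return self._node

    def patch(self, changes: Dict[str, Any]) -> Node:
        # Partial change merged into the version currently stored on the server,
        # so the client never resends a stale copy of fields it did not touch.
        self._node = replace(
            self._node,
            resource_version=self._node.resource_version + 1,
            **changes,
        )
        return self._node


def patch_after_race() -> Node:
    store = NodeStore("ip-1-2-3-10.ec2.internal")

    # Another client writes to the node before the installer does.
    current = store.get()
    store.update(replace(current, heartbeat=current.heartbeat + 1))

    # The installer sends only the field it wants to change.
    return store.patch({"unschedulable": True})


if __name__ == "__main__":
    print(patch_after_race())
```

Here `patch_after_race()` returns a node that keeps the other client's change (`heartbeat=1`) and also has `unschedulable=True`, whereas a full update built from a copy read before the other write would have been rejected.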

sdodson (Member, Author) commented Jan 18, 2017

@fabianofranz No, I probably won't have time to do that, but the concept makes sense to me and matches what @deads2k believes to be the culprit. Can you verify that oc label doesn't have the same problem, and open PRs against enterprise-3.4 and enterprise-3.3?

fabianofranz (Member) commented:
oc label is not affected by this issue.

@sdodson backports for enterprise-3.4 and enterprise-3.3 are here:
https://github.com/openshift/ose/pull/566
https://github.com/openshift/ose/pull/567
