Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Sandbox status error test #18424

Closed
wants to merge 4 commits into from

Conversation

dcbw
Copy link
Contributor

@dcbw dcbw commented Feb 2, 2018

Just an attempt to reproduce #18414 on GCE since I cannot locally.

dcbw added 3 commits February 2, 2018 11:34
The CNI directory arguments are already the defaults in cni.go and
we shouldn't need to change them (unless upstream changes them,
at which point we should probably match).

The Hairpin mode has been ignored upstream for CNI plugins since the
switch to CRI.  It's up to CNI network plugins to do what they need
to do WRT allowing pods to talk back to themselves via the service
IP. (note that hairpin is still used for --network-plugin=kubenet
and for the old rkt runtime).
Instead of defining things in a couple places, just use the Kubelet
CNI driver conf dir define.  If that's overridden via the kubelet
--cni-conf-dir argument, use that instead.

Also pass a known IPAM data dir to the host-local plugin, and use
that in the metrics code, to protect against upstream changes to
the defaults.
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dcbw
We suggest the following additional approver: smarterclayton

Assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 2, 2018
@openshift-merge-robot openshift-merge-robot added the vendor-update Touching vendor dir or related files label Feb 2, 2018
@dcbw
Copy link
Contributor Author

dcbw commented Feb 2, 2018

/test gcp

@dcbw
Copy link
Contributor Author

dcbw commented Feb 2, 2018

/test extended_conformance_gce

@dcbw
Copy link
Contributor Author

dcbw commented Feb 3, 2018

So this is Kubelet being stupid. It will run StopPodSandbox() multiple times in parallel for the same sandbox. And dockershim's StopPodSanbox() itself calls GetPodSandboxStatus() which checks the IP and that's where the error is coming from. That's because there's a race where the first StopPodSandbox is in TearDownPod() (and thus the networkReady suppression code isn't triggered yet) and the second StopPodSandbox() then asks for the pod IP. Before that can run nsenter, TearDownPod() completes and the first StopPodSandbox() stops the sandbox container and destroys the netns. Then the second StopPodSanbox()'s GetPodSandboxStatus() finally gets around to running nsenter and fails.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Feb 3, 2018

@dcbw: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/cmd 1df9a0c link /test cmd
ci/openshift-jenkins/unit 1df9a0c link /test unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dcbw dcbw closed this Feb 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/M Denotes a PR that changes 30-99 lines, ignoring generated files. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants