[sig-storage] HostPath should support existing directory subPath flake #18823

Closed
0xmichalis opened this issue Mar 5, 2018 · 12 comments · Fixed by #18835
Labels
kind/test-flake, priority/P0, sig/storage

Comments

@0xmichalis
Contributor

/tmp/openshift/build-rpms/rpm/BUILD/origin-3.10.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/common/host_path.go:121
Expected error:
    <*errors.errorString | 0xc420c86440>: {
        s: "expected pod \"pod-host-path-test\" success: pod \"pod-host-path-test\" failed with status: {Phase:Failed Conditions:[] Message:Pod Predicate MatchNodeSelector failed Reason:MatchNodeSelector HostIP: PodIP: StartTime:2018-03-04 07:18:54 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[] QOSClass:}",
    }
    expected pod "pod-host-path-test" success: pod "pod-host-path-test" failed with status: {Phase:Failed Conditions:[] Message:Pod Predicate MatchNodeSelector failed Reason:MatchNodeSelector HostIP: PodIP: StartTime:2018-03-04 07:18:54 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[] QOSClass:}
not to have occurred
/tmp/openshift/build-rpms/rpm/BUILD/origin-3.10.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/util.go:2197

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/batch/test_pull_request_origin_extended_conformance_gce/17025/
@openshift/sig-storage
/kind test-flake

openshift-ci-robot added the sig/storage and kind/test-flake labels Mar 5, 2018
@deads2k
Contributor

deads2k commented Mar 5, 2018

This isn't flaking. It is failing 100% of the time.

@deads2k
Contributor

deads2k commented Mar 5, 2018

https://deck-ci.svc.ci.openshift.org/?job=test_branch_origin_extended_conformance_gce indicates that 3.9 is also affected. This must be related to infrastructure or job definition. The 3.9 commit has nothing to do with this area.

@0xmichalis
Contributor Author

@smarterclayton

@derekwaynecarr
Member

the failure does not appear to be container runtime related... the pod does not appear to ever schedule because the MatchNodeSelector predicate fails.

Mar  4 07:18:56.715: INFO: Pod "pod-host-path-test": Phase="Failed", Reason="MatchNodeSelector", readiness=false. Elapsed: 2.18721399s

@aveshagarwal -- can you dig deeper on this?

@aveshagarwal
Contributor

@derekwaynecarr sure, looking.

@aveshagarwal
Contributor

The node where this pod is being scheduled has the infra label:

map[string]string{beta.kubernetes.io/arch: amd64,beta.kubernetes.io/instance-type: n1-standard-2,beta.kubernetes.io/os: linux,failure-domain.beta.kubernetes.io/region: us-east1,failure-domain.beta.kubernetes.io/zone: us-east1-c,kubernetes.io/hostname: ci-prtest-5a37c28-17025-ig-m-hdt0,node-role.kubernetes.io/infra: true,node-role.kubernetes.io/master: true,role: infra,

Is this test pod expected to get scheduled on the infra node?

Or is the test incorrectly selecting the infra node?

Still checking whether the above is really the issue.
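
For reference, the question above comes down to a subset check: a node selector matches only if every key/value pair in it is present in the node's labels. The following is a minimal sketch of that check, not the actual kubelet or scheduler code. The node labels are a subset of those quoted above; the `role=app` selector is a hypothetical value used only for illustration, since the thread does not say which selector was applied to the pod.

```go
package main

import "fmt"

// matchesNodeSelector reports whether every key/value pair in selector
// is present with the same value in nodeLabels. An empty selector
// matches any node.
func matchesNodeSelector(selector, nodeLabels map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Subset of the labels reported for the node in this issue.
	nodeLabels := map[string]string{
		"kubernetes.io/hostname":         "ci-prtest-5a37c28-17025-ig-m-hdt0",
		"node-role.kubernetes.io/infra":  "true",
		"node-role.kubernetes.io/master": "true",
		"role":                           "infra",
	}

	// Hypothetical selector (assumption): not taken from the thread.
	selector := map[string]string{"role": "app"}

	fmt.Println(matchesNodeSelector(selector, nodeLabels))            // false
	fmt.Println(matchesNodeSelector(map[string]string{}, nodeLabels)) // true
}
```

If a selector like the hypothetical one were attached to the pod, a pod pinned to this node would fail the match regardless of what the test intended.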

@derekwaynecarr
Member

working on an upstream PR to fix the test so that the scheduler places the pod rather than the test self-scheduling it. this would let it work fine with any admission controllers running.

@derekwaynecarr
Member

hmm, of course, this change is not as trivial given the test structure.

the test self-schedules pods, but prior to the pod actually being scheduled or run, it creates a directory on the machine it intends to run the pod against to validate that the pod sees that subdir. as a result, changing the test is non-obvious...

@derekwaynecarr
Member

simpler option for now is to disable namespace node selecting for upstream tests.

see: https://github.com/openshift/origin/blob/master/test/extended/util/test.go#L231-L234

given how frequently this clause is used that would probably work best until we can change upstream.
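
A rough sketch of what "disable namespace node selecting" could look like, under stated assumptions: that the per-namespace selector is carried in the `openshift.io/node-selector` annotation, and that setting it to the empty string means "no restriction". Neither detail is spelled out in this thread. The `namespace` type and the namespace name below are hypothetical stand-ins for the real API object; a real implementation would read and update the object through the API client and handle update conflicts.

```go
package main

import "fmt"

// nodeSelectorAnnotation is assumed to be the annotation that holds a
// namespace's node selector; this key is not quoted in the thread.
const nodeSelectorAnnotation = "openshift.io/node-selector"

// namespace is a hypothetical, minimal stand-in for the Namespace API object.
type namespace struct {
	Name        string
	Annotations map[string]string
}

// allowAllNodeScheduling clears the namespace-level node selector so that
// pods created in the namespace are not restricted to a subset of nodes.
// Other annotations are left untouched.
func allowAllNodeScheduling(ns *namespace) {
	if ns.Annotations == nil {
		ns.Annotations = map[string]string{}
	}
	ns.Annotations[nodeSelectorAnnotation] = ""
}

func main() {
	// Hypothetical test namespace name.
	ns := &namespace{Name: "e2e-tests-hostpath"}
	allowAllNodeScheduling(ns)
	fmt.Printf("%s: %q\n", nodeSelectorAnnotation, ns.Annotations[nodeSelectorAnnotation])
}
```

Applying this only to namespaces created for tests that self-schedule keeps the default selector in force everywhere else.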

@aveshagarwal
Contributor

Just for reference, #18816 should address this.
@derekwaynecarr

@jpeeler

jpeeler commented Mar 6, 2018

This has been fixed by #18842 and #18843.

@jpeeler jpeeler closed this as completed Mar 6, 2018
openshift-merge-robot added a commit that referenced this issue Mar 7, 2018
Automatic merge from submit-queue.

Allow all node scheduling for more tests

fixes #18823

need to take a pass at all tests that are self-scheduling to see if we need to open this list more.
6 participants