
[Fail] [k8s.io] Pods [It] should support remote command execution over websockets [Suite:openshift/conformance/parallel] [Suite:k8s] #18726

Closed
gabemontero opened this issue Feb 22, 2018 · 21 comments · Fixed by #18755


@gabemontero
Contributor

See https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_conformance_install/7855/consoleFull#-169259343956c60d7be4b02b88ae8c268b

The failed test highlighted that Feb 22 20:21:07.165: Got message from server that didn't start with channel 1 (STDOUT): [2]

The pod for the test was in fact running.

I saw no errors in the node log for the pod prior to the namespace getting torn down.

@gabemontero gabemontero added priority/P1 kind/test-flake Categorizes issue or PR as related to test flakes. labels Feb 22, 2018
@gabemontero
Contributor Author

@openshift/sig-kube-origin

@gabemontero
Contributor Author

@sjenning fyi

@sosiouxme
Member

@deads2k
Contributor

deads2k commented Feb 23, 2018

Is this flaking or just straight out failing?

@deads2k
Contributor

deads2k commented Feb 23, 2018

No successful runs in the last 18 hours. Bumping to p0.

@deads2k
Contributor

deads2k commented Feb 23, 2018

The commit suggested by https://deck-ci.svc.ci.openshift.org/?job=test_branch_origin_extended_conformance_install doesn't make any sense. It must be a side-effect of something else.

@Kargakis @stevekuznetsov @derekwaynecarr there was a docker something in our ami, right?

@runcom
Member

runcom commented Feb 23, 2018

@gabemontero gabemontero removed the kind/test-flake Categorizes issue or PR as related to test flakes. label Feb 23, 2018
@gabemontero gabemontero changed the title from "ext test flake [Fail] [k8s.io] Pods [It] should support remote command execution over websockets [Suite:openshift/conformance/parallel] [Suite:k8s]" to "[Fail] [k8s.io] Pods [It] should support remote command execution over websockets [Suite:openshift/conformance/parallel] [Suite:k8s]" Feb 23, 2018
@gabemontero
Contributor Author

removed all "flake" indicators from this issue

@runcom
Member

runcom commented Feb 23, 2018

@sjenning @deads2k @gabemontero could you guys assist us with narrowing down the issue to a docker issue? Not sure what in origin triggers that error.

@stevekuznetsov
Contributor

Reverted the 1.13 AMI

@deads2k
Contributor

deads2k commented Feb 23, 2018

@sjenning
Contributor

I'm seeing this in the node log

Feb 23 14:29:07 ip-172-18-1-150.ec2.internal origin-node[26740]: I0223 14:29:07.964266   26740 server.go:800] GET /exec/e2e-tests-pods-wn695/pod-exec-websocket-e22e0362-18a5-11e8-beb9-0ea3dacd3950/main?command=cat&command=%2Fetc%2Fresolv.conf&error=1&output=1: (35.208338ms) 302 [[Go-http-client/1.1] 172.18.1.150:50886]

Seems that we are getting a 302 redirection from the endpoint.

@sjenning
Contributor

sjenning commented Feb 23, 2018

I see that 302 on both versions of docker in my local cluster so that isn't it.

I have near zero knowledge about websockets, SPDY, HTTP/2, etc., so this could take me a while to figure out. Anyone who knows more is welcome to figure it out faster. @smarterclayton ?

However, it seems that the test is reading the STDERR stream with stream id 2 rather than the STDOUT stream with stream id 1, as the test expects. The query params indicate that both streams are desired. I'm not sure how the test ensures that the STDOUT stream is the one it is reading.
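The "both streams desired" point can be seen from the query params in the kubelet log line above. As a rough sketch (the parameter names `output=1` and `error=1` and the node address are modeled on that log line, not taken from any official client library), the exec request might be built like this:

```go
package main

import (
	"fmt"
	"net/url"
)

// execURL is an illustrative helper: it builds an exec request URL
// that asks for both output streams, mirroring the query params in
// the kubelet log line from this issue. Hypothetical, not real
// client code.
func execURL(namespace, pod, container string, command []string) string {
	q := url.Values{}
	for _, c := range command {
		q.Add("command", c)
	}
	q.Set("output", "1") // request the stdout stream (channel 1)
	q.Set("error", "1")  // request the stderr stream (channel 2)
	u := url.URL{
		Scheme:   "wss",
		Host:     "node.example:10250", // hypothetical node address
		Path:     fmt.Sprintf("/exec/%s/%s/%s", namespace, pod, container),
		RawQuery: q.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(execURL("e2e-tests-pods-wn695", "pod-exec-websocket", "main",
		[]string{"cat", "/etc/resolv.conf"}))
}
```

Because both streams are requested, the server is free to write frames on either channel, which is what makes the stray stderr frame possible.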

@sjenning
Contributor

I do wonder whether removing this line would resolve the situation, since the test does not expect to read from STDERR:
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/common/pods.go#L499

@liggitt
Contributor

liggitt commented Feb 23, 2018

The 302 is a redirect followed by the API server before returning anything to the client, and has to do with redirecting to what will eventually be a CRI endpoint on the node. For now, it is still served by the kubelet.

@liggitt
Contributor

liggitt commented Feb 23, 2018

We are requesting both stdout and stderr output from running the command be sent to us, but we are not expecting to actually receive any stderr output from that command, which is why the test fails if it receives a message on the stderr channel. We should not change the test to ignore error output. I do wonder if docker changed to send zero byte writes to the standard error output stream. I could potentially see updating the test to ignore empty stderr channel messages.

@smarterclayton
Contributor

Sorry I didn't respond. I think ignoring empty stderr channel messages is totally fine; we use them in other contexts, and if we ever have websockets support we will deliver zero byte messages.

@runcom
Member

runcom commented Feb 26, 2018

We are requesting both stdout and stderr output from running the command be sent to us, but we are not expecting to actually receive any stderr output from that command, which is why the test fails if it receives a message on the stderr channel. We should not change the test to ignore error output. I do wonder if docker changed to send zero byte writes to the standard error output stream. I could potentially see updating the test to ignore empty stderr channel messages.

is there any way to read what's in STDERR actually? @liggitt @sjenning

@liggitt
Contributor

liggitt commented Feb 26, 2018

is there any way to read what's in STDERR actually? @liggitt @sjenning

the content of the message is logged, and is empty:

Feb 22 20:21:07.165: Got message from server that didn't start with channel 1 (STDOUT): [2]

that means the message contained exactly one byte (2), which indicated the stderr channel, and no other content
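For readers unfamiliar with the framing: each binary websocket message in the exec protocol starts with a one-byte channel id (1 = stdout, 2 = stderr, as the test output shows) followed by the payload. A minimal sketch of splitting such a frame (the function name is illustrative, not the real client code):

```go
package main

import "fmt"

// splitFrame separates a channel-prefixed exec message into its
// channel id and payload. An empty message yields channel 0 and no
// payload. Illustrative sketch only.
func splitFrame(msg []byte) (channel byte, payload []byte) {
	if len(msg) == 0 {
		return 0, nil
	}
	return msg[0], msg[1:]
}

func main() {
	// The failing message in this issue was exactly one byte: [2],
	// i.e. the stderr channel id with an empty payload.
	ch, payload := splitFrame([]byte{2})
	fmt.Printf("channel=%d payloadLen=%d\n", ch, len(payload))
}
```

Under this framing, a one-byte message `[2]` is a stderr frame whose payload length is zero, which is exactly what the test error message reports.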

@sjenning
Contributor

Opened PR kubernetes/kubernetes#60457 upstream

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Mar 20, 2018
Automatic merge from submit-queue (batch tested with PRs 60457, 60331, 54970, 58731, 60562). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

tests: e2e: empty msg from channel other than stdout should be non-fatal

Currently, if the exec websocket test encounters a message that is not on the stdout stream, it immediately fails. However, it also currently requests the stderr stream in the query params. There doesn't seem to be any guarantee that we don't get an empty message on the stderr stream.

Requesting the stderr stream in the query is desirable if, for some reason, something in the container fails and writes to stderr.

However, we do not need to fail the test if we get an empty message on another stream. If the message is not empty, then that _does_ indicate an error and we should fail.

This is the situation we are currently observing with docker 1.13 in the origin CI openshift/origin#18726

@derekwaynecarr @smarterclayton @gabemontero @liggitt @deads2k 

/sig node
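The behavior the PR describes can be sketched as follows (the function name and shape are illustrative, not the actual e2e test code from kubernetes/kubernetes#60457): while draining channel-prefixed messages, an empty payload on a non-stdout channel is tolerated, while a non-empty payload on such a channel still fails.

```go
package main

import "fmt"

// collectStdout sketches the fixed test logic: accumulate stdout
// (channel 1) payloads, skip empty frames on other channels, and
// fail only on non-empty frames from other channels. Illustrative
// sketch, not the real e2e code.
func collectStdout(msgs [][]byte) (string, error) {
	var out []byte
	for _, msg := range msgs {
		if len(msg) == 0 {
			continue // no channel byte at all; nothing to do
		}
		channel, payload := msg[0], msg[1:]
		if channel != 1 {
			if len(payload) == 0 {
				continue // e.g. the bare [2] stderr frame seen with docker 1.13
			}
			return "", fmt.Errorf("unexpected message on channel %d: %q", channel, payload)
		}
		out = append(out, payload...)
	}
	return string(out), nil
}

func main() {
	out, err := collectStdout([][]byte{{2}, {1, 'o', 'k'}})
	fmt.Println(out, err)
}
```

With this change, the empty `[2]` frame that broke the conformance runs is ignored, while genuine stderr output from the container still fails the test as before.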