[Fail] [k8s.io] Pods [It] should support remote command execution over websockets [Suite:openshift/conformance/parallel] [Suite:k8s] #18726
Comments
@openshift/sig-kube-origin
@sjenning fyi
The subject looks similar to #18721, but I'm not sure the failure is the same. I saw this too in https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18709/test_pull_request_origin_extended_conformance_install/7879/
Is this flaking or just straight-out failing?
No successful runs in the last 18 hours. Bumping to p0.
The commit suggested by https://deck-ci.svc.ci.openshift.org/?job=test_branch_origin_extended_conformance_install doesn't make any sense. It must be a side effect of something else. @Kargakis @stevekuznetsov @derekwaynecarr there was a docker change in our AMI, right?
This has been broken by the upgrade to docker 1.13 (very, very likely). The first build with 1.13 was broken; the previous build with 1.12 succeeded: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/18625/test_pull_request_origin_extended_conformance_install/7842/
Removed all "flake" indicators from this issue.
@sjenning @deads2k @gabemontero could you guys help us narrow down whether this is a docker issue? I'm not sure what in origin triggers that error.
Reverted the 1.13 AMI.
It's failing here: https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/test/e2e/common/pods.go#L526 after an exec call. The websocket appears to be passed through https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubelet/server/remotecommand/websocket.go#L73. It's not obvious to me why the docker version matters, but perhaps @sjenning knows.
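For context on the framing involved: with the base channel.k8s.io subprotocol, the exec streams are multiplexed over one websocket by prefixing each binary message with a one-byte channel id (0 = stdin, 1 = stdout, 2 = stderr). Below is a minimal client-side sketch of reading that framing; the URL, pod/container names, query parameters, and the omitted auth/TLS setup are illustrative assumptions, not the actual e2e test code.

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/net/websocket"
)

func main() {
	// Hypothetical exec URL asking for both stdout and stderr, mirroring the
	// query parameters the failing test requests.
	url := "wss://node:10250/exec/e2e-ns/pod-exec-websocket/main?command=echo&command=hello&stdout=1&stderr=1"

	ws, err := websocket.Dial(url, "channel.k8s.io", "http://localhost")
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer ws.Close()

	for {
		var msg []byte
		if err := websocket.Message.Receive(ws, &msg); err != nil {
			break // io.EOF once the command exits and the server closes the stream
		}
		if len(msg) == 0 {
			continue
		}
		// The first byte names the channel; the remainder is the payload.
		switch msg[0] {
		case 1:
			fmt.Printf("stdout: %q\n", msg[1:])
		case 2:
			// A bare [2] frame is an empty stderr message.
			fmt.Printf("stderr: %q\n", msg[1:])
		default:
			fmt.Printf("channel %d: %q\n", msg[0], msg[1:])
		}
	}
}
```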
I'm seeing this in the node log:
It seems that we are getting a 302 redirect from the endpoint.
I see that 302 on both versions of docker in my local cluster, so that isn't it. I have near-zero knowledge about websockets, SPDY, HTTP/2, etc., so this could take me a while to figure out. Anyone who knows more is welcome to figure it out faster. @smarterclayton ? However, it seems that the test is reading a message from the STDERR stream with id 2 rather than the STDOUT stream with id 1, as the test expects. The query params indicate the desire for both streams. I'm not sure how the test ensures that the STDOUT stream is the one that it is reading.
I do wonder whether, if we just removed this line, it would resolve the situation, since the test does not expect to read from STDERR.
The 302 is a redirect followed by the API server before returning anything to the client, and has to do with redirecting to what will eventually be a CRI endpoint on the node. For now, it is still served by the kubelet. |
We are requesting that both stdout and stderr output from the command be sent to us, but we are not expecting to actually receive any stderr output from that command, which is why the test fails if it receives a message on the stderr channel. We should not change the test to ignore error output. I do wonder if docker changed to send zero-byte writes to the standard error output stream. I could potentially see updating the test to ignore empty stderr channel messages.
Sorry I didn't respond; I think ignoring empty stderr channel messages is totally fine. We use them in other contexts, and if we ever have websocket support we will deliver zero-byte messages.
Is there actually any way to read what's in STDERR? @liggitt @sjenning
The content of the message is logged, and is empty:
That means the message contained exactly one byte (the channel byte) and no payload.
Opened PR kubernetes/kubernetes#60457 upstream |
Automatic merge from submit-queue (batch tested with PRs 60457, 60331, 54970, 58731, 60562). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

tests: e2e: empty msg from channel other than stdout should be non-fatal

Currently, if the exec websocket test encounters a message that is not on the stdout stream, it immediately fails. However, it also currently requests the stderr stream in the query params, and there doesn't seem to be any guarantee that we don't get an empty message on the stderr stream. Requesting the stderr stream in the query is desirable in case something in the container fails and writes to stderr, but we do not need to fail the test if we get an empty message on another stream. If the message is not empty, then that _does_ indicate an error and we should fail. This is the situation we are currently observing with docker 1.13 in the origin CI: openshift/origin#18726

@derekwaynecarr @smarterclayton @gabemontero @liggitt @deads2k
/sig node
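A hedged sketch of the kind of tolerant check the PR describes: an empty message on a channel other than stdout is skipped, while a non-empty one still fails. This is not a verbatim quote of the merged change; the helper name, test frames, and error wording are illustrative.

```go
package main

import (
	"bytes"
	"fmt"
)

// handleExecFrame processes one channel-prefixed websocket message and
// appends any STDOUT payload to buf. Empty messages on other channels
// (e.g. a bare [2] from STDERR) are tolerated, mirroring the idea in
// kubernetes/kubernetes#60457; non-empty non-STDOUT messages are errors.
func handleExecFrame(msg []byte, buf *bytes.Buffer) error {
	if len(msg) == 0 {
		return nil // completely empty frame, nothing to do
	}
	if msg[0] != 1 { // channel 1 is STDOUT
		if len(msg) == 1 {
			return nil // channel byte only, no payload: ignore
		}
		return fmt.Errorf("got message from server that didn't start with channel 1 (STDOUT): %v", msg)
	}
	buf.Write(msg[1:]) // accumulate STDOUT payload
	return nil
}

func main() {
	var out bytes.Buffer
	frames := [][]byte{
		{2},                          // empty STDERR message: tolerated
		append([]byte{1}, "hi\n"...), // STDOUT payload
	}
	for _, f := range frames {
		if err := handleExecFrame(f, &out); err != nil {
			fmt.Println("test would fail:", err)
		}
	}
	fmt.Printf("stdout collected: %q\n", out.String())
}
```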
See https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_conformance_install/7855/consoleFull#-169259343956c60d7be4b02b88ae8c268b
The failed test highlighted that:
Feb 22 20:21:07.165: Got message from server that didn't start with channel 1 (STDOUT): [2]
The pod for the test was in fact running.
I saw no errors in the node log for the pod prior to the namespace getting torn down.