-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make login, project, and discovery work against kube with RBAC enabled #11340
Conversation
@fabianofranz @juanvallejo ping. |
@fabianofranz LGTM |
LGTM, @deads2k merge at will. |
[merge] |
[Test]ing while waiting on the merge queue |
yum re[test] re[merge] |
re[test] |
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10125/ failed on #11315 |
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10197/ failed on #11315 re[test] re[merge] |
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10265/ on #11315 re[test] re[merge] |
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10321/ on #11315 re[test] re[merge] |
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10413/ on #11315 re[test] re[merge] |
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10553/ on #11094 re[test] re[merge] |
@stevekuznetsov does networking ever work? I keep getting #11315 |
Unclear -- I don't have a good answer for you. |
@mfojtik @smarterclayton do we we want to force this one into 1.4? |
[merge] |
[merge] unless DinD is horribly broken. On Tue, Oct 25, 2016 at 5:53 PM, OpenShift Bot [email protected]
|
@openshift/networking I'm going to guess that somehow this broke the networking test. Can you help me figure out how? |
tagging @marun since this is dind-related If I check out that branch and do Doing
??? |
@@ -36,7 +36,7 @@ func negotiateVersion(client *kclient.Client, config *restclient.Config, request | |||
// Get server versions | |||
serverGVs, err := serverAPIVersions(client, "/oapi") | |||
if err != nil { | |||
if errors.IsNotFound(err) { | |||
if errors.IsNotFound(err) || errors.IsForbidden(err) { | |||
glog.V(4).Infof("Server path /oapi was not found, returning the requested group version %v", preferredGV) | |||
return preferredGV, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the change that breaks the tests... Is there some other reason we could be getting a 403 here in some circumstances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the change that breaks the tests... Is there some other reason we could be getting a 403 here in some circumstances?
Are you hitting an openshift server or a kubernetes server? We allow all users (authenticated and unauthenticated) to hit our discovery endpoints. The only way I can think of to fail is to race with an initial cache priming, but that's a little crazy. You could wait for a zero exit code oc get --raw /oapi
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it might not be that; the failure doesn't seem to be 100% reliable, so it might just be luck that it passed without that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is against an openshift server
Why is a race with initial cache priming crazy? On Mon, Oct 31, 2016 at 10:52 AM, David Eads [email protected]
|
Has this patch been tested against a non-dind multinode cluster? |
After |
Assuming stale cache in |
this confuses me... the cache shouldn't be storing anything in error cases, right? |
riddle me this. How does this pull produce this? |
It almost looks like networking is starting an openshift server that does not run the kubernetes API server, but instead tries to proxy it and somehow the proxy isn't proxying the discovery doc. That's insane though. Running an openshift API server like that won't work and I don't think this code will react differently in that case than the old code would have. |
dind runs openshift in several different modes while setting up the cluster. eg, first it runs |
Thing is, none of these are server-side changes, so the discovery information being served is identical and whatever command eventually saved the empty discovery doc, got that back from the server it queried. But, every other component that starts a master is serving "normal" discovery information from the discovery endpoints. |
I meant, they run before the master does, maybe they're binding to port 8443 and serving bogus data. But it looks like they don't. Is there any chance the master itself could be returning bad answers briefly at startup? Like, does the startup code do something like:
|
Even so, this doesn't change that behavior one way or the other. Are you guys running an |
@danwinship there is one handler chain for the server that serves everything, described in |
@danwinship That's a good theory. Can you link me to where the |
|
e7a7bac
to
fe545df
Compare
I don't see how the master could be returning bad answers briefly at startup, because if I am deploying a dind cluster after having removed |
I don't think this issue is specific to dind deployment, and that merging should wait until the networking job is passing. |
I haven't proposed forcing it. I'm not familiar with the job though. It would be nice to have a smaller, faster reproducer to help track this down. Is there a flag to prevent tear down and then scripts to debug using various kubeconfigs? The differences in this job make it harder to jump in and debug. |
@deads2k My comment about merge was in response to the bot trying to merge after your push. I realize now that the consistent job failure means it can't succeed. |
fe545df
to
4d333e7
Compare
It doesn't reproduce it for me. That command always seems to work. |
@marun can you reproduce this reliably on an AWS instance I could use to try to diagnose it. The pastebin you made doesn't seem to appear in the jenkins test run (near as I can tell) and every other master we test it against works. This really looks like its specific to networking test provisioning somehow, I can't seem to make it fail locally, and I can't figure out where the failure happens and is logged for instrumenting while running jenkins. |
bb52e9c
to
5d1d22b
Compare
@@ -221,7 +221,7 @@ function wait-for-cluster() { | |||
oc="$(os::build::find-binary oc)" | |||
|
|||
# wait for healthz to report ok before trying to get nodes | |||
os::util::wait-for-condition "ok" "${oc} get --config=\"${kubeconfig}\" --raw=/healthz" "120" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function in question claims it uses eval
but in fact does this:
while ! $(${condition}); do
This means that yes, these quotes were incorrect and the actual value of --config
that is passed to oc get
is "${kubeconfig}"
, literal quotes and all. @marun this type of thing is why I feel so strongly about not having the provision_util.sh
file, since we spent a lot of time and effort getting it right the first time in os::cmd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provision_util.sh
isn't used here. That's legacy and only used by vagrant. I'm assuming you meant images/dind/node/openshift-dind-lib.sh
, which needs to be a separate file so it can be distributed in the dind image. I'm happy to have os::cmd
take over responsibility for this use case, but it would have to be copied into the image file for distribution regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, whatever checkout of Origin I'm in right now has the os::util::wait-for-condition
function in provision-util.sh
.
@@ -249,7 +249,7 @@ EOF | |||
# Remove formatting before use | |||
template="$(echo "${template}" | tr -d '\n' | sed -e 's/} \+/}/g')" | |||
local count | |||
count="$("${oc}" --config="${kubeconfig}" get nodes \ | |||
count="$("${oc}" --config=${kubeconfig} get nodes \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't get this one to fail for me, and these quotes look fine. Revert this change, please.
@@ -220,6 +220,9 @@ function wait-for-cluster() { | |||
local oc | |||
oc="$(os::build::find-binary oc)" | |||
|
|||
# wait for healthz to report ok before trying to get nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why this change would be necessary. Consider removing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why this change would be necessary. Consider removing.
I think its reasonable to wait until the API server is ready before making real requests to it. This mirrors what other e2e tests do.
and it worked. I'll squash down and eliminate the "extra" unquoting. |
5d1d22b
to
6c6ec1a
Compare
Evaluated for origin test up to 6c6ec1a |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11643/) (Base Commit: fdd204a) |
@deads2k congrats, tests passed |
Evaluated for origin merge up to 6c6ec1a |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11653/) (Base Commit: 00ec8d2) (Image: devenv-rhel7_5407) |
When RBAC is enabled (see kubernetes/kubernetes#34619), we have to tolerate 403s in addition to 404s.
@pweil- you'll probably need this if you want the tooling to work nicely (and it makes a difference)