Move pod-namespace calls out of process #18355
I hate that we have to do this, but the code looks as reasonable as it can be.
It was only being used in one remaining case (to get the pod IP when tearing down a pod), but we can just always use the fallback code to get that information from OVS instead.
This requires two OVS commands rather than one but it saves us having to parse all that dump-flows output.
7269ac1 to 5d62e09
/test extended_networking
5d62e09 to b556452
b556452 to dd7a754
/test extended_networking
@dcbw I think this is ready for review.
/test gcp
/test gcp
The DNS flake is currently undiagnosed, but doesn't appear to be network related. However, I notice that when that test fails, I see the following things in the logs on that node for the specific pod (dns-test):
and then a little later for a different pod:
/retest
/test gcp
I launched 4 jobs for GCP in parallel, that should give good coverage on the original flake issue.
One of them failed on a web console startup failure - that could be network related (web console is the only component that in theory needs pod network started during the install). Rest of the runs went fine.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, knobunc The full list of commands accepted by this bot can be found here.
That's not a new thing and I don't think it indicates anything actually going wrong with the pod: it's just kubelet trying to get a status update on a pod that it has already started tearing down. (We ought to fix it to be better about that...)
Automatic merge from submit-queue (batch tested with PRs 18376, 18355).
Can you open a bug for that and assign it to Seth? I think it happens on every pod termination now.
filed #18414 for the pod termination vs pod status error
func (p *cniPlugin) CmdAdd(args *skel.CmdArgs) (types.Result, error) {
	body, err := p.doCNI("http://dummy/", newCNIRequest(args))
func (p *cniPlugin) doCNIServerAdd(req *cniserver.CNIRequest, hostVeth string) (types.Result, error) {
	req.Env["OSDN_HOSTVETH"] = hostVeth
Any reason to do this in the environment rather than just stuff it into the JSON by extending CNIRequest?
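A minimal sketch of the JSON alternative raised in this comment, for comparison with the environment-variable approach. The `cniRequest` type below is a hypothetical stand-in, not the real `cniserver.CNIRequest`; its field names and JSON tags, including `HostVeth`, are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniRequest is a hypothetical stand-in for cniserver.CNIRequest.
// All field names and JSON tags here are assumptions for illustration.
type cniRequest struct {
	Env    map[string]string `json:"env,omitempty"`
	Config []byte            `json:"config,omitempty"`
	// HostVeth is the assumed new field that would replace the
	// OSDN_HOSTVETH environment entry.
	HostVeth string `json:"hostVeth,omitempty"`
}

// encodeRequest serializes a request the way a client might before sending it.
func encodeRequest(req *cniRequest) ([]byte, error) {
	return json.Marshal(req)
}

// decodeRequest parses a request body on the receiving side.
func decodeRequest(body []byte) (*cniRequest, error) {
	req := &cniRequest{}
	if err := json.Unmarshal(body, req); err != nil {
		return nil, fmt.Errorf("bad CNI request: %v", err)
	}
	return req, nil
}

func main() {
	body, err := encodeRequest(&cniRequest{
		Env:      map[string]string{"CNI_IFNAME": "eth0"},
		HostVeth: "veth1234",
	})
	if err != nil {
		panic(err)
	}
	req, err := decodeRequest(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.HostVeth, req.Env["CNI_IFNAME"])
}
```

Because the assumed field is tagged `omitempty`, a body that lacks it still decodes, leaving `HostVeth` empty. Whether the real struct can be extended this cheaply is a separate question that this sketch does not answer.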
func (p *cniPlugin) CmdAdd(args *skel.CmdArgs) error {
	req := newCNIRequest(args)
	ifname := req.Env["CNI_IFNAME"]
Use args.IfName and args.Netns here instead. You can also drop the "" check if you want, since the 'skel' package already checks these for you.
},
}
index := 0
result030.IPs[0].Interface = &index
This could be " = current.Int(0)" now.
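For context, a small sketch of why such a helper is convenient: Go does not allow taking the address of a literal, so setting a `*int` field otherwise needs a temporary variable like `index`. The `intPtr` function and `ipConfig` type are local stand-ins written for this example (assumptions), not the CNI library's own definitions.

```go
package main

import "fmt"

// ipConfig is a simplified stand-in for a CNI result IP entry (assumption).
type ipConfig struct {
	Interface *int
}

// intPtr returns a pointer to a copy of v. It is a hypothetical local helper
// playing the role that current.Int plays in the review comment.
func intPtr(v int) *int {
	return &v
}

func main() {
	// Without a helper: a named variable is needed to take an address.
	index := 0
	a := ipConfig{Interface: &index}

	// With a helper: the value can be supplied inline.
	b := ipConfig{Interface: intPtr(0)}

	fmt.Println(*a.Interface == *b.Interface)
}
```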
pkg/network/node/ovscontroller.go
@@ -323,64 +329,58 @@ func (oc *ovsController) SetPodBandwidth(hostVeth, sandboxID string, ingressBPS,
return nil
}

func (oc *ovsController) getPodDetailsBySandboxID(sandboxID string) (int, string, string, error) {
func (oc *ovsController) getPodDetailsBySandboxID(sandboxID string) (int, net.IP, error) {
So what if we just stuff the PodIP into external-ids? Then we don't need to dump and parse flows all the time.
(edit: removed bit about getting ofport via ovs-vsctl which was wrong...)
Automatic merge from submit-queue (batch tested with PRs 18737, 18418).
Minor cleanups to sdn-cni-plugin
Based on belated comments on #18355:
> So what if we just stuff the PodIP into external-ids? Then we don't need to dump and parse flows all the time.
I had thought about that, but then we'd have to change ovs.Find() to return multiple columns (`ofport` and `external-ids`) and we'd have to parse the external-ids to separate out the sandboxID from the podIP. Though, admittedly, there's already code to parse external-ids in fake_ovs now anyway... Also, we're only parsing a single flow now, because we request "in_port=%d" in the dump-flows.
> Any reason to do [hostVeth] in the environment rather than just stuff it into the JSON by extending CNIRequest?
I actually did do it that way first, and then changed it. Something about adding a field to CNIRequest had weird unexpected side effects. IIRC, it would have required a whole bunch of changes to the unit tests or something like that. But I can change it if you prefer.
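To make the external-ids trade-off concrete, here is a rough sketch of the parsing that storing the pod IP there would require. It assumes a `{key="value", ...}` layout and the key names `sandbox` and `ip`; both are assumptions for illustration, and `parseExternalIDs` and `podDetails` are hypothetical helpers, not functions from this repository.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseExternalIDs parses a map printed as {key="value", key2="value2"}.
// It is deliberately simple and does not handle commas inside values.
func parseExternalIDs(s string) (map[string]string, error) {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "{") || !strings.HasSuffix(s, "}") {
		return nil, fmt.Errorf("malformed external-ids %q", s)
	}
	ids := make(map[string]string)
	inner := strings.TrimSpace(s[1 : len(s)-1])
	if inner == "" {
		return ids, nil
	}
	for _, pair := range strings.Split(inner, ",") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed external-ids entry %q", pair)
		}
		ids[kv[0]] = strings.Trim(kv[1], `"`)
	}
	return ids, nil
}

// podDetails separates the sandbox ID from the pod IP.
// The "sandbox" and "ip" key names are assumptions.
func podDetails(raw string) (string, net.IP, error) {
	ids, err := parseExternalIDs(raw)
	if err != nil {
		return "", nil, err
	}
	sandbox, ok := ids["sandbox"]
	if !ok {
		return "", nil, fmt.Errorf("no sandbox ID in %q", raw)
	}
	ip := net.ParseIP(ids["ip"])
	if ip == nil {
		return "", nil, fmt.Errorf("no valid pod IP in %q", raw)
	}
	return sandbox, ip, nil
}

func main() {
	sandbox, ip, err := podDetails(`{ip="10.128.0.5", sandbox="abc123"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(sandbox, ip)
}
```

The sketch covers only the string handling; the change to have ovs.Find() return more than one column, mentioned above, is not shown.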
As discussed in #15991, we need to move all operations in the pod's network namespace out of process, due to a golang issue that allows setns() calls in a locked thread to leak into other threads, causing random lossage as operations intended for the main network namespace end up running in other namespaces instead. (This is fixed in golang 1.10 but we need a fix before then.)
Fixes #15991
Fixes #14385
Fixes #13108
Fixes #18317
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1539987
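The general idea of keeping namespace switches out of the long-running process can be sketched as follows: the parent never calls setns() itself and instead launches a short-lived child that enters the pod's network namespace. This example delegates to `nsenter(1)` purely for illustration; that choice, and the `netnsCommand` helper, are assumptions and not the mechanism this PR uses.

```go
package main

import (
	"fmt"
	"os/exec"
)

// netnsCommand builds (but does not run) a command that executes argv inside
// the network namespace at netnsPath by delegating to nsenter(1).
// Because the namespace switch happens in the child process, no thread of the
// calling process ever changes namespace.
func netnsCommand(netnsPath string, argv ...string) (*exec.Cmd, error) {
	if netnsPath == "" || len(argv) == 0 {
		return nil, fmt.Errorf("need a netns path and a command")
	}
	args := append([]string{"--net=" + netnsPath, "--"}, argv...)
	return exec.Command("nsenter", args...), nil
}

func main() {
	// The path below is a placeholder; a real caller would pass the pod's
	// network namespace path.
	cmd, err := netnsCommand("/proc/1234/ns/net", "ip", "addr")
	if err != nil {
		panic(err)
	}
	// Print the command line instead of running it, since running it
	// requires privileges and an existing namespace.
	fmt.Println(cmd.Args)
}
```

A caller would then use `cmd.Run()` or `cmd.CombinedOutput()` and treat a non-zero exit as the operation failing, at the cost of one process launch per operation.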