-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sdn: garbage-collect dead containers to recover IPAM leases #11964
Conversation
@openshift/networking |
Clearly, this is a hack. The real upstream fix is more like kubernetes/kubernetes#37036 |
return fmt.Errorf("error running network teardown script: %s", getScriptError(out)) | ||
} else if err != nil { | ||
return err | ||
} | ||
} | ||
|
||
if _, err := m.runIPAM(req.Netns, cniserver.CNI_DEL, req.ContainerId); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means we're potentially passing an invalid req.Netns to runIPAM.
It looks like CNI_NETNS is optional for CNI_DEL anyway, so how about we don't pass it here? (Likewise in the cleanup code)
|
||
// ipamGarbageCollection will release unused IP. | ||
// openshift-sdn uses the CNI host-local IPAM plugin which stores allocated IPs | ||
// in a file. Each file created under the IPAM directory has the format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh. so is this all going to go away when we get kubernetes/kubernetes#37036? If not, then should it go into the host-local plugin instead? (Particularly so that if we rebase CNI later and they've changed things, we'll get a merge failure rather than a silent breakage.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should all go away. No, it can't go into host-local because host-local has no idea about all the running containers and whether a lease is dead or not. It's just an allocator, it doesn't know anything about containers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we going to track this for the kube merge? I don't see anything in https://github.com/openshift/origin/blob/master/HACKING.md that would mark it for replacement. Do we just set something on the board to remember to fix this at a certain point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "getting the list of containers host-local knows about" part could go into host-local. But if it's going away soon I don't care much
// ipamGarbageCollection will release unused IP. | ||
// openshift-sdn uses the CNI host-local IPAM plugin which stores allocated IPs | ||
// in a file. Each file created under the IPAM directory has the format: | ||
// ip/container-hash. So this routine looks for hashes that are not reported by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Each file created under the IPAM directory has an IP address for its filename and contains a container hash". (or something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can certainly change the wording, this is mostly just a port of the corresponding kubenet workaround for this issue. I guess it's not a huge problem to diverge from that commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current wording is just wrong, which then makes it look like the code is wrong since it doesn't correspond to comment.
80caec8
to
011f4ee
Compare
@danwinship fixed up and rebased, PTAL! |
011f4ee
to
189326a
Compare
@@ -213,35 +217,41 @@ func addMacvlan(netns string) error { | |||
|
|||
// Run CNI IPAM for the container, either allocating an IP address and returning | |||
// it (for ADD) or releasing the lease and cleaning up (for DEL) | |||
func (m *podManager) runIPAM(netnsPath string, action cniserver.CNICommand, id string) (*cnitypes.Result, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment now out of date
} | ||
func (m *podManager) ipamAdd(netnsPath string, id string) (*cnitypes.Result, error) { | ||
if netnsPath == "" { | ||
return nil, fmt.Errorf("netns requires for CNI_ADD") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requireD
const ipamDir string = "/var/lib/cni/networks/openshift-sdn" | ||
files, err := ioutil.ReadDir(ipamDir) | ||
if err != nil { | ||
glog.Errorf("Failed to list files in CNI host-local IPAM store: %v", ipamDir, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as govet pointed out, there's an arg mismatch here
We don't actually need the netns for IPAM teardown, so don't bother passing it. Also ensure we can release an IPAM allocation even when the container has already died, in case kubelet didn't call the plugin in this case.
Port of kubernetes/kubernetes#35572 to openshift-sdn. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1395183
189326a
to
0d5f8c0
Compare
[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too. What needs to happen when we merge in the upstream version? Do we remember to change this?
I'll file an issue to revert it once upstream lands. EDIT: that issue is #11970 |
[merge] |
Evaluated for origin merge up to 0d5f8c0 |
[test] |
Evaluated for origin test up to 0d5f8c0 |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11601/) (Base Commit: 3b2bbe5) (Image: devenv-rhel7_5393) |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/11598/) (Base Commit: 3b2bbe5) |
Port of kubernetes/kubernetes#35572 to openshift-sdn.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1395183