Convert openshift-sdn to a CNI plugin #11082
Conversation
@danwinship it's based off your kill-registry branch, obviously... ignore those commits.
Force-pushed 3ce8106 to 9c5061e
I assume the IPAM commit is the same as before?
type CniCommand string

const CNI_ADD CniCommand = "ADD"
const CNI_UPDATE CniCommand = "UPDATE"
Does it still make sense to pipe updates through CNI? In the case of adds and deletes, you have
- generic kubelet code → generic CNI code → our CNI plugin → OpenShift-SDN-specific code
but with updates it would go
- OpenShift-SDN-specific code → our CNI plugin → OpenShift-SDN-specific code
Even if UPDATE was eventually added to CNI, it still doesn't really make sense, because that would just make it:
- OpenShift-SDN-specific code → generic CNI code → our CNI plugin → OpenShift-SDN-specific code
because the updates are triggered by OpenShift-SDN-specific events (e.g., NetNamespace changes). So even if we move SDN out of the main openshift/kubelet process, updates will still be implemented in the same process they're triggered from, unlike adds and deletes.
It's mostly to ensure requests are serialized; e.g., that an UPDATE cannot happen concurrently with an ADD. Does that make more sense?
Hm... actually I think I misunderstood the code before; updates are not actually passing through the openshift-sdn CNI binary, right? It goes watchNetNamespaces() -> node.go:UpdatePod() -> podRequestChan -> watchCni() -> pod.go:updatePod(). OK.
@danwinship correct; UPDATE actions are not passing through the CNI plugin, they are just directly funneled into the CNI event queue on the node process.
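A toy illustration of why funneling everything onto one channel buys that serialization (a self-contained sketch; the podRequest fields are modeled on a struct quoted later in this review, and the command field is an assumption):

package main

import "fmt"

type CniCommand string

const (
	CNI_ADD    CniCommand = "ADD"
	CNI_UPDATE CniCommand = "UPDATE"
)

// podRequest is modeled on the struct quoted later in this review;
// the command field is an assumption for this sketch.
type podRequest struct {
	command      CniCommand
	podNamespace string
	podName      string
}

func main() {
	podRequestChan := make(chan *podRequest, 2)

	// ADDs arrive via the CNI server; UPDATEs are pushed directly by the
	// NetNamespace watcher. Both land on the same channel.
	podRequestChan <- &podRequest{CNI_ADD, "default", "mypod"}
	podRequestChan <- &podRequest{CNI_UPDATE, "default", "mypod"}
	close(podRequestChan)

	// A single consumer drains the channel, so an UPDATE can never run
	// concurrently with an ADD for any pod.
	for req := range podRequestChan {
		fmt.Printf("%s %s/%s\n", req.command, req.podNamespace, req.podName)
	}
}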
namespace string
name      string
id        string
netns     string
oh, this applies retroactively to the IPAM commit too I guess, but we need to be clearer about CNI/kernel "network namespaces" vs OpenShift "network namespaces" in this code
I think all usage of 'namespace' in the IPAM commit is actually the namespace name. The stuff that's kernel-netns related is called "netns".
Changed podRequest struct fields to be clearer.
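For reference, the clarified fields (the names match hunks quoted later in this PR; the comments are my gloss):

type podRequest struct {
	podNamespace string // Kubernetes namespace of the pod
	podName      string // pod name within that namespace
	containerId  string // infra container ID, from kubelet
	netns        string // kernel network namespace path, e.g. /proc/<pid>/ns/net
}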
	}
}

func (s *CniServer) cniDelete(w http.ResponseWriter, r *http.Request) {
This is identical to cniAdd() other than changing "ADD" to "DEL" in a bunch of places
Fixed.
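One plausible shape for the de-duplication, sketched (handleCniRequest, cniRequestFromHTTP, and a CNI_DEL constant alongside the ones quoted above are all hypothetical names, not the actual change):

// Shared handler: cniAdd and cniDelete differ only in the command enqueued.
func (s *CniServer) handleCniRequest(command CniCommand, w http.ResponseWriter, r *http.Request) {
	req, err := cniRequestFromHTTP(command, r) // hypothetical parsing helper
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	s.requestChan <- req
}

func (s *CniServer) cniAdd(w http.ResponseWriter, r *http.Request) {
	s.handleCniRequest(CNI_ADD, w, r)
}

func (s *CniServer) cniDelete(w http.ResponseWriter, r *http.Request) {
	s.handleCniRequest(CNI_DEL, w, r)
}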
localIP     string
localSubnet *osapi.HostSubnet
hostName    string
hostSubnet  *osapi.HostSubnet
unused?
removed.
plugin := &OsdnNode{
	multitenant:    IsOpenShiftMultitenantNetworkPlugin(pluginName),
	kClient:        kClient,
	osClient:       osClient,
	cniServer:      NewCniServer(podChan),
	podRequestChan: podChan,
could just do podRequestChan: make(chan *podRequest)
Then we couldn't pass that channel to NewCniServer() though. It needs to be the same channel, which is why it got broken out.
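In other words, the construction has to look roughly like this (a fragment reusing the names from the quoted hunk):

// The channel is created once, outside the literal, because the CNI server
// (producer) and the node's watch loop (consumer) must share it.
podChan := make(chan *podRequest)
plugin := &OsdnNode{
	cniServer:      NewCniServer(podChan), // writes pod requests
	podRequestChan: podChan,               // drained by watchCni()
}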
if err != nil {
	err = node.UpdatePod(p.Namespace, p.Name, kubeletTypes.ContainerID{ID: containerID})
	log.Warningf("Could not update pod %q (%s): %s", p.Name, containerID, err)
accidental change?
Yep, fixed.
AssignMacVlanAnnotation string = "pod.network.openshift.io/assign-macvlan"

interfaceName = knetwork.DefaultInterfaceName
"podInterfaceName"?
Done.
"github.com/containernetworking/cni/pkg/ip" | ||
"github.com/containernetworking/cni/pkg/ipam" | ||
"github.com/containernetworking/cni/pkg/ns" | ||
"github.com/containernetworking/cni/pkg/types" |
maybe import as "cnitypes" for clarity
done
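i.e., something like:

import (
	"github.com/containernetworking/cni/pkg/ipam"
	"github.com/containernetworking/cni/pkg/ns"
	cnitypes "github.com/containernetworking/cni/pkg/types"
)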
		IP:   net.IPv4zero,
		Mask: net.IPMask(net.IPv4zero),
	},
	GW: netutils.GenerateDefaultGateway(nodeNet)},
This indentation is terrible... gofmt accepts it, but... ugh. There must be some better way.
gofmt also accepts it if you put a newline between "{" and "Dst" and then indents Dst and GW at the same level
Yeah, ugly; it was gofmt -s -w's suggestion, but I've taken your suggestion about newlines.
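For the record, the agreed-on shape looks roughly like this (route is an illustrative variable name; the Dst/GW fields come from the quoted hunk):

// A newline after "{" lets gofmt indent Dst and GW at the same level.
route := &cnitypes.Route{
	Dst: net.IPNet{
		IP:   net.IPv4zero,
		Mask: net.IPMask(net.IPv4zero),
	},
	GW: netutils.GenerateDefaultGateway(nodeNet),
}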
local -a tests=()
for binary in "${binaries[@]}"; do
  if [[ "${binary}" =~ ".test"$ ]]; then
    tests+=($binary)
tests+=("${binary}")
        ^ ^      ^^
Fixed.
# a for given platform. First argument is platform, remaining arguments are
# targets. Targets can be given as full Go package path or as basenames.
function os::build::export_targets_and_binaries() {
  local platform=${1}
We prefer not to brace positional args smaller than 10. We do prefer to quote, though.
Fixed.
# Generates the set of target platforms to build for. Accepts platforms via
# OS_BUILD_PLATFORMS, or defaults to the current platform.
function os::build::export_platforms() {
  platforms=("${OS_BUILD_PLATFORMS[@]:+${OS_BUILD_PLATFORMS[@]}}")
platforms=("${OS_BUILD_PLATFORMS[@]:+"${OS_BUILD_PLATFORMS[@]}"}")
                                      ^                         ^
Fixed.
install -d -m 0755 %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/sdn-cni-plugin %{buildroot}/opt/cni/bin/openshift-sdn
install -p -m 0755 _build/bin/host-local %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/loopback %{buildroot}/opt/cni/bin
My personal preference is always for the verbose mode specifiers since you can reason about them without memorizing arcane sets of bits.
Yeah, though I opted to use the existing style. Is that OK?
I'll deal with it :)
/me prefers the numbers, so much easier to read :)
I'll blame it on my age
func (node *OsdnNode) watchCni() {
	for {
		select {
		case request := <-node.podRequestChan:
It looks like this loop can block if any CNI container fails. You'd need to have independent pod queues and handle them separately, and deal with CNI clients that crash and can't receive their updates. It would be better to preserve all of the updates for each pod in a local cache and identify which to serve to the client (the latest).
@smarterclayton did that in latest pushes.
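A sketch of the "cache only the latest update per pod" idea smarterclayton describes (podKey, updateCache, and the method names are assumptions, not the merged code):

import "sync"

type podKey struct{ namespace, name string }

type updateCache struct {
	lock   sync.Mutex
	latest map[podKey]*podRequest
}

// record overwrites any pending update for a pod, so a slow or crashed
// consumer only ever sees the most recent state, never a stale backlog.
func (c *updateCache) record(req *podRequest) {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.latest[podKey{req.podNamespace, req.podName}] = req
}

// take removes and returns the pending update for a pod, if any.
func (c *updateCache) take(key podKey) *podRequest {
	c.lock.Lock()
	defer c.lock.Unlock()
	req := c.latest[key]
	delete(c.latest, key)
	return req
}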
Yes, just rebased.
Force-pushed 9c5061e to 676abb2
@danwinship @openshift/networking @smarterclayton @stevekuznetsov rebased and fixed up, PTAL thanks!
Force-pushed 676abb2 to 91b8e95
}

config := fmt.Sprintf("export OPENSHIFT_CLUSTER_SUBNET=%s", clusterNetworkCIDR)
err = ioutil.WriteFile("/run/openshift-sdn/config.env", []byte(config), 0644)
oh, this file is apparently used by router stuff or something too. @rajatchopra knows I think? (We should probably have a comment explaining that...)
@rajatchopra I don't see any hits in git or google for OPENSHIFT_CLUSTER_SUBNET, where is it used?
// (This has to have been performed in advance for docker-in-docker deployments,
// since this will fail there).
_, _ = exec.Command("modprobe", "br_netfilter").CombinedOutput()
err = sysctl.SetSysctl("net/bridge/bridge-nf-call-iptables", 0)
Need to drop both references to this in hack/dind-cluster.sh
Fixed.
install -d -m 0755 %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/sdn-cni-plugin %{buildroot}/opt/cni/bin/openshift-sdn
install -p -m 0755 _build/bin/host-local %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/loopback %{buildroot}/opt/cni/bin
What is /opt/cni/bin/loopback?
@danwinship it's used by the kubelet CNI driver to configure the loopback interface inside the container. CNI plugins don't have to do that themselves.
@@ -190,6 +185,23 @@ func BuildKubernetesNodeConfig(options configapi.NodeConfig, enableProxy, enable
	return nil, err
}

// Initialize SDN before building kubelet config so it can modify options
"so it can modify options"? how so?
The ones just below:
server.NetworkPluginName = kubeletcni.CNIPluginName
server.NetworkPluginDir = kubeletcni.DefaultNetDir
server.HairpinMode = componentconfig.HairpinNone
server.ConfigureCBR0 = false
which are only configured if the SDN plugin was created.
ok, so the rearrangement from how it was before is not actually needed, it's just aesthetic? I guess the old way keeps all of the server initialization code together, while the new way keeps all the SDN code together...
We want all of the server initialization code together - that's a commonly reviewed bit of code, and if someone comes in here now they will miss it.
Does NewNodePlugin start goroutines? If it does, it cannot be in this method, or you need to split the starting-goroutines part into a Start()/Run() method that is called later.
	}
	request := w.requests[0]
	w.requests = w.requests[1:]
	w.lock.Unlock()
ugh. no! you're just reinventing channels
If you want to avoid possibly blocking on channel send, you can just do the send from a goroutine.
Also, what is all of this complexity (relative to your original version) protecting against? setupPod/teardownPod/updatePod hanging forever? Would the old code have dealt with that? Can't we just use some sort of timeout+error instead of having multiple workers?
@smarterclayton requested this in #11082 (comment) if I understand his request correctly. And he's at least right from the perspective that if pod setup goes into the weeds for a single pod, that would (in the original push) block any subsequent pods from being created.
Yeah, it kinda reinvents channels, but with two important differences: (1) it's non-blocking and we don't have to specify a channel capacity (after which things would block) and (2) we can cleanly terminate the pod worker when it no longer has work to do which turns out to be pretty racy with a channel. (1) is not much of a problem (though annoying) and (2) is harder to solve, but I'll see if I can come up with something not-ugly.
I fixed this up and it's not as bad as I thought. PTAL?
> it's non-blocking

If you don't care that someone is there to get the message (otherwise you'd block, right?) you could always:

select {
case myChan <- myVal:
case <-time.After(5 * time.Second):
}

> which turns out to be pretty racy with a channel

On the surface this seems like a misuse of concurrency primitives.
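For context, the lock-based draining pattern the earlier hunk comes from, reduced to a sketch (w.running and w.handle are assumed names; the point is that "queue empty" and "worker exiting" are decided under one lock, which is exactly the part that gets racy with a bare channel):

func (w *podWorker) process() {
	for {
		w.lock.Lock()
		if len(w.requests) == 0 {
			// Mark the worker dead while still holding the lock, so
			// addRequest can't enqueue onto a worker that is exiting.
			w.running = false
			w.lock.Unlock()
			return
		}
		request := w.requests[0]
		w.requests = w.requests[1:]
		w.lock.Unlock()

		w.handle(request) // dispatch to setupPod/teardownPod/updatePod
	}
}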
podNamespace: namespace,
podName:      name,
containerId:  id.String(),
netns:        "/blah/something", // plugin doesn't care about namespace
Comment doesn't appear to be true; pod.go:updatePod() calls getVethInfo(req.netns, "eth0"), which looks like it will return an error if netns is bogus.
Fixed.
}

// Returns host veth, container veth MAC, and pod IP
func getVethInfo(netns, containerIfname string) (netlink.Link, string, string, error) {
I'd commented before about how the comment claimed it returned the veth name but it actually returned the whole netlink.Link, and you changed the comment. But I'd meant for the return value to change, since all of its callers use only the name, not anything else.
Fixed.
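Presumably the fix is along these lines (a sketch; only the name-instead-of-netlink.Link change is grounded in the comments above):

// Returns host veth name, container veth MAC, and pod IP
func getVethInfo(netns, containerIfname string) (string, string, string, error)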
Force-pushed 8b68c68 to 4769ef1
@danwinship @smarterclayton PTAL, thanks!
still need to figure out if /run/openshift-sdn/config.env is needed
Rebased and adjusted for dind changes. @rajatchopra do you still need /run/openshift-sdn/config.env for anything in the router?
@dcbw yes we still need the config.env for F5 automation scripts for F5 versions less than 12.1
return &podWorker{
	node: node,
	// Allow queue depth of 10 before blocking additions; if there
	// are more than 10 outstanding requests, something is wrong
How are you going to detect something is wrong?
Completely reworked this code.
func (w *podWorker) addRequest(request *podRequest) {
	w.requests <- request
	if len(w.requests) == 1 {
What are you trying to do here? If you're trying to be clever and only run a goroutine while requests are pending that's not really going to make this more efficient. Also, is this called within a lock / single threaded context? If not this code is racy.
Also completely reworked this, so this code is mostly gone.
There isn't actually an "API" between pkg/sdn/plugin and the rest of OpenShift; the code that starts the master/node/proxy needs to import pkg/sdn/plugin directly anyway, so just drop the api subdir. OTOH, IsOpenShiftNetworkPlugin() and some of our defined constants are used by code that doesn't need access to the internals of the plugin, so move them out to pkg/sdn/api, and update a few places that were redefining the names themselves.
Leave docker configuration alone. Instead, let docker do whatever it wants to do, then pull the interface out of the docker bridge, clear the docker assigned addresses and routes, and allocate an address with the CNI 'host-local' plugin from the node subnet. Note that this commit breaks (a) HostPort functionality (which docker used to manage since it handled IPAM) but that is fixed in the next commit, and (b) a docker-added iptables masquerade rule that is fixed in a later commit as well.
Use the same kubelet network plugin interfaces as everyone else. This converts the openshift-sdn plugin from a plugin compiled into kubelet to one that uses the kubelet CNI driver to call a standard CNI plugin. This plugin sends requests from kubelet over a root-only unix domain socket back to the openshift-sdn node process, which handles the actual pod setup/teardown operations. We want to consolidate these operations inside the node process instead of leaving them to the CNI plugin itself because we need to ensure serialized access to OVS, and we need a long-running process to handle HostPort reservation. While we could serialize operations for each pod rather than serializing all pod operations, it turns out to be difficult and error-prone to ensure previous operations complete and those operations can still be GCed in a race-safe manner.

General flow:
1) kubelet wants to set up pod networking
2) kubelet calls internal CNI driver
3) CNI driver looks for CNI network config files, finds /etc/cni/net.d/80-openshift-sdn.conf, and calls the /opt/cni/bin/openshift-sdn CNI plugin executable with CNI_COMMAND=ADD
4) openshift-sdn CNI plugin sends environment and stdin to the openshift-node process via HTTP over a root-only unix domain socket
5) openshift-node process sets up pod networking with OVS, veth creation
6) openshift-node process calls the CNI 'host-local' IPAM plugin to allocate an IP address for the pod from the local node subnet
7) openshift-node process returns the IPAM details via HTTP over the unix domain socket to the waiting openshift-sdn CNI plugin
8) openshift-sdn CNI plugin prints IPAM details to stdout
9) kubelet reads IPAM details (or error) and completes pod setup
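Step 4 is ordinary HTTP riding on a unix socket; a self-contained sketch of that transport trick (the socket path and request body here are illustrative assumptions, not the actual wire format):

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
)

func main() {
	// Dial the root-only unix socket instead of TCP; the HTTP layer on
	// top is unchanged. The socket path is an assumption for this sketch.
	client := &http.Client{
		Transport: &http.Transport{
			Dial: func(network, addr string) (net.Conn, error) {
				return net.Dial("unix", "/var/run/openshift-sdn/cni-server.sock")
			},
		},
	}

	// The host in the URL is ignored; the unix dialer decides the endpoint.
	body := bytes.NewBufferString(`{"env": {"CNI_COMMAND": "ADD"}, "config": {}}`)
	resp, err := client.Post("http://dummy/", "application/json", body)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// The node process replies with the IPAM result (or an error), which
	// the real plugin would print to stdout for kubelet.
	result, _ := ioutil.ReadAll(resp.Body)
	fmt.Printf("IPAM result: %s\n", result)
}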
Force-pushed 9b33704 to 82eccb8
[test]
might need another [merge] I never remember
travis flake:
Travis failures seem to consistently be #11517
@liggitt the TLS change broke travis curl
Evaluated for origin merge up to 82eccb8
Evaluated for origin test up to 82eccb8
Evaluated for origin testextended up to 82eccb8
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10520/) (Base Commit: 0787d9f) (Image: devenv-rhel7_5234)
continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/665/) (Base Commit: 0787d9f) (Extended Tests: networking)
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10518/) (Base Commit: 0787d9f)
Actually...
Good old curl
Due to a misguided attempt to harmonize addresses and routes checking in alreadySetUp(). Turns out addresses can simply be checked for equality since they are returned from GetAddresses() as plain CIDRs, but routes need the extra " " in the check because the entire '/sbin/ip route' line is returned. Fixes: openshift#11082 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1388856
@@ -91,7 +91,7 @@ func (n *NodeIPTables) syncIPTableRules() error {
 // Get openshift iptables rules
 func (n *NodeIPTables) getStaticNodeIPTablesRules() []FirewallRule {
 	return []FirewallRule{
-		{"nat", "POSTROUTING", []string{"-s", n.clusterNetworkCIDR, "!", "-d", n.clusterNetworkCIDR, "-j", "MASQUERADE"}},
+		{"nat", "POSTROUTING", []string{"-s", n.clusterNetworkCIDR, "-j", "MASQUERADE"}},
why did we change this?
Before, when we used docker for IPAM, docker's own iptables rules ended up applying to us, because we were configuring docker to use the same IP range as OpenShift. Since we're no longer reconfiguring docker's networking, we no longer get those rules, so we need to change our rules a bit to compensate.
In particular, we need a rule that says that if iptables sees traffic going from one pod to another pod, then it needs to masquerade it. That's because the only time that pod-to-pod traffic would end up being seen by iptables is if the traffic was originally pod-IP-to-service-IP, and then iptables DNATed the service IP to a pod IP. In that case we need to masquerade, because if we don't, then the destination pod would try to send its response directly back to the source pod, which would happen on the OVS bridge without iptables ever seeing it, and so the DNATing wouldn't get reversed, and so the source pod wouldn't recognize the packet. By masquerading it to the node's IP, we ensure that the service's response gets bounced out of OVS back to where iptables can see it again so it can un-DNAT it before passing it back to the source pod.
Since we already had a rule saying that pod-to-external traffic needs to be masqueraded, and we also need a rule saying that pod-to-pod traffic (that gets seen by iptables) needs to be masqueraded, we can just squash those rules together to get what you see here: any traffic from a pod that ends up being seen by iptables needs to be masqueraded.
Understood, thank you for the thorough and clear explanation.
@danwinship see how you like this one better; the CNI plugin is now essentially a small HTTP client that sends the CNI config and environment over a root-owned unix domain socket to the openshift-node process which serializes all the pod requests and does the actual network setup work. Hostports not implemented yet, but they'll be less icky than the other branch since we can do it all in openshift-node rather than having to work around that race.