Convert openshift-sdn to a CNI plugin #11082
Conversation
@danwinship it's based off your kill-registry branch, obviously... ignore those commits.
Force-pushed 3ce8106 to 9c5061e
I assume the IPAM commit is the same as before?
type CniCommand string

const CNI_ADD CniCommand = "ADD"
const CNI_UPDATE CniCommand = "UPDATE"
Does it still make sense to pipe updates through CNI? In the case of adds and deletes, you have
- generic kubelet code → generic CNI code → our CNI plugin → OpenShift-SDN-specific code
but with updates it would go
- OpenShift-SDN-specific code → our CNI plugin → OpenShift-SDN-specific code
Even if UPDATE was eventually added to CNI, it still doesn't really make sense, because that would just make it:
- OpenShift-SDN-specific code → generic CNI code → our CNI plugin → OpenShift-SDN-specific code
because the updates are triggered by OpenShift-SDN-specific events (e.g., NetNamespace changes). So even if we move SDN out of the main openshift/kubelet process, updates will still be implemented in the same process they're triggered from, unlike adds and deletes.
It's mostly to ensure requests are serialized; e.g., that an UPDATE cannot happen concurrently with an ADD. Does that make more sense?
Hm... actually I think I misunderstood the code before; updates are not actually passing through the openshift-sdn CNI binary, right? It goes watchNetNamespaces() -> node.go:UpdatePod() -> podRequestChan -> watchCni() -> pod.go:updatePod(). OK.
@danwinship correct; UPDATE actions are not passing through the CNI plugin, they are just directly funneled into the CNI event queue on the node process.
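A toy illustration of why funneling everything onto one channel buys that serialization (a self-contained sketch; the podRequest fields are modeled on a struct quoted later in this review, and the command field is an assumption):

package main

import "fmt"

type CniCommand string

const (
	CNI_ADD    CniCommand = "ADD"
	CNI_UPDATE CniCommand = "UPDATE"
)

// podRequest is modeled on the struct quoted later in this review;
// the command field is an assumption for this sketch.
type podRequest struct {
	command      CniCommand
	podNamespace string
	podName      string
}

func main() {
	podRequestChan := make(chan *podRequest, 2)

	// ADDs arrive via the CNI server; UPDATEs are pushed directly by the
	// NetNamespace watcher. Both land on the same channel.
	podRequestChan <- &podRequest{CNI_ADD, "default", "mypod"}
	podRequestChan <- &podRequest{CNI_UPDATE, "default", "mypod"}
	close(podRequestChan)

	// A single consumer drains the channel, so an UPDATE can never run
	// concurrently with an ADD for any pod.
	for req := range podRequestChan {
		fmt.Printf("%s %s/%s\n", req.command, req.podNamespace, req.podName)
	}
}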
namespace string
name      string
id        string
netns     string
oh, this applies retroactively to the IPAM commit too I guess, but we need to be clearer about CNI/kernel "network namespaces" vs OpenShift "network namespaces" in this code
I think all usage of 'namespace' in the IPAM commit is actually the namespace name. The stuff that's kernel-netns related is called "netns".
Changed podRequest struct fields to be clearer.
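For reference, the clarified fields (the names match hunks quoted later in this PR; the comments are my gloss):

type podRequest struct {
	podNamespace string // Kubernetes namespace of the pod
	podName      string // pod name within that namespace
	containerId  string // infra container ID, from kubelet
	netns        string // kernel network namespace path, e.g. /proc/<pid>/ns/net
}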
	}
}

func (s *CniServer) cniDelete(w http.ResponseWriter, r *http.Request) {
This is identical to cniAdd() other than changing "ADD" to "DEL" in a bunch of places
Fixed.
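One plausible shape for the de-duplication, sketched (handleCniRequest, cniRequestFromHTTP, and a CNI_DEL constant alongside the ones quoted above are all hypothetical names, not the actual change):

// Shared handler: cniAdd and cniDelete differ only in the command enqueued.
func (s *CniServer) handleCniRequest(command CniCommand, w http.ResponseWriter, r *http.Request) {
	req, err := cniRequestFromHTTP(command, r) // hypothetical parsing helper
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	s.requestChan <- req
}

func (s *CniServer) cniAdd(w http.ResponseWriter, r *http.Request) {
	s.handleCniRequest(CNI_ADD, w, r)
}

func (s *CniServer) cniDelete(w http.ResponseWriter, r *http.Request) {
	s.handleCniRequest(CNI_DEL, w, r)
}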
localIP     string
localSubnet *osapi.HostSubnet
hostName    string
hostSubnet  *osapi.HostSubnet
unused?
removed.
plugin := &OsdnNode{
	multitenant:    IsOpenShiftMultitenantNetworkPlugin(pluginName),
	kClient:        kClient,
	osClient:       osClient,
	cniServer:      NewCniServer(podChan),
	podRequestChan: podChan,
could just do podRequestChan: make(chan *podRequest)
Then we couldn't pass that channel to NewCniServer() though. It needs to be the same channel, which is why it got broken out.
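In other words, the construction has to look roughly like this (a fragment reusing the names from the quoted hunk):

// The channel is created once, outside the literal, because the CNI server
// (producer) and the node's watch loop (consumer) must share it.
podChan := make(chan *podRequest)
plugin := &OsdnNode{
	cniServer:      NewCniServer(podChan), // writes pod requests
	podRequestChan: podChan,               // drained by watchCni()
}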
if err != nil {
	err = node.UpdatePod(p.Namespace, p.Name, kubeletTypes.ContainerID{ID: containerID})
	log.Warningf("Could not update pod %q (%s): %s", p.Name, containerID, err)
accidental change?
Yep, fixed.
AssignMacVlanAnnotation string = "pod.network.openshift.io/assign-macvlan"

interfaceName = knetwork.DefaultInterfaceName
"podInterfaceName"?
Done.
"github.com/containernetworking/cni/pkg/ip" | ||
"github.com/containernetworking/cni/pkg/ipam" | ||
"github.com/containernetworking/cni/pkg/ns" | ||
"github.com/containernetworking/cni/pkg/types" |
maybe import as "cnitypes" for clarity
done
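i.e., something like:

import (
	"github.com/containernetworking/cni/pkg/ipam"
	"github.com/containernetworking/cni/pkg/ns"
	cnitypes "github.com/containernetworking/cni/pkg/types"
)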
		IP:   net.IPv4zero,
		Mask: net.IPMask(net.IPv4zero),
	},
	GW: netutils.GenerateDefaultGateway(nodeNet)},
This indentation is terrible... gofmt accepts it, but... ugh. There must be some better way.
gofmt also accepts it if you put a newline between "{" and "Dst" and then indents Dst and GW at the same level
Yeah, ugly; it was gofmt -s -w's suggestion, but I've taken your suggestion about newlines.
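For the record, the agreed-on shape looks roughly like this (route is an illustrative variable name; the Dst/GW fields come from the quoted hunk):

// A newline after "{" lets gofmt indent Dst and GW at the same level.
route := &cnitypes.Route{
	Dst: net.IPNet{
		IP:   net.IPv4zero,
		Mask: net.IPMask(net.IPv4zero),
	},
	GW: netutils.GenerateDefaultGateway(nodeNet),
}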
local -a tests=()
for binary in "${binaries[@]}"; do
  if [[ "${binary}" =~ ".test"$ ]]; then
    tests+=($binary)
tests+=("${binary}")
        ^ ^      ^^
Fixed.
# a for given platform. First argument is platform, remaining arguments are
# targets. Targets can be given as full Go package path or as basenames.
function os::build::export_targets_and_binaries() {
  local platform=${1}
We prefer not to brace positional args smaller than 10. We do prefer to quote, though.
Fixed.
# Generates the set of target platforms to build for. Accepts platforms via
# OS_BUILD_PLATFORMS, or defaults to the current platform.
function os::build::export_platforms() {
  platforms=("${OS_BUILD_PLATFORMS[@]:+${OS_BUILD_PLATFORMS[@]}}")
platforms=("${OS_BUILD_PLATFORMS[@]:+"${OS_BUILD_PLATFORMS[@]}"}")
                                      ^                         ^
Fixed.
install -d -m 0755 %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/sdn-cni-plugin %{buildroot}/opt/cni/bin/openshift-sdn
install -p -m 0755 _build/bin/host-local %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/loopback %{buildroot}/opt/cni/bin
My personal preference is always for the verbose mode specifiers since you can reason about them without memorizing arcane sets of bits.
Yeah, though I opted to use the existing style. Is that OK?
I'll deal with it :)
/me prefers the numbers, so much easier to read :)
I'll blame it on my age
func (node *OsdnNode) watchCni() {
	for {
		select {
		case request := <-node.podRequestChan:
It looks like this loop can block if any CNI container fails. You'd need to have independent pod queues and handle them separately, and deal with CNI clients that crash and can't receive their updates. It would be better to preserve all of the updates for each pod in a local cache and identify which to serve to the client (the latest).
@smarterclayton did that in latest pushes.
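A sketch of the "cache only the latest update per pod" idea smarterclayton describes (podKey, updateCache, and the method names are assumptions, not the merged code):

import "sync"

type podKey struct{ namespace, name string }

type updateCache struct {
	lock   sync.Mutex
	latest map[podKey]*podRequest
}

// record overwrites any pending update for a pod, so a slow or crashed
// consumer only ever sees the most recent state, never a stale backlog.
func (c *updateCache) record(req *podRequest) {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.latest[podKey{req.podNamespace, req.podName}] = req
}

// take removes and returns the pending update for a pod, if any.
func (c *updateCache) take(key podKey) *podRequest {
	c.lock.Lock()
	defer c.lock.Unlock()
	req := c.latest[key]
	delete(c.latest, key)
	return req
}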
Yes, just rebased.
Force-pushed 9c5061e to 676abb2
@danwinship @openshift/networking @smarterclayton @stevekuznetsov rebased and fixed up, PTAL thanks!
Force-pushed 676abb2 to 91b8e95
}

config := fmt.Sprintf("export OPENSHIFT_CLUSTER_SUBNET=%s", clusterNetworkCIDR)
err = ioutil.WriteFile("/run/openshift-sdn/config.env", []byte(config), 0644)
oh, this file is apparently used by router stuff or something too. @rajatchopra knows I think? (We should probably have a comment explaining that...)
@rajatchopra I don't see any hits in git or google for OPENSHIFT_CLUSTER_SUBNET, where is it used?
// (This has to have been performed in advance for docker-in-docker deployments,
// since this will fail there).
_, _ = exec.Command("modprobe", "br_netfilter").CombinedOutput()
err = sysctl.SetSysctl("net/bridge/bridge-nf-call-iptables", 0)
Need to drop both references to this in hack/dind-cluster.sh
Fixed.
install -d -m 0755 %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/sdn-cni-plugin %{buildroot}/opt/cni/bin/openshift-sdn
install -p -m 0755 _build/bin/host-local %{buildroot}/opt/cni/bin
install -p -m 0755 _build/bin/loopback %{buildroot}/opt/cni/bin
What is /opt/cni/bin/loopback?
@danwinship it's used by the kubelet CNI driver to configure the loopback interface inside the container. CNI plugins don't have to do that themselves.
@@ -190,6 +185,23 @@ func BuildKubernetesNodeConfig(options configapi.NodeConfig, enableProxy, enable
	return nil, err
}

// Initialize SDN before building kubelet config so it can modify options
"so it can modify options"? how so?
The ones just below:
server.NetworkPluginName = kubeletcni.CNIPluginName
server.NetworkPluginDir = kubeletcni.DefaultNetDir
server.HairpinMode = componentconfig.HairpinNone
server.ConfigureCBR0 = false
which are only configured if the SDN plugin was created.
ok, so the rearrangement from how it was before is not actually needed, it's just aesthetic? I guess the old way keeps all of the server initialization code together, while the new way keeps all the SDN code together...
We want all of the server initialization code together - that's a commonly reviewed bit of code, and if someone comes in here now they will miss it.
Does NewNodePlugin start goroutines? If it does, it cannot be in this method, or you need to split the starting-goroutines part into a Start()/Run() method that is called later.
	}
	request := w.requests[0]
	w.requests = w.requests[1:]
	w.lock.Unlock()
ugh. no! you're just reinventing channels
If you want to avoid possibly blocking on channel send, you can just do the send from a goroutine.
Also, what is all of this complexity (relative to your original version) protecting against? setupPod/teardownPod/updatePod hanging forever? Would the old code have dealt with that? Can't we just use some sort of timeout+error instead of having multiple workers?
@smarterclayton requested this in #11082 (comment) if I understand his request correctly. And he's at least right from the perspective that if pod setup goes into the weeds for a single pod, that would (in the original push) block any subsequent pods from being created.
Yeah, it kinda reinvents channels, but with two important differences: (1) it's non-blocking and we don't have to specify a channel capacity (after which things would block) and (2) we can cleanly terminate the pod worker when it no longer has work to do which turns out to be pretty racy with a channel. (1) is not much of a problem (though annoying) and (2) is harder to solve, but I'll see if I can come up with something not-ugly.
I fixed this up and it's not as bad as I thought. PTAL?
> it's non-blocking

If you don't care that someone is there to get the message (otherwise you'd block, right?) you could always:

select {
case myChan <- myVal:
case <-time.After(5 * time.Second):
}

> which turns out to be pretty racy with a channel

On the surface this seems like a misuse of concurrency primitives.
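For context, the lock-based draining pattern the earlier hunk comes from, reduced to a sketch (w.running and w.handle are assumed names; the point is that "queue empty" and "worker exiting" are decided under one lock, which is exactly the part that gets racy with a bare channel):

func (w *podWorker) process() {
	for {
		w.lock.Lock()
		if len(w.requests) == 0 {
			// Mark the worker dead while still holding the lock, so
			// addRequest can't enqueue onto a worker that is exiting.
			w.running = false
			w.lock.Unlock()
			return
		}
		request := w.requests[0]
		w.requests = w.requests[1:]
		w.lock.Unlock()

		w.handle(request) // dispatch to setupPod/teardownPod/updatePod
	}
}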
podNamespace: namespace,
podName:      name,
containerId:  id.String(),
netns:        "/blah/something", // plugin doesn't care about namespace
Comment doesn't appear to be true; pod.go:updatePod() calls getVethInfo(req.netns, "eth0"), which looks like it will return an error if netns is bogus.
Fixed.
}

// Returns host veth, container veth MAC, and pod IP
func getVethInfo(netns, containerIfname string) (netlink.Link, string, string, error) {
I'd commented before about how the comment claimed it returned the veth name but it actually returned the whole netlink.Link, and you changed the comment. But I'd meant for the return value to change, since all of its callers use only the name, not anything else.
Fixed.
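Presumably the fix is along these lines (a sketch; only the name-instead-of-netlink.Link change is grounded in the comments above):

// Returns host veth name, container veth MAC, and pod IP
func getVethInfo(netns, containerIfname string) (string, string, string, error)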
Force-pushed 8b68c68 to 4769ef1
@danwinship @smarterclayton PTAL, thanks!
still need to figure out if /run/openshift-sdn/config.env is needed
Rebased and adjusted for dind changes. @rajatchopra do you still need /run/openshift-sdn/config.env for anything in the router?
@dcbw yes we still need the config.env for F5 automation scripts for F5 versions less than 12.1
return &podWorker{
	node: node,
	// Allow queue depth of 10 before blocking additions; if there
	// are more than 10 outstanding requests, something is wrong
How are you going to detect something is wrong?
Completely reworked this code.
func (w *podWorker) addRequest(request *podRequest) {
	w.requests <- request
	if len(w.requests) == 1 {
What are you trying to do here? If you're trying to be clever and only run a goroutine while requests are pending that's not really going to make this more efficient. Also, is this called within a lock / single threaded context? If not this code is racy.
Also completely reworked this, so this code is mostly gone.
There isn't actually an "API" between pkg/sdn/plugin and the rest of OpenShift; the code that starts the master/node/proxy needs to import pkg/sdn/plugin directly anyway, so just drop the api subdir. OTOH, IsOpenShiftNetworkPlugin() and some of our defined constants are used by code that doesn't need access to the internals of the plugin, so move them out to pkg/sdn/api, and update a few places that were redefining the names themselves.
Leave docker configuration alone. Instead, let docker do whatever it wants to do, then pull the interface out of the docker bridge, clear the docker assigned addresses and routes, and allocate an address with the CNI 'host-local' plugin from the node subnet. Note that this commit breaks (a) HostPort functionality (which docker used to manage since it handled IPAM) but that is fixed in the next commit, and (b) a docker-added iptables masquerade rule that is fixed in a later commit as well.
Use the same kubelet network plugin interfaces as everyone else. This converts the openshift-sdn plugin from a plugin compiled into kubelet to one that uses the kubelet CNI driver to call a standard CNI plugin. This plugin sends requests from kubelet over a root-only unix domain socket back to the openshift-sdn node process, which handles the actual pod setup/teardown operations. We want to consolidate these operations inside the node process instead of leaving them to the CNI plugin itself because we need to ensure serialized access to OVS, and we need a long-running process to handle HostPort reservation. While we could serialize operations for each pod rather than serializing all pod operations, it turns out to be difficult and error-prone to ensure previous operations complete and those operations can still be GCed in a race-safe manner.

General flow:
1) kubelet wants to set up pod networking
2) kubelet calls internal CNI driver
3) CNI driver looks for CNI network config files, finds /etc/cni/net.d/80-openshift-sdn.conf, and calls the /opt/cni/bin/openshift-sdn CNI plugin executable with CNI_COMMAND=ADD
4) openshift-sdn CNI plugin sends environment and stdin to the openshift-node process via HTTP over a root-only unix domain socket
5) openshift-node process sets up pod networking with OVS, veth creation
6) openshift-node process calls the CNI 'host-local' IPAM plugin to allocate an IP address for the pod from the local node subnet
7) openshift-node process returns the IPAM details via HTTP over the unix domain socket to the waiting openshift-sdn CNI plugin
8) openshift-sdn CNI plugin prints IPAM details to stdout
9) kubelet reads IPAM details (or error) and completes pod setup
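Step 4 is ordinary HTTP riding on a unix socket; a self-contained sketch of that transport trick (the socket path and request body here are illustrative assumptions, not the actual wire format):

package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
)

func main() {
	// Dial the root-only unix socket instead of TCP; the HTTP layer on
	// top is unchanged. The socket path is an assumption for this sketch.
	client := &http.Client{
		Transport: &http.Transport{
			Dial: func(network, addr string) (net.Conn, error) {
				return net.Dial("unix", "/var/run/openshift-sdn/cni-server.sock")
			},
		},
	}

	// The host in the URL is ignored; the unix dialer decides the endpoint.
	body := bytes.NewBufferString(`{"env": {"CNI_COMMAND": "ADD"}, "config": {}}`)
	resp, err := client.Post("http://dummy/", "application/json", body)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// The node process replies with the IPAM result (or an error), which
	// the real plugin would print to stdout for kubelet.
	result, _ := ioutil.ReadAll(resp.Body)
	fmt.Printf("IPAM result: %s\n", result)
}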
Force-pushed 9b33704 to 82eccb8
[test]
might need another [merge] I never remember
travis flake:
Travis failures seem to consistently be #11517
@liggitt the TLS change broke travis curl
Evaluated for origin merge up to 82eccb8
Evaluated for origin test up to 82eccb8
Evaluated for origin testextended up to 82eccb8
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10520/) (Base Commit: 0787d9f) (Image: devenv-rhel7_5234)
continuous-integration/openshift-jenkins/testextended SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/665/) (Base Commit: 0787d9f) (Extended Tests: networking)
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10518/) (Base Commit: 0787d9f)
Actually...
Good old curl
Due to a misguided attempt to harmonize addresses and routes checking in alreadySetUp(). Turns out addresses can simply be checked for equality since they are returned from GetAddresses() as plain CIDRs, but routes need the extra " " in the check because the entire '/sbin/ip route' line is returned. Fixes: openshift#11082 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1388856
@@ -91,7 +91,7 @@ func (n *NodeIPTables) syncIPTableRules() error {
 // Get openshift iptables rules
 func (n *NodeIPTables) getStaticNodeIPTablesRules() []FirewallRule {
 	return []FirewallRule{
-		{"nat", "POSTROUTING", []string{"-s", n.clusterNetworkCIDR, "!", "-d", n.clusterNetworkCIDR, "-j", "MASQUERADE"}},
+		{"nat", "POSTROUTING", []string{"-s", n.clusterNetworkCIDR, "-j", "MASQUERADE"}},
why did we change this?
Before, when we used docker for IPAM, docker's own iptables rules ended up applying to us, because we were configuring docker to use the same IP range as OpenShift. Since we're no longer reconfiguring docker's networking, we no longer get those rules, so we need to change our rules a bit to compensate.
In particular, we need a rule that says that if iptables sees traffic going from one pod to another pod, then it needs to masquerade it. That's because the only time that pod-to-pod traffic would end up being seen by iptables is if the traffic was originally pod-IP-to-service-IP, and then iptables DNATed the service IP to a pod IP. In that case we need to masquerade, because if we don't, then the destination pod would try to send its response directly back to the source pod, which would happen on the OVS bridge without iptables ever seeing it, and so the DNATing wouldn't get reversed, and so the source pod wouldn't recognize the packet. By masquerading it to the node's IP, we ensure that the service's response gets bounced out of OVS back to where iptables can see it again so it can un-DNAT it before passing it back to the source pod.
Since we already had a rule saying that pod-to-external traffic needs to be masqueraded, and we also need a rule saying that pod-to-pod traffic (that gets seen by iptables) needs to be masqueraded, we can just squash those rules together to get what you see here: any traffic from a pod that ends up being seen by iptables needs to be masqueraded.
Understood, thank you for the thorough and clear explanation.
@danwinship see how you like this one better; the CNI plugin is now essentially a small HTTP client that sends the CNI config and environment over a root-owned unix domain socket to the openshift-node process which serializes all the pod requests and does the actual network setup work. Hostports not implemented yet, but they'll be less icky than the other branch since we can do it all in openshift-node rather than having to work around that race.