sdn: add support for multicast traffic for simple and multitenant plugins #12494

dcbw · 2017-01-15T02:23:19Z

Multicast pings don't work, but there doesn't seem to be much of a point to that. They could be added easily by:

echo "0" > /proc/sys/net/ipv4/icmp_echo_ignore_all
echo "0" > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts

in the pods.

but other than all that, open for comments on the approach. A refinement may be to use groups and buckets instead of building up flows for every port in a VNID and every node on the VXLAN.

Testing: use https://github.com/troglobit/mcjoin/ and in one pod run "mcjoin -s" and in another run "mcjoin" and you should see the "mcjoin" process printing out '.' every time it receives a multicast packet from the "-s" process.

@openshift/networking

dcbw · 2017-01-15T02:33:32Z

FWIW, the new multicast related tables look like this:

 cookie=0x0 table=110, ip,nw_dst=224.0.0.0/3 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.3->tun_dst,output:1,set_field:172.17.0.2->tun_dst,output:1,goto_table:120
 cookie=0x0, table=110, priority=0 actions=drop

 cookie=0x0, table=120, priority=0,reg0=0 actions=output:6,output:7
 cookie=0x0, table=120, priority=100,reg0=0xb7e5ea actions=output:3,output:4
 cookie=0x0, table=120, priority=100,reg0=0xefc1e actions=output:5
 cookie=0x0, table=120, priority=0 actions=drop

knobunc

LGTM. Nice work @dcbw.

danwinship

I'm vaguely concerned that we now end up doing lots of extra potentially-rather-large OVS updates for something that most users aren't going to use at all...

danwinship · 2017-01-17T20:42:06Z

pkg/sdn/plugin/pod.go

 type podManager struct {
 	// Common stuff used for both live and testing code
 	podHandler podHandler
 	cniServer  *cniserver.CNIServer
 	// Request queue for pod operations incoming from the CNIServer
 	requests chan (*cniserver.PodRequest)
 	// Tracks pod :: IP address for hostport handling
-	runningPods map[string]*kubehostport.ActivePod
+	runningPods map[string]*runningPod


comment is out of date

danwinship · 2017-01-17T20:42:31Z

pkg/sdn/plugin/pod.go

+	if pod := m.runningPods[getPodKey(request)]; pod != nil {
+		return pod.activePod
+	}
+	return nil
 }

 // Return a list of Kubernetes RunningPod objects for hostport operations
 func (m *podManager) getRunningPods() []*kubehostport.ActivePod {


comment was already wrong, and now it's extra wrong. And maybe the function name should change since it's explicitly not returning "runningPod" objects

danwinship · 2017-01-17T20:45:53Z

pkg/util/ovs/ovs.go

@@ -76,6 +76,20 @@ func (ovsif *Interface) DeleteBridge() error {
 	return err
 }

+// GetOFPort returns the OpenFlow port number of a given network interface
+// attached to a bridge.
+func (ovsif *Interface) GetOFPort(port string) (int, error) {


FWIW #12145 adds a generic Get() although I think there's probably a good argument for having GetOFPort() separate anyway?

danwinship · 2017-01-17T20:46:40Z

pkg/util/ovs/ovs.go

-	if err != nil {
-		return -1, fmt.Errorf("Could not parse allocated ofport %q: %v", ofportStr, err)
-	}
+	ofport, err := ovsif.GetOFPort(port)


you ignore this err

danwinship · 2017-01-17T20:49:13Z

pkg/util/ovs/ovs.go

+// ModifyPort modifies a port on a bridge. (It is an
+// error if the interface is not currently a bridge port.)
+func (ovsif *Interface) ModifyPort(port string, args ...string) error {
+	tmp_args := []string{"mod-port", ovsif.bridge, port}


This is now unused (and I don't think you needed it before anyway; you could have just included the properties in the initial AddPort).

danwinship · 2017-01-17T21:10:54Z

pkg/sdn/plugin/controller.go

+	// Multicast coming from the VXLAN
+	otx.AddFlow("table=30, priority=50, in_port=1, ip, nw_dst=224.0.0.0/3, actions=goto_table:120")
+	// Multicast coming from local pods
+	otx.AddFlow("table=30, priority=25, ip, nw_dst=224.0.0.0/3, actions=goto_table:110")


You can use priority=200 and priority=100 here, like the other blocks in table 30. There's no overlap between the nw_dst values, so that's fine

danwinship · 2017-01-17T21:19:39Z

pkg/sdn/plugin/pod.go

+					// Multicast
+					Dst: net.IPNet{
+						IP:   mcaddr,
+						Mask: net.IPMask(mcaddr),


This gets the right answer, but not really for the right reason. It would be nicer to use ParseCIDR() on "224.0.0.0/3" to get an IPNet directly.

(And I assume you've tested that this route is actually needed? The default route isn't good enough?)

danwinship · 2017-01-17T21:23:04Z

pkg/sdn/plugin/pod.go

+func (l runningPodsSlice) Swap(i, j int)      { l[i], l[j] = l[j], l[i] }
+
+// FIXME: instead of calculating all this ourselves, figure out a way to pass
+// the old VNID through the Update() call (or get it from somewhere else).


We generate the Update calls ourselves (and we always will generate them ourselves, regardless of whether CNI ever gets an UPDATE action, because Kubernetes doesn't know about NetNamespaces), so we can include that information if we want to.

danwinship · 2017-01-17T21:27:34Z

pkg/sdn/plugin/pod.go

+	podsByVNID := make(map[uint32]runningPodsSlice)
+	for key, runningPod := range runningPods {
+		if key != podKey {
+			podsByVNID[runningPod.vnid] = append(podsByVNID[runningPod.vnid], runningPod)


maybe we should keep this around rather than recomputing it every time?

danwinship · 2017-01-17T21:33:42Z

pkg/sdn/plugin/pod_linux.go

 	return &runningPod{
 		activePod: &kubehostport.ActivePod{
 			Pod: pod,
 			IP:  net.ParseIP(podIP),
 		},
-		vnid: podConfig.vnid,
+		vnid:   podConfig.vnid,
+		ofport: ofport,


Is there some other way update() could be handled, so you didn't have to re-call GetOFPort() (and re-parse podIP) given that they're both constant over the life of the pod? You're creating a new runningPod object every time just to deal with the fact that vnid can change, but maybe you should just return a new vnid instead.

knobunc · 2017-01-18T15:06:19Z

@danwinship Do you think we should make this an option that defaults to disabled?

danwinship · 2017-01-18T15:21:50Z

Yeah, maybe. Although then that leads to the question, are we going to enable it in Online?

It would be better if we could autodetect when it was needed somehow. Or maybe we could make it a NetNamespace flag. "oadm pod-network enable-multicast mynamespace"? (And then we'd only have to add the rules for namespaces that were using multicast.)

Actually, also, the current code doesn't interact with NetworkPolicy correctly (since you might have policies that allow some pods in a namespace to communicate with each other, but not other pods). Maybe the right long term fix is to say that multicast is disabled in namespaces by default, and can be enabled with specific NetworkPolicies. I don't think this has been discussed upstream at all.

knobunc · 2017-01-18T15:35:39Z

@danwinship I like the per-namespace or networkpolicy idea. Perhaps for this release we can do it with an annotation on the namespace... it's a little crappy, but if this is for tech preview, then might that suffice? Then in 3.6 we can work out with the sig what the best answer is with NetworkPolicy?

Obviously, any docs would have to warn that this will change later and the annotation is only temporary, and may need to be migrated on an upgrade. But I'm not sure if we can get away with that :-)

eparis · 2017-01-18T15:40:52Z

Anything that requires admin action is basically a non-starter for online free tier. We just don't have the admins... How bad is the impact? Do we have time to measure it before release?

…

On Wed, 2017-01-18 at 07:21 -0800, Dan Winship wrote: Yeah, maybe. Although then that leads to the question, are we going to enable it in Online? It would be better if we could autodetect when it was needed somehow. Or maybe we could make it a NetNamespace flag. "oadm pod-network enable-multicast mynamespace"? (And then we'd only have to add the rules for namespaces that were using multicast.) Actually, also, the current code doesn't interact with NetworkPolicy correctly (since you might have policies that allow some pods in a namespace to communicate with each other, but not other pods). Maybe the right long term fix is to say that multicast is disabled in namespaces by default, and can be enabled with specific NetworkPolicies. I don't think this has been discussed upstream at all. — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

knobunc · 2017-01-18T16:00:52Z

@eparis: would we even want to allow online free to use multicast?

eparis · 2017-01-20T15:51:37Z

Discussed with with online team. free tier wants default disable. If we provide a way that individual users can turn it on themselves, fine. But don't want it on by default...

danwinship · 2017-01-20T19:52:28Z

OK, annotation support is "known to be slightly buggy" (aka not actually implemented yet), and some of the code will need to be moved around to make this work with the networkpolicy plugin. But that can get fixed next week. [merge]

eparis · 2017-01-20T22:16:26Z

In suite "github.com/openshift/origin/pkg/sdn/plugin", test case "TestPodManager" failed:
=== RUN   TestPodManager
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7ae632]
goroutine 54 [running]:
panic(0x1a1d860, 0xc4200140f0)
        /usr/local/go/src/runtime/panic.go:500 +0x1ae
github.com/openshift/origin/pkg/util/ovs.(*Transaction).AddFlow(0xc42061dbe8, 0xc42028fb80, 0x31, 0xc42061dc48, 0x2, 0x2)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/util/ovs/ovs.go:160 +0xf2
github.com/openshift/origin/pkg/sdn/plugin.updateMulticastFlows(0xc42013f7d0, 0x0, 0xc4203e6e30, 0xf, 0xc42029bba0, 0x0, 0x0)
        github.com/openshift/origin/pkg/sdn/plugin/_test/_obj_test/pod.go:281 +0xbfc
github.com/openshift/origin/pkg/sdn/plugin.(*podManager).processCNIRequests(0xc4203d2a20)
        github.com/openshift/origin/pkg/sdn/plugin/_test/_obj_test/pod.go:327 +0x6ed
created by github.com/openshift/origin/pkg/sdn/plugin.(*podManager).Start
        github.com/openshift/origin/pkg/sdn/plugin/_test/_obj_test/pod.go:145 +0x69

openshift-bot · 2017-01-20T22:37:34Z

[Test]ing while waiting on the merge queue

eparis · 2017-01-20T22:53:24Z

[merge] since push after tag

knobunc · 2017-01-21T01:30:37Z

Last test flaked. re-[test]

danwinship · 2017-01-21T01:34:45Z

flake #12558, [test]

Multicast uses a common address space, so we can't match directly on a destination address for each pod, meaning we have to have flows with multiple actions. This could cause scalability problems later, which might be able to be fixed with OpenFlow groups and buckets. Each VNID would be a group, and inside the group each pod would get a bucket. But that's for later. In this PR, multicast from the VXLAN is sent directly to the local delivery table (openshift#120). Local multicast traffic is first sent to VXLAN tunnels (table openshift#110) and then chained to local delivery.

openshift-bot · 2017-01-21T12:37:24Z

Evaluated for origin test up to 7c54ccb

openshift-bot · 2017-01-21T12:45:26Z

Evaluated for origin merge up to 7c54ccb

openshift-bot · 2017-01-21T12:45:38Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13137/) (Base Commit: 73b73e6) (Image: devenv-rhel7_5745)

openshift-bot · 2017-01-21T13:55:50Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13136/) (Base Commit: 73b73e6)

smarterclayton · 2017-02-20T20:07:27Z

What was the end outcome on this - disabled by default?

danwinship · 2017-02-20T22:12:12Z

Yes, #12650 makes it disabled unless you set an annotation on the NetNamespace

dcbw force-pushed the sdn-multicast branch from 34b828f to 782e73b Compare January 15, 2017 02:25

dcbw force-pushed the sdn-multicast branch from 782e73b to e768124 Compare January 17, 2017 18:19

knobunc approved these changes Jan 17, 2017

View reviewed changes

danwinship reviewed Jan 17, 2017

View reviewed changes

danwinship force-pushed the sdn-multicast branch from e768124 to 131a9e3 Compare January 20, 2017 22:35

openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2017

sdn: track running pod VNIDs too

800db1c

danwinship force-pushed the sdn-multicast branch from 131a9e3 to aabef4b Compare January 21, 2017 12:35

dcbw added 2 commits January 21, 2017 07:35

util/ovs: add GetOFPort()

b7c4f55

danwinship force-pushed the sdn-multicast branch from aabef4b to 7c54ccb Compare January 21, 2017 12:35

openshift-bot merged commit b1ab0b9 into openshift:master Jan 21, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sdn: add support for multicast traffic for simple and multitenant plugins #12494

sdn: add support for multicast traffic for simple and multitenant plugins #12494

dcbw commented Jan 15, 2017 •

edited

Loading

dcbw commented Jan 15, 2017 •

edited

Loading

knobunc left a comment

danwinship left a comment

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

danwinship Jan 17, 2017

knobunc commented Jan 18, 2017

danwinship commented Jan 18, 2017

knobunc commented Jan 18, 2017

eparis commented Jan 18, 2017 via email

knobunc commented Jan 18, 2017

eparis commented Jan 20, 2017

danwinship commented Jan 20, 2017

eparis commented Jan 20, 2017

openshift-bot commented Jan 20, 2017

eparis commented Jan 20, 2017

knobunc commented Jan 21, 2017

danwinship commented Jan 21, 2017

openshift-bot commented Jan 21, 2017

openshift-bot commented Jan 21, 2017

openshift-bot commented Jan 21, 2017 •

edited

Loading

openshift-bot commented Jan 21, 2017

smarterclayton commented Feb 20, 2017

danwinship commented Feb 20, 2017

sdn: add support for multicast traffic for simple and multitenant plugins #12494

sdn: add support for multicast traffic for simple and multitenant plugins #12494

Conversation

dcbw commented Jan 15, 2017 • edited Loading

dcbw commented Jan 15, 2017 • edited Loading

knobunc left a comment

Choose a reason for hiding this comment

danwinship left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

knobunc commented Jan 18, 2017

danwinship commented Jan 18, 2017

knobunc commented Jan 18, 2017

eparis commented Jan 18, 2017 via email

knobunc commented Jan 18, 2017

eparis commented Jan 20, 2017

danwinship commented Jan 20, 2017

eparis commented Jan 20, 2017

openshift-bot commented Jan 20, 2017

eparis commented Jan 20, 2017

knobunc commented Jan 21, 2017

danwinship commented Jan 21, 2017

openshift-bot commented Jan 21, 2017

openshift-bot commented Jan 21, 2017

openshift-bot commented Jan 21, 2017 • edited Loading

openshift-bot commented Jan 21, 2017

smarterclayton commented Feb 20, 2017

danwinship commented Feb 20, 2017

dcbw commented Jan 15, 2017 •

edited

Loading

dcbw commented Jan 15, 2017 •

edited

Loading

openshift-bot commented Jan 21, 2017 •

edited

Loading