Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sdn: add support for multicast traffic for simple and multitenant plugins #12494

Merged
merged 3 commits into from
Jan 21, 2017

Conversation

dcbw
Copy link
Contributor

@dcbw dcbw commented Jan 15, 2017

Multicast pings don't work, but there doesn't seem to be much of a point to that. They could be added easily by:

echo "0" > /proc/sys/net/ipv4/icmp_echo_ignore_all
echo "0" > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts 

in the pods.

but other than all that, open for comments on the approach. A refinement may be to use groups and buckets instead of building up flows for every port in a VNID and every node on the VXLAN.

Testing: use https://github.com/troglobit/mcjoin/ and in one pod run "mcjoin -s" and in another run "mcjoin" and you should see the "mcjoin" process printing out '.' every time it receives a multicast packet from the "-s" process.

@openshift/networking

@dcbw
Copy link
Contributor Author

dcbw commented Jan 15, 2017

FWIW, the new multicast related tables look like this:

 cookie=0x0 table=110, ip,nw_dst=224.0.0.0/3 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:172.17.0.3->tun_dst,output:1,set_field:172.17.0.2->tun_dst,output:1,goto_table:120
 cookie=0x0, table=110, priority=0 actions=drop

 cookie=0x0, table=120, priority=0,reg0=0 actions=output:6,output:7
 cookie=0x0, table=120, priority=100,reg0=0xb7e5ea actions=output:3,output:4
 cookie=0x0, table=120, priority=100,reg0=0xefc1e actions=output:5
 cookie=0x0, table=120, priority=0 actions=drop

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work @dcbw.

Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm vaguely concerned that we now end up doing lots of extra potentially-rather-large OVS updates for something that most users aren't going to use at all...

type podManager struct {
// Common stuff used for both live and testing code
podHandler podHandler
cniServer *cniserver.CNIServer
// Request queue for pod operations incoming from the CNIServer
requests chan (*cniserver.PodRequest)
// Tracks pod :: IP address for hostport handling
runningPods map[string]*kubehostport.ActivePod
runningPods map[string]*runningPod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment is out of date

if pod := m.runningPods[getPodKey(request)]; pod != nil {
return pod.activePod
}
return nil
}

// Return a list of Kubernetes RunningPod objects for hostport operations
func (m *podManager) getRunningPods() []*kubehostport.ActivePod {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment was already wrong, and now it's extra wrong. And maybe the function name should change since it's explicitly not returning "runningPod" objects

@@ -76,6 +76,20 @@ func (ovsif *Interface) DeleteBridge() error {
return err
}

// GetOFPort returns the OpenFlow port number of a given network interface
// attached to a bridge.
func (ovsif *Interface) GetOFPort(port string) (int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW #12145 adds a generic Get() although I think there's probably a good argument for having GetOFPort() separate anyway?

if err != nil {
return -1, fmt.Errorf("Could not parse allocated ofport %q: %v", ofportStr, err)
}
ofport, err := ovsif.GetOFPort(port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you ignore this err

// ModifyPort modifies a port on a bridge. (It is an
// error if the interface is not currently a bridge port.)
func (ovsif *Interface) ModifyPort(port string, args ...string) error {
tmp_args := []string{"mod-port", ovsif.bridge, port}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now unused (and I don't think you needed it before anyway; you could have just included the properties in the initial AddPort).

// Multicast coming from the VXLAN
otx.AddFlow("table=30, priority=50, in_port=1, ip, nw_dst=224.0.0.0/3, actions=goto_table:120")
// Multicast coming from local pods
otx.AddFlow("table=30, priority=25, ip, nw_dst=224.0.0.0/3, actions=goto_table:110")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use priority=200 and priority=100 here, like the other blocks in table 30. There's no overlap between the nw_dst values, so that's fine

// Multicast
Dst: net.IPNet{
IP: mcaddr,
Mask: net.IPMask(mcaddr),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets the right answer, but not really for the right reason. It would be nicer to use ParseCIDR() on "224.0.0.0/3" to get an IPNet directly.

(And I assume you've tested that this route is actually needed? The default route isn't good enough?)

func (l runningPodsSlice) Swap(i, j int) { l[i], l[j] = l[j], l[i] }

// FIXME: instead of calculating all this ourselves, figure out a way to pass
// the old VNID through the Update() call (or get it from somewhere else).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generate the Update calls ourselves (and we always will generate them ourselves, regardless of whether CNI ever gets an UPDATE action, because Kubernetes doesn't know about NetNamespaces), so we can include that information if we want to.

podsByVNID := make(map[uint32]runningPodsSlice)
for key, runningPod := range runningPods {
if key != podKey {
podsByVNID[runningPod.vnid] = append(podsByVNID[runningPod.vnid], runningPod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should keep this around rather than recomputing it every time?

return &runningPod{
activePod: &kubehostport.ActivePod{
Pod: pod,
IP: net.ParseIP(podIP),
},
vnid: podConfig.vnid,
vnid: podConfig.vnid,
ofport: ofport,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some other way update() could be handled, so you didn't have to re-call GetOFPort() (and re-parse podIP) given that they're both constant over the life of the pod? You're creating a new runningPod object every time just to deal with the fact that vnid can change, but maybe you should just return a new vnid instead.

@knobunc
Copy link
Contributor

knobunc commented Jan 18, 2017

@danwinship Do you think we should make this an option that defaults to disabled?

@danwinship
Copy link
Contributor

Yeah, maybe. Although then that leads to the question, are we going to enable it in Online?

It would be better if we could autodetect when it was needed somehow. Or maybe we could make it a NetNamespace flag. "oadm pod-network enable-multicast mynamespace"? (And then we'd only have to add the rules for namespaces that were using multicast.)

Actually, also, the current code doesn't interact with NetworkPolicy correctly (since you might have policies that allow some pods in a namespace to communicate with each other, but not other pods). Maybe the right long term fix is to say that multicast is disabled in namespaces by default, and can be enabled with specific NetworkPolicies. I don't think this has been discussed upstream at all.

@knobunc
Copy link
Contributor

knobunc commented Jan 18, 2017

@danwinship I like the per-namespace or networkpolicy idea. Perhaps for this release we can do it with an annotation on the namespace... it's a little crappy, but if this is for tech preview, then might that suffice? Then in 3.6 we can work out with the sig what the best answer is with NetworkPolicy?

Obviously, any docs would have to warn that this will change later and the annotation is only temporary, and may need to be migrated on an upgrade. But I'm not sure if we can get away with that :-)

@eparis
Copy link
Member

eparis commented Jan 18, 2017 via email

@knobunc
Copy link
Contributor

knobunc commented Jan 18, 2017

@eparis: would we even want to allow online free to use multicast?

@eparis
Copy link
Member

eparis commented Jan 20, 2017

Discussed with with online team. free tier wants default disable. If we provide a way that individual users can turn it on themselves, fine. But don't want it on by default...

@danwinship
Copy link
Contributor

OK, annotation support is "known to be slightly buggy" (aka not actually implemented yet), and some of the code will need to be moved around to make this work with the networkpolicy plugin. But that can get fixed next week. [merge]

@eparis
Copy link
Member

eparis commented Jan 20, 2017

In suite "github.com/openshift/origin/pkg/sdn/plugin", test case "TestPodManager" failed:
=== RUN   TestPodManager
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x7ae632]
goroutine 54 [running]:
panic(0x1a1d860, 0xc4200140f0)
        /usr/local/go/src/runtime/panic.go:500 +0x1ae
github.com/openshift/origin/pkg/util/ovs.(*Transaction).AddFlow(0xc42061dbe8, 0xc42028fb80, 0x31, 0xc42061dc48, 0x2, 0x2)
        /go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/util/ovs/ovs.go:160 +0xf2
github.com/openshift/origin/pkg/sdn/plugin.updateMulticastFlows(0xc42013f7d0, 0x0, 0xc4203e6e30, 0xf, 0xc42029bba0, 0x0, 0x0)
        github.com/openshift/origin/pkg/sdn/plugin/_test/_obj_test/pod.go:281 +0xbfc
github.com/openshift/origin/pkg/sdn/plugin.(*podManager).processCNIRequests(0xc4203d2a20)
        github.com/openshift/origin/pkg/sdn/plugin/_test/_obj_test/pod.go:327 +0x6ed
created by github.com/openshift/origin/pkg/sdn/plugin.(*podManager).Start
        github.com/openshift/origin/pkg/sdn/plugin/_test/_obj_test/pod.go:145 +0x69

@openshift-bot
Copy link
Contributor

[Test]ing while waiting on the merge queue

@eparis
Copy link
Member

eparis commented Jan 20, 2017

[merge] since push after tag

@knobunc
Copy link
Contributor

knobunc commented Jan 21, 2017

Last test flaked. re-[test]

@danwinship
Copy link
Contributor

flake #12558, [test]

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 21, 2017
dcbw added 2 commits January 21, 2017 07:35
Multicast uses a common address space, so we can't match directly
on a destination address for each pod, meaning we have to have
flows with multiple actions.

This could cause scalability problems later, which might be able
to be fixed with OpenFlow groups and buckets.  Each VNID would
be a group, and inside the group each pod would get a bucket.
But that's for later.

In this PR, multicast from the VXLAN is sent directly to the
local delivery table (openshift#120).  Local multicast traffic is first
sent to VXLAN tunnels (table openshift#110) and then chained to local
delivery.
@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 7c54ccb

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 7c54ccb

@openshift-bot
Copy link
Contributor

openshift-bot commented Jan 21, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13137/) (Base Commit: 73b73e6) (Image: devenv-rhel7_5745)

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13136/) (Base Commit: 73b73e6)

@openshift-bot openshift-bot merged commit b1ab0b9 into openshift:master Jan 21, 2017
@smarterclayton
Copy link
Contributor

What was the end outcome on this - disabled by default?

@danwinship
Copy link
Contributor

Yes, #12650 makes it disabled unless you set an annotation on the NetNamespace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants