sdn: add support for multicast traffic for simple and multitenant plugins #12494
Conversation
FWIW, the new multicast-related tables look like this:
LGTM. Nice work @dcbw.
I'm vaguely concerned that we now end up doing lots of extra potentially-rather-large OVS updates for something that most users aren't going to use at all...
```diff
 type podManager struct {
 	// Common stuff used for both live and testing code
 	podHandler podHandler
 	cniServer  *cniserver.CNIServer
 	// Request queue for pod operations incoming from the CNIServer
 	requests chan (*cniserver.PodRequest)
 	// Tracks pod :: IP address for hostport handling
-	runningPods map[string]*kubehostport.ActivePod
+	runningPods map[string]*runningPod
```
comment is out of date
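For illustration, the comment might be updated along these lines (the exact wording is an assumption, not the PR's actual fix):

```go
// Tracks per-pod state (ActivePod, VNID, OVS port) for running pods,
// keyed by pod key; used for hostport handling and flow updates
runningPods map[string]*runningPod
```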
```go
	if pod := m.runningPods[getPodKey(request)]; pod != nil {
		return pod.activePod
	}
	return nil
}

// Return a list of Kubernetes RunningPod objects for hostport operations
func (m *podManager) getRunningPods() []*kubehostport.ActivePod {
```
The comment was already wrong, and now it's extra wrong. And maybe the function name should change, since it's explicitly not returning "runningPod" objects.
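A hedged sketch of the rename the reviewer is hinting at; the name getActivePods and the body are assumptions, not the PR's actual code:

```go
// getActivePods returns the kubehostport.ActivePod for every running
// pod, for use by the hostport handler.
func (m *podManager) getActivePods() []*kubehostport.ActivePod {
	pods := make([]*kubehostport.ActivePod, 0, len(m.runningPods))
	for _, rp := range m.runningPods {
		pods = append(pods, rp.activePod)
	}
	return pods
}
```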
```diff
@@ -76,6 +76,20 @@ func (ovsif *Interface) DeleteBridge() error {
 	return err
 }
 
+// GetOFPort returns the OpenFlow port number of a given network interface
+// attached to a bridge.
+func (ovsif *Interface) GetOFPort(port string) (int, error) {
```
FWIW #12145 adds a generic Get() although I think there's probably a good argument for having GetOFPort() separate anyway?
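For context, the OpenFlow port number lives in the Interface table of the OVS database ("ovs-vsctl get Interface &lt;port&gt; ofport"). A minimal sketch of how GetOFPort might fetch it, assuming a hypothetical vsctlExec helper that shells out to ovs-vsctl (and that fmt, strconv, and strings are imported):

```go
func (ovsif *Interface) GetOFPort(port string) (int, error) {
	// "ovs-vsctl get Interface <port> ofport" prints the allocated
	// OpenFlow port number (or -1 if allocation failed)
	ofportStr, err := ovsif.vsctlExec("get", "Interface", port, "ofport")
	if err != nil {
		return -1, fmt.Errorf("failed to get OVS port for %s: %v", port, err)
	}
	ofport, err := strconv.Atoi(strings.TrimSpace(ofportStr))
	if err != nil {
		return -1, fmt.Errorf("could not parse allocated ofport %q: %v", ofportStr, err)
	}
	return ofport, nil
}
```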
```go
	if err != nil {
		return -1, fmt.Errorf("Could not parse allocated ofport %q: %v", ofportStr, err)
	}
	ofport, err := ovsif.GetOFPort(port)
```
you ignore this err
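The fix is just to propagate it; a minimal sketch:

```go
ofport, err := ovsif.GetOFPort(port)
if err != nil {
	return -1, err
}
```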
```go
// ModifyPort modifies a port on a bridge. (It is an
// error if the interface is not currently a bridge port.)
func (ovsif *Interface) ModifyPort(port string, args ...string) error {
	tmp_args := []string{"mod-port", ovsif.bridge, port}
```
This is now unused (and I don't think you needed it before anyway; you could have just included the properties in the initial AddPort).
```go
// Multicast coming from the VXLAN
otx.AddFlow("table=30, priority=50, in_port=1, ip, nw_dst=224.0.0.0/3, actions=goto_table:120")
// Multicast coming from local pods
otx.AddFlow("table=30, priority=25, ip, nw_dst=224.0.0.0/3, actions=goto_table:110")
```
You can use priority=200 and priority=100 here, like the other blocks in table 30. There's no overlap between the nw_dst values, so that's fine
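With the suggested priorities, the flows would read (same match and actions, only the priority values change):

```go
// Multicast coming from the VXLAN
otx.AddFlow("table=30, priority=200, in_port=1, ip, nw_dst=224.0.0.0/3, actions=goto_table:120")
// Multicast coming from local pods
otx.AddFlow("table=30, priority=100, ip, nw_dst=224.0.0.0/3, actions=goto_table:110")
```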
```go
// Multicast
Dst: net.IPNet{
	IP:   mcaddr,
	Mask: net.IPMask(mcaddr),
```
This gets the right answer, but not really for the right reason. It would be nicer to use ParseCIDR() on "224.0.0.0/3" to get an IPNet directly.
(And I assume you've tested that this route is actually needed? The default route isn't good enough?)
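A sketch of the suggested approach; ParseCIDR yields the *net.IPNet directly, so there's no hand-built mask to get wrong (the route variable here is hypothetical, for illustration):

```go
// "224.0.0.0/3" spans the multicast range plus the reserved
// 240.0.0.0/4 block; ParseCIDR gives IP 224.0.0.0 with a /3 mask
_, mcnet, err := net.ParseCIDR("224.0.0.0/3")
if err != nil {
	return err // unreachable for a constant CIDR
}
route.Dst = *mcnet // hypothetical route struct, instead of a hand-built IPNet
```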
```go
func (l runningPodsSlice) Swap(i, j int) { l[i], l[j] = l[j], l[i] }

// FIXME: instead of calculating all this ourselves, figure out a way to pass
// the old VNID through the Update() call (or get it from somewhere else).
```
We generate the Update calls ourselves (and we always will generate them ourselves, regardless of whether CNI ever gets an UPDATE action, because Kubernetes doesn't know about NetNamespaces), so we can include that information if we want to.
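Since the update calls are generated internally, the request could simply carry both VNIDs. A hypothetical sketch (the type and field names are assumptions):

```go
// hypothetical shape of an internally-generated update request
type vnidUpdateRequest struct {
	podKey  string // namespace/name of the pod
	oldVNID uint32 // VNID before the NetNamespace change
	newVNID uint32 // VNID after the change
}
```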
```go
podsByVNID := make(map[uint32]runningPodsSlice)
for key, runningPod := range runningPods {
	if key != podKey {
		podsByVNID[runningPod.vnid] = append(podsByVNID[runningPod.vnid], runningPod)
```
maybe we should keep this around rather than recomputing it every time?
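One way to avoid the recomputation would be to maintain the index incrementally alongside runningPods; a rough sketch with hypothetical helpers and a hypothetical podsByVNID field:

```go
// hypothetical helpers keeping a persistent VNID index in sync
func (m *podManager) addRunningPod(key string, pod *runningPod) {
	m.runningPods[key] = pod
	m.podsByVNID[pod.vnid] = append(m.podsByVNID[pod.vnid], pod)
}

func (m *podManager) removeRunningPod(key string) {
	pod, ok := m.runningPods[key]
	if !ok {
		return
	}
	delete(m.runningPods, key)
	pods := m.podsByVNID[pod.vnid]
	for i, p := range pods {
		if p == pod {
			m.podsByVNID[pod.vnid] = append(pods[:i], pods[i+1:]...)
			break
		}
	}
}
```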
```diff
 return &runningPod{
 	activePod: &kubehostport.ActivePod{
 		Pod: pod,
 		IP:  net.ParseIP(podIP),
 	},
-	vnid: podConfig.vnid,
+	vnid:   podConfig.vnid,
+	ofport: ofport,
```
Is there some other way update() could be handled, so you didn't have to re-call GetOFPort() (and re-parse podIP) given that they're both constant over the life of the pod? You're creating a new runningPod object every time just to deal with the fact that vnid can change, but maybe you should just return a new vnid instead.
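A hedged sketch of that alternative, where update() reports just the new VNID and the caller mutates the existing runningPod in place (the signature is an assumption):

```go
// hypothetical: update() returns the pod's new VNID rather than
// constructing a fresh runningPod
newVNID, err := m.podHandler.update(request)
if err != nil {
	return err
}
pod.vnid = newVNID // IP and ofport are constant over the pod's lifetime
```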
@danwinship Do you think we should make this an option that defaults to disabled?
Yeah, maybe. Although then that leads to the question, are we going to enable it in Online? It would be better if we could autodetect when it was needed somehow. Or maybe we could make it a NetNamespace flag, "oadm pod-network enable-multicast mynamespace"? (And then we'd only have to add the rules for namespaces that were using multicast.)

Actually, also, the current code doesn't interact with NetworkPolicy correctly (since you might have policies that allow some pods in a namespace to communicate with each other, but not other pods). Maybe the right long-term fix is to say that multicast is disabled in namespaces by default, and can be enabled with specific NetworkPolicies. I don't think this has been discussed upstream at all.
@danwinship I like the per-namespace or NetworkPolicy idea. Perhaps for this release we can do it with an annotation on the namespace... it's a little crappy, but if this is for tech preview, might that suffice? Then in 3.6 we can work out with the SIG what the best answer is with NetworkPolicy. Obviously, any docs would have to warn that this will change later and that the annotation is only temporary and may need to be migrated on an upgrade. But I'm not sure if we can get away with that :-)
Anything that requires admin action is basically a non-starter for online free tier. We just don't have the admins... How bad is the impact? Do we have time to measure it before release?
@eparis: would we even want to allow online free to use multicast?
Discussed with the online team. Free tier wants default disable. If we provide a way that individual users can turn it on themselves, fine. But they don't want it on by default...
OK, annotation support is "known to be slightly buggy" (aka not actually implemented yet), and some of the code will need to be moved around to make this work with the networkpolicy plugin. But that can get fixed next week. [merge]
(force-pushed from e768124 to 131a9e3)
[Test]ing while waiting on the merge queue
[merge] since push after tag
Last test flaked. re-[test]
flake #12558, [test]
(force-pushed from 131a9e3 to aabef4b)
Multicast uses a common address space, so we can't match directly on a destination address for each pod, meaning we have to have flows with multiple actions. This could cause scalability problems later, which might be fixable with OpenFlow groups and buckets: each VNID would be a group, and inside the group each pod would get a bucket. But that's for later. In this PR, multicast from the VXLAN is sent directly to the local delivery table (table 120). Local multicast traffic is first sent to the VXLAN tunnels (table 110) and then chained to local delivery.
(force-pushed from aabef4b to 7c54ccb)
Evaluated for origin test up to 7c54ccb
Evaluated for origin merge up to 7c54ccb
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13137/) (Base Commit: 73b73e6) (Image: devenv-rhel7_5745)
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13136/) (Base Commit: 73b73e6)
What was the end outcome on this - disabled by default?
Yes, #12650 makes it disabled unless you set an annotation on the NetNamespace.
Multicast pings don't work, but there doesn't seem to be much of a point to that. They could be added easily by … in the pods.
But other than all that, this is open for comments on the approach. A refinement may be to use groups and buckets instead of building up flows for every port in a VNID and every node on the VXLAN.
Testing: use https://github.com/troglobit/mcjoin/ - run "mcjoin -s" in one pod and "mcjoin" in another, and you should see the "mcjoin" process print a '.' every time it receives a multicast packet from the "-s" process.
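If mcjoin isn't handy, a minimal standard-library Go equivalent can serve the same purpose; the group address and port below are arbitrary test values. Run it with "send" in one pod and with no arguments in another, and the receiver prints a '.' per packet received.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"
)

const group = "239.1.1.1:5000" // arbitrary multicast group and port for testing

func main() {
	gaddr, err := net.ResolveUDPAddr("udp4", group)
	if err != nil {
		panic(err)
	}
	if len(os.Args) > 1 && os.Args[1] == "send" {
		// sender: one small packet to the group per second
		conn, err := net.DialUDP("udp4", nil, gaddr)
		if err != nil {
			panic(err)
		}
		for {
			conn.Write([]byte("ping"))
			time.Sleep(time.Second)
		}
	}
	// receiver: join the group on the default interface and print a
	// dot for every packet that arrives
	conn, err := net.ListenMulticastUDP("udp4", nil, gaddr)
	if err != nil {
		panic(err)
	}
	buf := make([]byte, 1500)
	for {
		if _, _, err := conn.ReadFromUDP(buf); err == nil {
			fmt.Print(".")
		}
	}
}
```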
@openshift/networking