-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
f5 vxlan integration with sdn #11181
Conversation
[test] |
1 similar comment
[test] |
f93a55e
to
3ad39a5
Compare
FYI in the policy section of the code the use of the Legacy flag will only work in TMOS version 12.1 or newer. Ideally have a version check to only add the flag if TMOS >= 12.1
Python example of version check here: https://github.com/f5devcentral/f5-icontrol-codeshare-python/blob/master/kubernetes-example/k8s2f5.py#L328 |
The auth code looks to only support basicauth. This will limit authentication to local accounts and not any remote (radius/tacacs), etc... https://github.com/rajatchopra/origin/blob/3ad39a5c60f344d8fa6cb673f021effe9e139c4b/pkg/router/f5/f5.go#L413
Python example: https://github.com/F5Networks/f5-icontrol-rest-python/blob/5c43b0f6424a5f495c50fd2a318e172d5c3d7e37/icontrol/authtoken.py |
Doing a quick scan of the code I'm not seeing an explicit save of the running config after it is generated. F5Networks/f5-common-python#342 iControl REST sits in front of tmsh "When you use the Traffic Management Shell (tmsh) to configure the system, you must explicitly issue a save command to store the configuration data that you have generated. Otherwise, the newly-generated configuration data is not actually stored on the system. For more information about tmsh, see the Traffic Management Shell (tmsh) Reference Guide." |
You can save the running config by issuing a POST /mgmt/tm/sys/config with the following json payload. |
func checkIPAndGetMac(ipStr string) (string, error) { | ||
ip := net.ParseIP(ipStr) | ||
if ip == nil { | ||
errStr := fmt.Sprintf("'%s' is not a valid IP address for a vtep", ipStr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that the vtep has other restrictions. Perhaps:
'vtep IP '%s' is not a valid IP address'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code structure LGTM... I have no idea if what it is doing with the F5 is right though :-)
Are there any tests we can add? Perhaps at least to exercise the vtep MAC generation function?
glog.V(4).Infof("F5 initialization is complete.") | ||
|
||
return nil | ||
} | ||
|
||
func checkIPAndGetMac(ipStr string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have vtep in it? Or is the package scope enough to indicate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its just a util function. The package scope is enough I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider changing this function name to be more descriptive (e.g. convertIpToVtepMac).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, consider adding a unit test.
52b068c
to
0ac78a7
Compare
Repo problems re-[test] |
@smarterclayton PTAL. The tests are still being worked on, but can you please look over the code to see if it is good? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much in the way of test changes that would target the new functionality. Is that an oversight or is qa intended to validate this feature post-merge?
@@ -87,10 +91,18 @@ that you must have a cluster\-wide administrative role to view all namespaces. | |||
The path to the F5 BIG\-IP SSH private key file | |||
|
|||
.PP | |||
\fB\-\-f5\-setup\-osdn\-vxlan\fP=false | |||
Flag to enable VxLAN setup for integration openshif\-sdn with the F5 BIG\-IP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
openshif -> openshift
nodeEventQueue := oscache.NewEventQueue(cache.MetaNamespaceKeyFunc) | ||
cache.NewReflector(&nodeLW{ | ||
client: factory.NClient, | ||
field: factory.Fields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it makes sense to apply the same filtering to nodes as is being applied to routes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno - would it work if you only have a subset of nodes added to the F5 config and the endpoints for a service are on nodes that are not part of the set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no harm in keeping the fields/filtering. Possible use case with FDB being populated for only a portion of the cluster (say a datacenter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no issue with supporting filtering on the nodes. I was asking whether it made sense for the same filtering parameters to be applied to both nodes and routes. The current approach of reusing the same fields and labels to filter both would preclude filtering one but not the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make sense to just apply field labeling to nodes - we should never apply the same field selector to two types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be watching nodes unless the underlying router is using it. This loop should be optional and the factory must explicitly choose to use it.
@@ -256,3 +275,23 @@ func (lw *endpointsLW) Watch(options kapi.ListOptions) (watch.Interface, error) | |||
} | |||
return lw.client.Endpoints(lw.namespace).Watch(opts) | |||
} | |||
|
|||
// nodeLW is a list watcher for routes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
routes -> nodes
UsePmtu: "enabled", | ||
} | ||
err = f5.post(url, tunnelPayload, nil) | ||
if err != nil && err.(F5Error).httpStatusCode != 409 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider defining 409 as a constant whose name is self-documenting (e.g. STATUS_ALREADY_EXISTS) and avoid having to comment as to why 409 is special.
} | ||
// create the net self IP | ||
err = f5.post(selfUrl, netSelfPayload, nil) | ||
if err != nil && err.(F5Error).httpStatusCode != 409 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto re: named constant
glog.V(4).Infof("F5 initialization is complete.") | ||
|
||
return nil | ||
} | ||
|
||
func checkIPAndGetMac(ipStr string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider changing this function name to be more descriptive (e.g. convertIpToVtepMac).
glog.Warningf("Error in obtaining IP address of newly added node %s - %v", node.Name, err) | ||
return nil | ||
} | ||
p.F5Client.AddVtep(ip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this call be returning the error that is potentially returned by AddVtep()?
glog.Warningf("Error in obtaining IP address of deleted node %s - %v", node.Name, err) | ||
return nil | ||
} | ||
p.F5Client.RemoveVtep(ip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this call be returning the error that is potentially returned by RemoveVtep?
} | ||
p.F5Client.RemoveVtep(ip) | ||
case watch.Modified: | ||
// ignore the modified event. Change in IP address of the node is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the implication of not supporting node address changes? Is it just going to be documented? Should there be logging to diagnose issues if/when address changes occur?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same question re: changes and asked Rajat on this. He said on the openshift-sdn side we don't support IP address changes for a node.
glog.V(4).Infof("F5 initialization is complete.") | ||
|
||
return nil | ||
} | ||
|
||
func checkIPAndGetMac(ipStr string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, consider adding a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a question on the SetupOSDNVxLAN parameter if we can define it implicitly based on the internal ip and cidr, some comments re: constants and todos.
And a note on HandleNode for the host_admitter code in PR #11201
@@ -88,6 +103,9 @@ func (o *F5Router) Bind(flag *pflag.FlagSet) { | |||
flag.StringVar(&o.PrivateKey, "f5-private-key", util.Env("ROUTER_EXTERNAL_HOST_PRIVKEY", ""), "The path to the F5 BIG-IP SSH private key file") | |||
flag.BoolVar(&o.Insecure, "f5-insecure", util.Env("ROUTER_EXTERNAL_HOST_INSECURE", "") == "true", "Skip strict certificate verification") | |||
flag.StringVar(&o.PartitionPath, "f5-partition-path", util.Env("ROUTER_EXTERNAL_HOST_PARTITION_PATH", f5plugin.F5DefaultPartitionPath), "The F5 BIG-IP partition path to use") | |||
flag.StringVar(&o.InternalAddress, "f5-internal-address", util.Env("ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS", ""), "The F5 BIG-IP internal interface's IP address") | |||
flag.StringVar(&o.VxlanGateway, "f5-vxlan-gateway-cidr", util.Env("ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR", ""), "The F5 BIG-IP gateway-ip-address/cidr-mask for setting up the VxLAN") | |||
flag.BoolVar(&o.SetupOSDNVxLAN, "f5-setup-osdn-vxlan", false, "Flag to enable VxLAN setup for integration openshif-sdn with the F5 BIG-IP") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this needs to be set with a command line arg option? It maybe better to implicitly set it if f5-internal-address
and f5-vxlan-gateway-cidr
is set - basically one without the other(s) doesn't make sense does it?
@@ -108,6 +126,10 @@ func (o *F5Router) Validate() error { | |||
return errors.New("F5 HTTP and HTTPS vservers cannot both be blank") | |||
} | |||
|
|||
if o.SetupOSDNVxLAN && (len(o.VxlanGateway) == 0 || len(o.InternalAddress) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above comment re: o.SetupOSDNVxLAN
.
@@ -51,6 +54,8 @@ func (c *RouterController) Run() { | |||
} | |||
go utilwait.Forever(c.HandleRoute, 0) | |||
go utilwait.Forever(c.HandleEndpoints, 0) | |||
c.HandleNode() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? The one from the goroutine ahead should be called soon anyway.
return &RouterControllerFactory{ | ||
KClient: kc, | ||
OSClient: oc, | ||
NClient: kc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if its same as kc - we should cleanup to just use 1 ?Client
. Maybe cleanup in follow on work. I know changing it would touch a lot more code in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment/todo here or in the type definition would be useful.
func (p *UniqueHost) HandleNode(eventType watch.EventType, node *kapi.Node) error { | ||
return p.plugin.HandleNode(eventType, node) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, depending on which PR merges first - #11201 - will need to add HandleNode to the host_admitter
plugin.
} | ||
p.F5Client.RemoveVtep(ip) | ||
case watch.Modified: | ||
// ignore the modified event. Change in IP address of the node is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same question re: changes and asked Rajat on this. He said on the openshift-sdn side we don't support IP address changes for a node.
@@ -120,6 +123,8 @@ type f5Policy struct { | |||
// connections when more than one rule matches. Typically we use best-match; | |||
// other possible values are all-match and first-match. | |||
Strategy string `json:"strategy"` | |||
|
|||
Legacy bool `json:"legacy"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was curious on this one - thanks for explaining it. So if we ever decide to lock down to a specific version of the F5
api, then we would have to remove this - maybe adding a comment here would be helpful. Thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Godoc
@@ -106,6 +106,9 @@ type f5Policy struct { | |||
// Name is the name of the policy. | |||
Name string `json:"name"` | |||
|
|||
// Partition name for the policy | |||
TmPartition string `json:"tmPartition"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Felt a little weird that this one is called TmPartition
but every other F5 api call payload uses Partition
. Thanks for explaining that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the godoc explaining what TmPartition means or be more explicit
LocalAddress: f5.internalAddress, | ||
Mode: "bidirectional", | ||
Mtu: "0", | ||
Profile: path.Join(f5.partitionPath, "vxlan-ose"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constant for vxlan-ose
.
nodeEventQueue := oscache.NewEventQueue(cache.MetaNamespaceKeyFunc) | ||
cache.NewReflector(&nodeLW{ | ||
client: factory.NClient, | ||
field: factory.Fields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno - would it work if you only have a subset of nodes added to the F5 config and the endpoints for a service are on nodes that are not part of the set?
LGTM |
networking job flaked with #11452 re-[test] |
@@ -51,6 +52,7 @@ func (c *RouterController) Run() { | |||
} | |||
go utilwait.Forever(c.HandleRoute, 0) | |||
go utilwait.Forever(c.HandleEndpoints, 0) | |||
go utilwait.Forever(c.HandleNode, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this always on if it's only used in F5?
@@ -27,6 +27,7 @@ import ( | |||
type RouterControllerFactory struct { | |||
KClient kclient.EndpointsNamespacer | |||
OSClient osclient.RoutesNamespacer | |||
NClient kclient.NodesInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NamespaceClient, abbreviations were a special case but shouldn't be propogared.
@@ -1776,6 +1776,14 @@ items: | |||
- "" | |||
attributeRestrictions: null | |||
resources: | |||
- nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty big security expansion - @deads2k
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton didn't you want this role bindable by project-admins? This will prevent that from being possible.
As for our system router, it doesn't worry overly much, but for per-project routers, this makes it impossible to create those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its also required. If we expect the F5 to participate in the SDN it needs to know about the nodes its talking. We don't have OpenShift-sdn doing that. If this is a big problem it would mean splitting the f5 into its own role, I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its also required. If we expect the F5 to participate in the SDN it needs to know about the nodes its talking. We don't have OpenShift-sdn doing that. If this is a big problem it would mean splitting the f5 into its own role, I guess?
Or adding it as the delta required.
Removing from merge queue until my comments are addressed. |
41d5983
to
1d695d3
Compare
@smarterclayton I have addressed all your feedback. Primarily two:
Marking it [merge]. Please feel free to revert the commit if anything is loose. TODO: |
[merge] you stoopid flake. Merge! |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10513/) (Image: devenv-rhel7_5234) |
I'm an idiot, it wasn't a flake!
|
Evaluated for origin merge up to 96354f7 |
Evaluated for origin test up to 96354f7 |
back in at #5. wait and see! |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10513/) (Base Commit: 35b539d) |
As David noted this causes a security vulnerability because it's escalating the privileges granted to system:router. I'm going to push a change that removes the node permission granted here and to fix the issue you can open another one (instead of reverting here). |
Opened a PR that removes the permission, we can iterate on the next solution at a lower pace. |
It's not a flake. The tests don't compile.
|
We could also create system:local-router or system:personal-router. On Mon, Oct 24, 2016 at 9:14 AM, David Eads [email protected]
|
You mean why I said |
Integration tests aren't compiling On Oct 23, 2016, at 4:45 PM, OpenShift Bot [email protected] wrote: continuous-integration/openshift-jenkins/merge FAILURE ( — |
The least disruptive thing is to create a new role with the additional So create system:router-.... (david i need a name suggestion) that contains On Mon, Oct 24, 2016 at 10:11 AM, Clayton Coleman [email protected]
|
GitHub is delivering emails I sent on Friday. Thanks github. |
|
This PR implements the VxLAN integration of F5 with a vxlan based overlay. Three pieces of work primarily: