Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

f5 vxlan integration with sdn #11181

Merged
merged 1 commit into from
Oct 24, 2016
Merged

Conversation

rajatchopra
Copy link
Contributor

@rajatchopra rajatchopra commented Oct 1, 2016

This PR implements the VxLAN integration of F5 with a vxlan based overlay. Three pieces of work primarily:

  1. Router plugin now watches for Nodes in the cluster (to program the F5's vxlan FDB; in the case of template router its a noop)
  2. F5 router programs the VxLAN devices/routing on the F5 BIG IP on initialization
  3. Command line changes to accept the special parameters needed (including a boolean that disables the setup)

@rajatchopra
Copy link
Contributor Author

[test]

1 similar comment
@knobunc
Copy link
Contributor

knobunc commented Oct 3, 2016

[test]

@rajatchopra rajatchopra force-pushed the f5vxlan branch 2 times, most recently from f93a55e to 3ad39a5 Compare October 5, 2016 18:45
@chen23
Copy link

chen23 commented Oct 6, 2016

FYI in the policy section of the code the use of the Legacy flag will only work in TMOS version 12.1 or newer. Ideally have a version check to only add the flag if TMOS >= 12.1

policyPayload := f5Policy{
        Name:     policyName,
        Controls: []string{"forwarding"},
        Requires: []string{"http"},
        Strategy: "best-match",
        Legacy:   true,
}

Python example of version check here: https://github.com/f5devcentral/f5-icontrol-codeshare-python/blob/master/kubernetes-example/k8s2f5.py#L328

@chen23
Copy link

chen23 commented Oct 6, 2016

The auth code looks to only support basicauth. This will limit authentication to local accounts and not any remote (radius/tacacs), etc... https://github.com/rajatchopra/origin/blob/3ad39a5c60f344d8fa6cb673f021effe9e139c4b/pkg/router/f5/f5.go#L413

    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("Accept", "application/json")
    req.SetBasicAuth(f5.username, f5.password)

client := &http.Client{Transport: tr}

Python example: https://github.com/F5Networks/f5-icontrol-rest-python/blob/5c43b0f6424a5f495c50fd2a318e172d5c3d7e37/icontrol/authtoken.py
DevCentral article: https://devcentral.f5.com/articles/demystifying-icontrol-rest-part-6-token-based-authentication

@chen23
Copy link

chen23 commented Oct 6, 2016

Doing a quick scan of the code I'm not seeing an explicit save of the running config after it is generated. F5Networks/f5-common-python#342

iControl REST sits in front of tmsh

https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/bigip-system-essentials-11-6-0/2.html

"When you use the Traffic Management Shell (tmsh) to configure the system, you must explicitly issue a save command to store the configuration data that you have generated. Otherwise, the newly-generated configuration data is not actually stored on the system. For more information about tmsh, see the Traffic Management Shell (tmsh) Reference Guide."

@kmunson1973
Copy link

You can save the running config by issuing a POST /mgmt/tm/sys/config with the following json payload.
{
"command":"save"
}

func checkIPAndGetMac(ipStr string) (string, error) {
ip := net.ParseIP(ipStr)
if ip == nil {
errStr := fmt.Sprintf("'%s' is not a valid IP address for a vtep", ipStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that the vtep has other restrictions. Perhaps:
'vtep IP '%s' is not a valid IP address'

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code structure LGTM... I have no idea if what it is doing with the F5 is right though :-)

Are there any tests we can add? Perhaps at least to exercise the vtep MAC generation function?

glog.V(4).Infof("F5 initialization is complete.")

return nil
}

func checkIPAndGetMac(ipStr string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have vtep in it? Or is the package scope enough to indicate that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its just a util function. The package scope is enough I guess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing this function name to be more descriptive (e.g. convertIpToVtepMac).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, consider adding a unit test.

@rajatchopra rajatchopra force-pushed the f5vxlan branch 2 times, most recently from 52b068c to 0ac78a7 Compare October 19, 2016 16:13
@knobunc
Copy link
Contributor

knobunc commented Oct 19, 2016

Repo problems re-[test]

@knobunc
Copy link
Contributor

knobunc commented Oct 19, 2016

@smarterclayton PTAL. The tests are still being worked on, but can you please look over the code to see if it is good?

@rajatchopra rajatchopra changed the title WIP[do not merge]: f5 vxlan integration with sdn f5 vxlan integration with sdn Oct 19, 2016
Copy link
Contributor

@marun marun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much in the way of test changes that would target the new functionality. Is that an oversight or is qa intended to validate this feature post-merge?

@@ -87,10 +91,18 @@ that you must have a cluster\-wide administrative role to view all namespaces.
The path to the F5 BIG\-IP SSH private key file

.PP
\fB\-\-f5\-setup\-osdn\-vxlan\fP=false
Flag to enable VxLAN setup for integration openshif\-sdn with the F5 BIG\-IP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openshif -> openshift

nodeEventQueue := oscache.NewEventQueue(cache.MetaNamespaceKeyFunc)
cache.NewReflector(&nodeLW{
client: factory.NClient,
field: factory.Fields,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it makes sense to apply the same filtering to nodes as is being applied to routes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno - would it work if you only have a subset of nodes added to the F5 config and the endpoints for a service are on nodes that are not part of the set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no harm in keeping the fields/filtering. Possible use case with FDB being populated for only a portion of the cluster (say a datacenter).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no issue with supporting filtering on the nodes. I was asking whether it made sense for the same filtering parameters to be applied to both nodes and routes. The current approach of reusing the same fields and labels to filter both would preclude filtering one but not the other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make sense to just apply field labeling to nodes - we should never apply the same field selector to two types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be watching nodes unless the underlying router is using it. This loop should be optional and the factory must explicitly choose to use it.

@@ -256,3 +275,23 @@ func (lw *endpointsLW) Watch(options kapi.ListOptions) (watch.Interface, error)
}
return lw.client.Endpoints(lw.namespace).Watch(opts)
}

// nodeLW is a list watcher for routes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

routes -> nodes

UsePmtu: "enabled",
}
err = f5.post(url, tunnelPayload, nil)
if err != nil && err.(F5Error).httpStatusCode != 409 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider defining 409 as a constant whose name is self-documenting (e.g. STATUS_ALREADY_EXISTS) and avoid having to comment as to why 409 is special.

}
// create the net self IP
err = f5.post(selfUrl, netSelfPayload, nil)
if err != nil && err.(F5Error).httpStatusCode != 409 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto re: named constant

glog.V(4).Infof("F5 initialization is complete.")

return nil
}

func checkIPAndGetMac(ipStr string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider changing this function name to be more descriptive (e.g. convertIpToVtepMac).

glog.Warningf("Error in obtaining IP address of newly added node %s - %v", node.Name, err)
return nil
}
p.F5Client.AddVtep(ip)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call be returning the error that is potentially returned by AddVtep()?

glog.Warningf("Error in obtaining IP address of deleted node %s - %v", node.Name, err)
return nil
}
p.F5Client.RemoveVtep(ip)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call be returning the error that is potentially returned by RemoveVtep?

}
p.F5Client.RemoveVtep(ip)
case watch.Modified:
// ignore the modified event. Change in IP address of the node is not supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the implication of not supporting node address changes? Is it just going to be documented? Should there be logging to diagnose issues if/when address changes occur?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same question re: changes and asked Rajat on this. He said on the openshift-sdn side we don't support IP address changes for a node.

glog.V(4).Infof("F5 initialization is complete.")

return nil
}

func checkIPAndGetMac(ipStr string) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, consider adding a unit test.

Copy link
Contributor

@ramr ramr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a question on the SetupOSDNVxLAN parameter if we can define it implicitly based on the internal ip and cidr, some comments re: constants and todos.
And a note on HandleNode for the host_admitter code in PR #11201

@@ -88,6 +103,9 @@ func (o *F5Router) Bind(flag *pflag.FlagSet) {
flag.StringVar(&o.PrivateKey, "f5-private-key", util.Env("ROUTER_EXTERNAL_HOST_PRIVKEY", ""), "The path to the F5 BIG-IP SSH private key file")
flag.BoolVar(&o.Insecure, "f5-insecure", util.Env("ROUTER_EXTERNAL_HOST_INSECURE", "") == "true", "Skip strict certificate verification")
flag.StringVar(&o.PartitionPath, "f5-partition-path", util.Env("ROUTER_EXTERNAL_HOST_PARTITION_PATH", f5plugin.F5DefaultPartitionPath), "The F5 BIG-IP partition path to use")
flag.StringVar(&o.InternalAddress, "f5-internal-address", util.Env("ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS", ""), "The F5 BIG-IP internal interface's IP address")
flag.StringVar(&o.VxlanGateway, "f5-vxlan-gateway-cidr", util.Env("ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR", ""), "The F5 BIG-IP gateway-ip-address/cidr-mask for setting up the VxLAN")
flag.BoolVar(&o.SetupOSDNVxLAN, "f5-setup-osdn-vxlan", false, "Flag to enable VxLAN setup for integration openshif-sdn with the F5 BIG-IP")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this needs to be set with a command line arg option? It maybe better to implicitly set it if f5-internal-address and f5-vxlan-gateway-cidr is set - basically one without the other(s) doesn't make sense does it?

@@ -108,6 +126,10 @@ func (o *F5Router) Validate() error {
return errors.New("F5 HTTP and HTTPS vservers cannot both be blank")
}

if o.SetupOSDNVxLAN && (len(o.VxlanGateway) == 0 || len(o.InternalAddress) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment re: o.SetupOSDNVxLAN.

@@ -51,6 +54,8 @@ func (c *RouterController) Run() {
}
go utilwait.Forever(c.HandleRoute, 0)
go utilwait.Forever(c.HandleEndpoints, 0)
c.HandleNode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? The one from the goroutine ahead should be called soon anyway.

return &RouterControllerFactory{
KClient: kc,
OSClient: oc,
NClient: kc,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if its same as kc - we should cleanup to just use 1 ?Client. Maybe cleanup in follow on work. I know changing it would touch a lot more code in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment/todo here or in the type definition would be useful.

func (p *UniqueHost) HandleNode(eventType watch.EventType, node *kapi.Node) error {
return p.plugin.HandleNode(eventType, node)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, depending on which PR merges first - #11201 - will need to add HandleNode to the host_admitter plugin.

}
p.F5Client.RemoveVtep(ip)
case watch.Modified:
// ignore the modified event. Change in IP address of the node is not supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same question re: changes and asked Rajat on this. He said on the openshift-sdn side we don't support IP address changes for a node.

@@ -120,6 +123,8 @@ type f5Policy struct {
// connections when more than one rule matches. Typically we use best-match;
// other possible values are all-match and first-match.
Strategy string `json:"strategy"`

Legacy bool `json:"legacy"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious on this one - thanks for explaining it. So if we ever decide to lock down to a specific version of the F5 api, then we would have to remove this - maybe adding a comment here would be helpful. Thx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Godoc

@@ -106,6 +106,9 @@ type f5Policy struct {
// Name is the name of the policy.
Name string `json:"name"`

// Partition name for the policy
TmPartition string `json:"tmPartition"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt a little weird that this one is called TmPartition but every other F5 api call payload uses Partition. Thanks for explaining that one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the godoc explaining what TmPartition means or be more explicit

LocalAddress: f5.internalAddress,
Mode: "bidirectional",
Mtu: "0",
Profile: path.Join(f5.partitionPath, "vxlan-ose"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constant for vxlan-ose.

nodeEventQueue := oscache.NewEventQueue(cache.MetaNamespaceKeyFunc)
cache.NewReflector(&nodeLW{
client: factory.NClient,
field: factory.Fields,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno - would it work if you only have a subset of nodes added to the F5 config and the endpoints for a service are on nodes that are not part of the set?

@ramr
Copy link
Contributor

ramr commented Oct 21, 2016

LGTM

@marun
Copy link
Contributor

marun commented Oct 21, 2016

networking job flaked with #11452 re-[test]

@@ -51,6 +52,7 @@ func (c *RouterController) Run() {
}
go utilwait.Forever(c.HandleRoute, 0)
go utilwait.Forever(c.HandleEndpoints, 0)
go utilwait.Forever(c.HandleNode, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this always on if it's only used in F5?

@@ -27,6 +27,7 @@ import (
type RouterControllerFactory struct {
KClient kclient.EndpointsNamespacer
OSClient osclient.RoutesNamespacer
NClient kclient.NodesInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NamespaceClient, abbreviations were a special case but shouldn't be propogared.

@@ -1776,6 +1776,14 @@ items:
- ""
attributeRestrictions: null
resources:
- nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty big security expansion - @deads2k

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton didn't you want this role bindable by project-admins? This will prevent that from being possible.

As for our system router, it doesn't worry overly much, but for per-project routers, this makes it impossible to create those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its also required. If we expect the F5 to participate in the SDN it needs to know about the nodes its talking. We don't have OpenShift-sdn doing that. If this is a big problem it would mean splitting the f5 into its own role, I guess?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its also required. If we expect the F5 to participate in the SDN it needs to know about the nodes its talking. We don't have OpenShift-sdn doing that. If this is a big problem it would mean splitting the f5 into its own role, I guess?

Or adding it as the delta required.

@smarterclayton
Copy link
Contributor

Removing from merge queue until my comments are addressed.

@rajatchopra rajatchopra force-pushed the f5vxlan branch 2 times, most recently from 41d5983 to 1d695d3 Compare October 23, 2016 17:32
@rajatchopra
Copy link
Contributor Author

@smarterclayton I have addressed all your feedback. Primarily two:

  1. Node watch loop does not run unless its F5 && a vxlan setup is needed
  2. All fields/labels are put in for node watch (we have no use case yet where only a portion of the sdn needs to be addressed by F5)

Marking it [merge]. Please feel free to revert the commit if anything is loose.

TODO:
Document on how to use the native VxLAN feature

@eparis
Copy link
Member

eparis commented Oct 23, 2016

[merge] you stoopid flake. Merge!

@openshift-bot
Copy link
Contributor

openshift-bot commented Oct 23, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10513/) (Image: devenv-rhel7_5234)

@eparis
Copy link
Member

eparis commented Oct 23, 2016

I'm an idiot, it wasn't a flake!

test/integration/router_stress_test.go:285: not enough arguments in call to factory.Create

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 96354f7

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 96354f7

@eparis
Copy link
Member

eparis commented Oct 24, 2016

back in at #5. wait and see!

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10513/) (Base Commit: 35b539d)

@openshift-bot openshift-bot merged commit 0787d9f into openshift:master Oct 24, 2016
@smarterclayton
Copy link
Contributor

As David noted this causes a security vulnerability because it's escalating the privileges granted to system:router. I'm going to push a change that removes the node permission granted here and to fix the issue you can open another one (instead of reverting here).

@smarterclayton
Copy link
Contributor

Opened a PR that removes the permission, we can iterate on the next solution at a lower pace.

@smarterclayton
Copy link
Contributor

smarterclayton commented Oct 25, 2016 via email

@smarterclayton
Copy link
Contributor

We could also create system:local-router or system:personal-router.

On Mon, Oct 24, 2016 at 9:14 AM, David Eads [email protected]
wrote:

@deads2k commented on this pull request.

In test/testdata/bootstrappolicy/bootstrap_cluster_roles.yaml
#11181:

@@ -1776,6 +1776,14 @@ items:
- ""
attributeRestrictions: null
resources:

    • nodes

Its also required. If we expect the F5 to participate in the SDN it needs
to know about the nodes its talking. We don't have OpenShift-sdn doing
that. If this is a big problem it would mean splitting the f5 into its own
role, I guess?

Or adding it as the delta required.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#11181, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p5OJReoyNdhUMz449EXSxXww4BvEks5q3K84gaJpZM4KLn0M
.

@eparis
Copy link
Member

eparis commented Oct 25, 2016

You mean why I said I'm an idiot, it wasn't a flake!

@smarterclayton
Copy link
Contributor

Integration tests aren't compiling

On Oct 23, 2016, at 4:45 PM, OpenShift Bot [email protected] wrote:

continuous-integration/openshift-jenkins/merge FAILURE (
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10508/) (Base
Commit: 35b539d
35b539d
)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#11181 (comment),
or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABG_p2r1fYHWcAjiUQ-TekgW_5ppbsHqks5q28dfgaJpZM4KLn0M
.

@smarterclayton
Copy link
Contributor

The least disruptive thing is to create a new role with the additional
permission and require admins to set it up. We can't retroactively add it
to old clusters (reconcile) if anyone has used it, and it's a very big and
surprising role.

So create system:router-.... (david i need a name suggestion) that contains
the node role.

On Mon, Oct 24, 2016 at 10:11 AM, Clayton Coleman [email protected]
wrote:

We could also create system:local-router or system:personal-router.

On Mon, Oct 24, 2016 at 9:14 AM, David Eads [email protected]
wrote:

@deads2k commented on this pull request.

In test/testdata/bootstrappolicy/bootstrap_cluster_roles.yaml
#11181:

@@ -1776,6 +1776,14 @@ items:
- ""
attributeRestrictions: null
resources:

    • nodes

Its also required. If we expect the F5 to participate in the SDN it needs
to know about the nodes its talking. We don't have OpenShift-sdn doing
that. If this is a big problem it would mean splitting the f5 into its own
role, I guess?

Or adding it as the delta required.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#11181, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p5OJReoyNdhUMz449EXSxXww4BvEks5q3K84gaJpZM4KLn0M
.

@smarterclayton
Copy link
Contributor

GitHub is delivering emails I sent on Friday. Thanks github.

@deads2k
Copy link
Contributor

deads2k commented Oct 25, 2016

system:rouer-privileged, system:rouer-extensions?

@rajatchopra rajatchopra deleted the f5vxlan branch November 4, 2016 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.