Node IP flip flops #13645

rajatchopra · 2017-04-05T19:27:20Z

when both master and node reboot: bz1438402

[test]

rajatchopra · 2017-04-05T21:55:14Z

@openshift/networking @knobunc Please review.

Fixes a bug where a node is rebooted along with the master and then reports a flipped status address, which causes the HostSubnet to be re-assigned.
The fix looks at all existing valid addresses; if the existing HostSubnet has a nodeIP that is among the valid ones, no update is performed.
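
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `NodeAddress` is a simplified stand-in for `kapi.NodeAddress`, and `hostSubnetNeedsUpdate` and the sample IPs are hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// NodeAddress is a simplified, hypothetical stand-in for kapi.NodeAddress.
type NodeAddress struct {
	Type    string // e.g. "InternalIP", "Hostname"
	Address string
}

// hostSubnetNeedsUpdate (hypothetical helper) reports whether the IP recorded
// in an existing HostSubnet should be replaced. It returns false when the
// recorded IP is the preferred IP or is among the other valid addresses.
func hostSubnetNeedsUpdate(existingHostIP, preferredIP string, otherValid []NodeAddress) bool {
	if existingHostIP == preferredIP {
		return false
	}
	for _, addr := range otherValid {
		if addr.Address == existingHostIP {
			return false
		}
	}
	return true
}

func main() {
	// Assumed example: a node with two interfaces whose reported order flipped.
	valid := []NodeAddress{
		{Type: "InternalIP", Address: "10.0.0.5"},
		{Type: "InternalIP", Address: "192.168.1.5"},
	}
	// The HostSubnet holds 192.168.1.5, which is still valid: no update.
	fmt.Println(hostSubnetNeedsUpdate("192.168.1.5", "10.0.0.5", valid))
	// The HostSubnet holds an address the node no longer reports: update.
	fmt.Println(hostSubnetNeedsUpdate("172.16.0.9", "10.0.0.5", valid))
}
```

The point of the sketch is that a reordering of the reported addresses alone does not trigger a HostSubnet change; only an address that has actually disappeared does.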

pravisankar · 2017-04-06T00:12:06Z

Looking at setNodeAddress() in kubelet_node_status.go: with a cloud provider, multiple addresses are stored in the node status, but without a cloud provider only one IP address and the hostname address are stored. The chosen node address depends on host-name/node-name/ChooseHostInterface(). So we could end up in a situation where the nodeIP stored in the HostSubnet is not among the valid addresses we get from the node status.
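
To make that concern concrete, here is a small sketch of the non-cloud case. It assumes a simplified `NodeAddress` type in place of `kapi.NodeAddress`; the helper names, IPs, and hostname are invented for illustration and do not come from the kubelet or from this PR.

```go
package main

import (
	"fmt"
	"net"
)

// NodeAddress is a simplified, hypothetical stand-in for kapi.NodeAddress.
type NodeAddress struct {
	Type    string
	Address string
}

// reportedIPs returns the entries of a node status address list that parse
// as IP addresses, skipping hostname entries.
func reportedIPs(addrs []NodeAddress) []string {
	var ips []string
	for _, a := range addrs {
		if net.ParseIP(a.Address) != nil {
			ips = append(ips, a.Address)
		}
	}
	return ips
}

// storedIPStillReported reports whether the IP recorded in the HostSubnet is
// among the IPs the node currently reports.
func storedIPStillReported(storedIP string, addrs []NodeAddress) bool {
	for _, ip := range reportedIPs(addrs) {
		if ip == storedIP {
			return true
		}
	}
	return false
}

func main() {
	// Assumed non-cloud node after a reboot: one IP plus the hostname.
	afterReboot := []NodeAddress{
		{Type: "InternalIP", Address: "192.168.1.5"},
		{Type: "Hostname", Address: "node1.example.com"},
	}
	// The HostSubnet still holds the IP chosen before the reboot, which is
	// no longer reported, so the "is it among the valid ones" check fails.
	fmt.Println(storedIPStillReported("10.0.0.5", afterReboot))
	fmt.Println(storedIPStillReported("192.168.1.5", afterReboot))
}
```

With only one IP in the status, a flipped address leaves nothing for the stored nodeIP to match against, which is the case the proposed check cannot cover.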

We do support specifying the desired nodeIP in the openshift config file. So could the bug fix be to delete all the HostSubnets for the troubled nodes, specify the desired nodeIP in the openshift config, and restart the openshift-node service?
Once we finish this card https://trello.com/c/sCNKKYCz/375-5-allow-segregating-cluster-traffic-from-management-traffic-sdn-functionality , the user will also be able to choose the desired network interface and/or node IP.

pecameron · 2017-04-06T13:19:27Z

@rajatchopra A few thoughts:
Your discussion in the comment should be in the description above.
The discussion should also be in comments in the code.
This seems to rely on the current contents of the map; that dependency needs to be called out in a comment.
I am running a cluster in the lab where em3 is the lab network and em1 is the cluster internal network for both master to node and pod to pod. Does this fix work in that environment? Can I put the master to node on em4 and leave the pod to pod on em2?
Needs a test.

pravisankar · 2017-04-07T00:31:55Z

@pecameron
Your requirement is what I'm working on right now; see the Trello card: https://trello.com/c/sCNKKYCz/375-5-allow-segregating-cluster-traffic-from-management-traffic-sdn-functionality (master-to-node communication on one network interface and pod-to-pod on another). This PR tries to keep the same IP for the node across openshift restarts when the node has multiple interfaces/IPs. I was arguing that this may not fix the issue, and that the nodeIP openshift config can be used for this use case.

knobunc

The code looks good, but can you make the commit heading a little better so that it's clear when we run git log? Some detail in the commit message wouldn't hurt either.

Thanks!

rajatchopra · 2017-04-07T04:37:19Z

> We do support specifying the desired nodeIP in the openshift config file. So could the bug fix be to delete all the HostSubnets for the troubled nodes, specify the desired nodeIP in the openshift config, and restart the openshift-node service?
> Once we finish this card https://trello.com/c/sCNKKYCz/375-5-allow-segregating-cluster-traffic-from-management-traffic-sdn-functionality , the user will also be able to choose the desired network interface and/or node IP.

@pravisankar You are correct, but we need some way to keep existing clusters from malfunctioning, and this is the easiest patch I could find short of the migration script. Once the Trello card is implemented we may not need this patch, but we need a fix for the release before that feature lands.

@pecameron The Trello card is the comprehensive way of dealing with this problem as Ravi pointed out. This PR is to provide a patch fix until that card is done.

* Bug: when a node is rebooted along with the master, the node reports a flipped status address, which causes the HostSubnet to be re-assigned.
* The fix looks at all existing valid addresses; if the existing HostSubnet has a nodeIP that is among the valid ones, no update is performed.

pravisankar

The OpenShift nodeIP config is honored in both the cloud-provider and non-cloud-provider cases in 3.5 and later releases; before 3.5, it was honored only in the non-cloud-provider case.

Any cluster running 3.5 or later doesn't need this fix and can use the nodeIP config, but this code will help as a safeguard. Clusters running older releases will benefit from this fix, so backporting it will be more useful.
LGTM
cc: @rajatchopra @dcbw

danwinship · 2017-04-10T14:10:29Z

pkg/sdn/plugin/subnets.go

+// addNode takes the nodeName, a preferred nodeIP, the node's annotations and other valid ip addresses
+// Creates or updates a HostSubnet if needed
+// Returns the IP address used for hostsubnet (either the preferred or one from the otherValidAddresses) and any error
+func (master *OsdnMaster) addNode(nodeName string, nodeIP string, hsAnnotations map[string]string, otherValidAddresses []kapi.NodeAddress) (string, error) {


should probably just make this take the whole Node object at this point.

and maybe return the HostSubnet rather than just the IP?

The F5 ghost node case prevented me from doing that: F5 exists as a HostSubnet, but there is no Node object for it.

knobunc · 2017-04-10T18:50:20Z

re-[test]: the last run timed out waiting for copr.

openshift-bot · 2017-04-10T18:53:29Z

Evaluated for origin test up to dacd766

dcbw

lgtm

openshift-bot · 2017-04-10T20:11:17Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/679/) (Base Commit: 1d26c38)

danwinship · 2017-04-10T20:25:24Z

[merge]

openshift-bot · 2017-04-10T20:29:30Z

Evaluated for origin merge up to dacd766

openshift-bot · 2017-04-10T20:53:47Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/306/) (Base Commit: e3dff57) (Image: devenv-rhel7_6134)

knobunc approved these changes Apr 7, 2017

rajatchopra force-pushed the bz1438402 branch from 6eda912 to dacd766 Compare April 7, 2017 16:56

pravisankar approved these changes Apr 7, 2017

danwinship reviewed Apr 10, 2017

dcbw approved these changes Apr 10, 2017

openshift-bot merged commit 445ae00 into openshift:master Apr 10, 2017
