Only one ipfailover container can be run on a node #12472

Merged
1 commit merged into openshift:master on Jan 27, 2017

Conversation

pecameron
Contributor

@pecameron pecameron commented Jan 12, 2017

The ipfailover pods for a given configuration must run on
different nodes. We are using the ServicePort as a mechanism
to prevent multiple pods for the same configuration from starting
on the same node. Since pods for different configurations can
run on the same node, a different ServicePort is used for each
configuration.

In the future, this may be changed to pod anti-affinity.

bug 1411501
https://bugzilla.redhat.com/show_bug.cgi?id=1411501

Signed-off-by: Phil Cameron [email protected]
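
As a rough illustration of the mechanism described above (a simplified stand-in, not the PR's actual generator code): declaring the configuration's ServicePort as a hostPort on the pod means the scheduler treats that port as a per-node resource, so a second pod of the same configuration cannot land on a node that already holds it, while a configuration using a different port still can.

package main

import "fmt"

// containerPort is a simplified stand-in for the Kubernetes ContainerPort
// field on a pod's container spec.
type containerPort struct {
    ContainerPort int
    HostPort      int // a non-zero HostPort is what keeps two such pods off one node
}

func main() {
    // Hypothetical per-configuration ports: two different ipfailover
    // configurations use different ports and can share a node; two pods of
    // the same configuration ask for the same HostPort and cannot.
    configA := containerPort{ContainerPort: 1985, HostPort: 1985}
    configB := containerPort{ContainerPort: 1986, HostPort: 1986}
    fmt.Println(configA, configB)
}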

@pecameron
Contributor Author

@knobunc PTAL

Contributor

@knobunc knobunc left a comment

Good start, but we need the pod anti-affinity stuff at https://kubernetes.io/docs/user-guide/node-selection/ so that the same spec doesn't land on the same node twice.

Before, the port was a cheap way to tell the scheduler to schedule only one (global) instance of the container on a single node... but that is overly limiting. We need to say "only one ipfailover instance of this configuration can land on a single node".

Please change the title too... it sounds like we are adding the restriction.

@knobunc knobunc requested a review from ramr January 13, 2017 13:25
@knobunc
Contributor

knobunc commented Jan 13, 2017

@openshift/networking PTAL

@knobunc
Contributor

knobunc commented Jan 13, 2017

[test]

@pecameron
Contributor Author

@knobunc I looked at the anti-affinity docs and I think this happens by default: it will try to start replicas on different nodes. I may have read that wrong. Do you know what really happens?

@knobunc
Contributor

knobunc commented Jan 17, 2017

@pecameron: If you have two nodes and scale up to three instances, what happens? As I read the anti-affinity docs it will forbid the third from scaling up.

@ramr
Contributor

ramr commented Jan 23, 2017

Is there another PR that has the anti-affinity changes? Because if we just remove the port here, kubernetes could schedule 'n' pods on the same node. The easiest way to check is to mark every node in the cluster except one as non-schedulable and make sure that 2 or more pods don't end up on that single schedulable node.
You will need to also generate/add a scheduler.alpha.kubernetes.io/affinity annotation.
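
For illustration only, a minimal sketch of what such an annotation could look like on the ipfailover pod template; the JSON shape reflects my reading of the 1.5-era alpha annotation, and the ipfailover=my-config label selector is hypothetical, not something this PR adds.

package main

import "fmt"

func main() {
    // Hypothetical pod-template annotations map. The key is the alpha
    // annotation named above; the JSON value shape is assumed, not code
    // taken from this PR.
    annotations := map[string]string{
        "scheduler.alpha.kubernetes.io/affinity": `{
          "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
              "labelSelector": {"matchLabels": {"ipfailover": "my-config"}},
              "topologyKey": "kubernetes.io/hostname"
            }]
          }
        }`,
    }
    fmt.Println(annotations["scheduler.alpha.kubernetes.io/affinity"])
}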

@knobunc
Contributor

knobunc commented Jan 23, 2017

Ram: We have several needs that seem to be in conflict with one another. In one case we need to be able to schedule multiple failover pods so that we can fail over. But in the other, we need to ensure that only one pod of each configuration lands on each node. And we need to allow different configurations to coexist on the same node.

So allocating a port per configuration seems to be a reasonable compromise until we get on a release with a stable anti-affinity API.

@ramr
Contributor

ramr commented Jan 23, 2017

@knobunc yeah, just saw the message from davidopp re: the node affinity/anti-affinity APIs changing in 1.6 - so ports might still be the best solution for the short term. I think, though, that to allow for multiple ipfailover installations we can change this a little bit - maybe use the port number (or 63000, if we don't want to use up port numbers near 1985) + the vrrp-id-offset to get the equivalent of pod anti-affinity for an ipfailover deployment (reserve it so that a single instance of the same ipfailover deployment runs on a given node).
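
A minimal sketch of that scheme, assuming a 63000 base; the helper name is illustrative, not code from this PR.

package main

import "fmt"

// Assumed base; the suggestion above is 63000 + the deployment's vrrp-id-offset.
const hostPortBase = 63000

// hostPortFor is a hypothetical helper: every replica of one ipfailover
// deployment declares the same hostPort, so the scheduler will not place two
// of them on the same node, while a deployment with a different offset (and
// therefore a different port) can still share that node.
func hostPortFor(vrrpIDOffset int) int {
    return hostPortBase + vrrpIDOffset
}

func main() {
    fmt.Println(hostPortFor(56)) // 63056, matching the later test expectation 'hostPort: 63056'
}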

@pecameron
Contributor Author

@knobunc @ramr see the bug for more comments. I can go with port 63000. I will go ahead and make the changes for this approach.

@knobunc
Contributor

knobunc commented Jan 24, 2017

Make it so

@pecameron pecameron force-pushed the bz1411501 branch 2 times, most recently from 6c84200 to e67a33e Compare January 24, 2017 20:15
@pecameron
Contributor Author

@knobunc changed to use port 63000 + vrrp-id, added tests, changed commit title and comment. PTAL

if len(options.ServiceAccount) == 0 {
return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 255 {
return fmt.Errorf("The vrrp-id-offset must be in the range 0..255")
Contributor
0 to 254, right?

Contributor Author
No. The real range is 1..255. The code changes 0 to 1, so the valid input range is 0..255.

Contributor
I don't see that. In config-generators.sh line 270 it creates a counter variable that starts at 1. That is passed into generate_vrrpd_instance_config at line 290 and is put into $iid (although the default of 0 is probably wrong and we probably should change the $iid default at line 209 to 1).

Then we add that to the offset, so the first ID allocated will be offset + 1. That means the range you can set the offset to only goes up to 254.
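
A small sketch of that arithmetic, using illustrative names rather than the script's variables.

package main

import "fmt"

// virtualRouterID mirrors the shell expression virtual_router_id $((vrrpidoffset + iid)).
// The instance counter iid starts at 1, so the first allocated router ID is offset + 1.
func virtualRouterID(vrrpIDOffset, iid int) int {
    return vrrpIDOffset + iid
}

func main() {
    fmt.Println(virtualRouterID(254, 1)) // 255, the largest valid VRRP router ID
    fmt.Println(virtualRouterID(255, 1)) // 256, out of range - hence capping the offset at 254
}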

Contributor
I still see 255 here

// same configuration starting on the same node. Since pods for different
// configurations can run on the same node a different ServicePort is used
// for each configuration.
// In the future, this may be changed to pod-affinity.
Contributor
pod anti-affinity

Contributor Author
changed

@@ -116,7 +116,7 @@ func (p *KeepalivedPlugin) Generate() (*kapi.List, error) {
}

if len(p.Options.VirtualIPs) == 0 {
return nil, fmt.Errorf("you must specify at least one virtual IP address for keepalived to expose")
return nil, fmt.Errorf("you must specify at least one virtual IP address (--virtual-ips=)for keepalived to expose")
Contributor
space after )

Contributor Author
Changed

@@ -12,8 +12,9 @@ const (
// DefaultType is the default IP Failover type.
DefaultType = "keepalived"

// DefaultServicePort is the default service port.
DefaultServicePort = 1985
// DefaultServicePort is the port assiciated with the ipfailover config.
Contributor
associated

Contributor Author
Changed

// DefaultServicePort is the default service port.
DefaultServicePort = 1985
// DefaultServicePort is the port assiciated with the ipfailover config.
// each ipfailover config has a different ServicePort.
Contributor
Each

Contributor Author
Changed.

@@ -81,7 +81,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=356' 'error: The vrrp-id-offset must be in the range 0..255'
Contributor
This range is wrong.

Let's add a test for 255 to test for failure.

Contributor Author
The range is correct. The default is 0 and the actual input range is 0 to 255; when it is 0, the code sets it to 1.
You want to change the 356 to 256? OK

@pecameron
Contributor Author

@knobunc changes made, PTAL

if len(options.ServiceAccount) == 0 {
return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 255 {
return fmt.Errorf("The vrrp-id-offset must be in the range 0..255")
Contributor
I don't see that. In config-generators.sh line 270 it creates a counter variable that starts at 1. That is passed into generate_vrrpd_instance_config at line 290 and is put into $iid (although the default of 0 is probably wrong and we probably should change the $iid default at line 209 to 1).

Then we add that to the offset, so the first ID allocated will be offset + 1. That means the range you can set the offset to only goes up to 254.

@pecameron
Contributor Author

@knobunc the script is not all that straightforward. counter starts at 1, which goes to the vrrp instance (iid) as 1. Line 203, virtual_router_id $((vrrpidoffset + iid)), expects the offset to start at 0, which makes the first vrrp-id 1. So the range does need to stop at 254.
Changed.

@pecameron pecameron force-pushed the bz1411501 branch 2 times, most recently from 8923218 to 034e447 Compare January 25, 2017 16:27
@pecameron
Contributor Author

@knobunc 254 range and "1" default are in.

@@ -81,7 +81,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=256' 'error: The vrrp-id-offset must be in the range 0..255'
Contributor
Please make this offset = 255 and change the comment

Contributor

@ramr ramr left a comment
A few questions on edge values but otherwise LGTM

if len(options.ServiceAccount) == 0 {
return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 254 {
return fmt.Errorf("The vrrp-id-offset must be in the range 0..254")
Contributor
Hmm, if we set iid to 0 above then it can be an offset from 0 to 255 - this way we lose 1 vrrp id right?

Contributor Author
@ramr The VRRP protocol spec limits the valid VID values to 1..255; 0 is specifically invalid, so there are 255 possible valid values. The doc reference is wrong.
For some reason this was coded so that the input vrrp-id-offset is 0..254, which is converted to 1..255 internally.

Contributor
ok cool then this looks lgtm

@@ -178,7 +178,7 @@ function generate_vip_section() {
#
function generate_vrrpd_instance_config() {
local servicename=$1
local iid=${2:-"0"}
local iid=${2:-"1"}
Contributor
Any reason to set it to 1 instead? That changes the value of the virtual router id below to always start from 1 -
virtual_router_id $((vrrpidoffset + iid))

Contributor

@ramr ramr left a comment
LGTM

@pecameron
Contributor Author

@ramr is there anything more to discuss before dropping the changes requested flag?

@pecameron
Contributor Author

@knobunc anything left before dropping the changes requested flag?

@ramr
Contributor

ramr commented Jan 25, 2017

@pecameron I approved it about half an hour back. Maybe you need to refresh the page. thx

@@ -78,7 +78,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=256' 'error: The vrrp-id-offset must be in the range 0..254'
Contributor
--vrrp-id-offset=256

I want that to test the 255 case.

Contributor Author
Changed

The ipfailover pods for a given configuration must run on
different nodes. We are using the ServicePort as a mechanism
to prevent multiple pods for the same configuration from starting
on the same node. Since pods for different configurations can
run on the same node, a different ServicePort is used for each
configuration.

In the future, this may be changed to pod anti-affinity.

bug 1411501
https://bugzilla.redhat.com/show_bug.cgi?id=1411501

Signed-off-by: Phil Cameron <[email protected]>
@pecameron
Contributor Author

[TEST]

@knobunc
Contributor

knobunc commented Jan 26, 2017

@pecameron find the flake and link to it, and I'll re-run the tests.

@pecameron
Contributor Author

@knobunc your last change request is in the image. Anything else needed?

created issue for test failure:
test/end-to-end/core.sh:363: executing 'cat '/tmp/openshift/test-end-to-end-docker//logs/cli-with-token.log'' expecting success and text 'Using in-cluster configuration': the output content test failed
#12678 opened 11 minutes ago by pecameron

@pecameron
Contributor Author

@knobunc @ncdc This seems to be (could be?) the problem. What do I need to change??

Running test/end-to-end/core.sh:422: executing 'oc get builds --namespace test -o jsonpath='{.items[0].status.phase}'' expecting any result and text 'Running'; re-trying every 0.2s until completion or 600.000s...
FAILURE after 599.388s: test/end-to-end/core.sh:422: executing 'oc get builds --namespace test -o jsonpath='{.items[0].status.phase}'' expecting any result and text 'Running'; re-trying every 0.2s until completion or 600.000s: the command timed out
Standard output from the command:
Pending
... repeated 33 times
Failed
... repeated 1434 times
Standard error from the command:
[ERROR] PID 32716: hack/lib/cmd.sh:618: return "${return_code}" exited with status 1.
[INFO] Stack Trace:
[INFO] 1: hack/lib/cmd.sh:618: return "${return_code}"
[INFO] 2: test/end-to-end/core.sh:422: os::cmd::try_until_text
[INFO] Exiting with code 1.
[ERROR] PID 31461: hack/test-end-to-end-docker.sh:91: ${OS_ROOT}/test/end-to-end/core.sh exited with status 1.
[INFO] Stack Trace:
[INFO] 1: hack/test-end-to-end-docker.sh:91: ${OS_ROOT}/test/end-to-end/core.sh

@ncdc
Contributor

ncdc commented Jan 26, 2017

It appears to be a flake. It's happened to several PRs. I'll open a new issue and rekick the test. Just a minute.

@ncdc
Contributor

ncdc commented Jan 26, 2017

Actually, my grep was wrong. It's only happened to this PR in the last week. We can try to test again and see how it does. If it fails the same way, we'll create a flake issue. [test]

@openshift-bot
Contributor

Evaluated for origin test up to 959a4bf

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test Running (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13360/) (Base Commit: a4b2294)

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS

@pecameron
Contributor Author

@knobunc is this ready to merge?

@knobunc
Contributor

knobunc commented Jan 27, 2017

[merge]

@openshift-bot
Contributor

Evaluated for origin merge up to 959a4bf

@openshift-bot
Contributor

openshift-bot commented Jan 27, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13366/) (Base Commit: 6ee94d0) (Image: devenv-rhel7_5794)

@openshift-bot openshift-bot merged commit 83e2ef6 into openshift:master Jan 27, 2017
@pecameron pecameron deleted the bz1411501 branch January 31, 2017 18:41