Only one ipfailover container can be run on a node #12472

Merged
1 commit merged into openshift:master on Jan 27, 2017

Conversation

pecameron
Contributor

@pecameron pecameron commented Jan 12, 2017

The ipfailover pods for a given configuration must run on
different nodes. We are using the ServicePort as a mechanism
to prevent multiple pods for the same configuration from starting
on the same node. Since pods for different configurations can
run on the same node, a different ServicePort is used for each
configuration.

In the future, this may be changed to pod anti-affinity.

bug 1411501
https://bugzilla.redhat.com/show_bug.cgi?id=1411501

Signed-off-by: Phil Cameron [email protected]
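
As a rough illustration of the mechanism described above (a simplified stand-in, not the PR's actual generator code): declaring the configuration's ServicePort as a hostPort on the pod means the scheduler treats that port as a per-node resource, so a second pod of the same configuration cannot land on a node that already holds it, while a configuration using a different port still can.

package main

import "fmt"

// containerPort is a simplified stand-in for the Kubernetes ContainerPort
// field on a pod's container spec.
type containerPort struct {
    ContainerPort int
    HostPort      int // a non-zero HostPort is what keeps two such pods off one node
}

func main() {
    // Hypothetical per-configuration ports: two different ipfailover
    // configurations use different ports and can share a node; two pods of
    // the same configuration ask for the same HostPort and cannot.
    configA := containerPort{ContainerPort: 1985, HostPort: 1985}
    configB := containerPort{ContainerPort: 1986, HostPort: 1986}
    fmt.Println(configA, configB)
}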

@pecameron
Contributor Author

@knobunc PTAL

Contributor

@knobunc knobunc left a comment

Good start, but we need the pod anti-affinity stuff at https://kubernetes.io/docs/user-guide/node-selection/ so that the same spec doesn't land on the same node twice.

Before, the port was a cheap way to tell the scheduler to schedule only one (global) instance of the container on a single node... but that is overly limiting. We need to say "only one ipfailover instance of this configuration can land on a single node".

Please change the title too... it sounds like we are adding the restriction.

@knobunc knobunc requested a review from ramr January 13, 2017 13:25
@knobunc
Contributor

knobunc commented Jan 13, 2017

@openshift/networking PTAL

@knobunc
Contributor

knobunc commented Jan 13, 2017

[test]

@pecameron
Contributor Author

@knobunc I looked at the anti-affinity docs and I think this happens by default: it will try to start replicas on different nodes. I may have read that wrong. Do you know what really happens?

@knobunc
Contributor

knobunc commented Jan 17, 2017

@pecameron: If you have two nodes and scale up to three instances, what happens? As I read the anti-affinity docs it will forbid the third from scaling up.

@ramr
Contributor

ramr commented Jan 23, 2017

Is there another PR that has the anti-affinity changes? Because if we just remove the port here, kubernetes could schedule 'n' pods on the same node. The easiest way to check is to mark every node in the cluster except one as non-schedulable and make sure that 2 or more pods don't end up on that single schedulable node.
You will need to also generate/add a scheduler.alpha.kubernetes.io/affinity annotation.
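
For illustration only, a minimal sketch of what such an annotation could look like on the ipfailover pod template; the JSON shape reflects my reading of the 1.5-era alpha annotation, and the ipfailover=my-config label selector is hypothetical, not something this PR adds.

package main

import "fmt"

func main() {
    // Hypothetical pod-template annotations map. The key is the alpha
    // annotation named above; the JSON value shape is assumed, not code
    // taken from this PR.
    annotations := map[string]string{
        "scheduler.alpha.kubernetes.io/affinity": `{
          "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
              "labelSelector": {"matchLabels": {"ipfailover": "my-config"}},
              "topologyKey": "kubernetes.io/hostname"
            }]
          }
        }`,
    }
    fmt.Println(annotations["scheduler.alpha.kubernetes.io/affinity"])
}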

@knobunc
Contributor

knobunc commented Jan 23, 2017

Ram: We have several needs that seem to be in conflict with one another. In one case we need to be able to schedule multiple failover pods so that we can fail over. But in the other, we need to ensure that only one pod of each configuration lands on each node. And we need to allow different configurations to coexist on the same node.

So allocating a port per configuration seems to be a reasonable compromise until we get on a release with a stable anti-affinity API.

@ramr
Contributor

ramr commented Jan 23, 2017

@knobunc yeah, just saw the message from davidopp re: the node affinity/anti-affinity APIs changing in 1.6 - so ports might still be the best solution for the short term. I think, though, that to allow for multiple ipfailover installations we can change this a little bit - maybe use the port number (or 63000, if we don't want to use up port numbers near 1985) + the vrrp-id-offset to get the equivalent of pod anti-affinity for an ipfailover deployment (reserve it so that a single instance of the same ipfailover deployment runs on a given node).
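
A minimal sketch of that scheme, assuming a 63000 base; the helper name is illustrative, not code from this PR.

package main

import "fmt"

// Assumed base; the suggestion above is 63000 + the deployment's vrrp-id-offset.
const hostPortBase = 63000

// hostPortFor is a hypothetical helper: every replica of one ipfailover
// deployment declares the same hostPort, so the scheduler will not place two
// of them on the same node, while a deployment with a different offset (and
// therefore a different port) can still share that node.
func hostPortFor(vrrpIDOffset int) int {
    return hostPortBase + vrrpIDOffset
}

func main() {
    fmt.Println(hostPortFor(56)) // 63056, matching the later test expectation 'hostPort: 63056'
}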

@pecameron
Contributor Author

@knobunc @ramr see the bug for more comments. I can go with port 63000. I will go ahead and make the changes for this approach.

@knobunc
Contributor

knobunc commented Jan 24, 2017

Make it so

@pecameron pecameron force-pushed the bz1411501 branch 2 times, most recently from 6c84200 to e67a33e Compare January 24, 2017 20:15
@pecameron
Contributor Author

@knobunc changed to use port 63000 + vrrp-id, added tests, changed commit title and comment. PTAL

if len(options.ServiceAccount) == 0 {
return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 255 {
return fmt.Errorf("The vrrp-id-offset must be in the range 0..255")
Contributor
0 to 254, right?

Contributor Author
No. The real range is 1..255. The code changes 0 to 1, so the valid input range is 0..255.

Contributor
I don't see that. In config-generators.sh line 270 it creates a counter variable that starts at 1. That is passed into generate_vrrpd_instance_config at line 290 and is put into $iid (although the default of 0 is probably wrong and we probably should change the $iid default at line 209 to 1).

Then we add that to the offset, so the first ID allocated will be offset + 1. That means the range you can set the offset to only goes up to 254.
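
A small sketch of that arithmetic, using illustrative names rather than the script's variables.

package main

import "fmt"

// virtualRouterID mirrors the shell expression virtual_router_id $((vrrpidoffset + iid)).
// The instance counter iid starts at 1, so the first allocated router ID is offset + 1.
func virtualRouterID(vrrpIDOffset, iid int) int {
    return vrrpIDOffset + iid
}

func main() {
    fmt.Println(virtualRouterID(254, 1)) // 255, the largest valid VRRP router ID
    fmt.Println(virtualRouterID(255, 1)) // 256, out of range - hence capping the offset at 254
}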

Contributor
I still see 255 here

// same configuration starting on the same node. Since pods for different
// configurations can run on the same node a different ServicePort is used
// for each configuration.
// In the future, this may be changed to pod-affinity.
Contributor
pod anti-affinity

Contributor Author
changed

@@ -116,7 +116,7 @@ func (p *KeepalivedPlugin) Generate() (*kapi.List, error) {
}

if len(p.Options.VirtualIPs) == 0 {
return nil, fmt.Errorf("you must specify at least one virtual IP address for keepalived to expose")
return nil, fmt.Errorf("you must specify at least one virtual IP address (--virtual-ips=)for keepalived to expose")
Contributor
space after )

Contributor Author
Changed

@@ -12,8 +12,9 @@ const (
// DefaultType is the default IP Failover type.
DefaultType = "keepalived"

// DefaultServicePort is the default service port.
DefaultServicePort = 1985
// DefaultServicePort is the port assiciated with the ipfailover config.
Contributor
associated

Contributor Author
Changed

// DefaultServicePort is the default service port.
DefaultServicePort = 1985
// DefaultServicePort is the port assiciated with the ipfailover config.
// each ipfailover config has a different ServicePort.
Contributor
Each

Contributor Author
Changed.

@@ -81,7 +81,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=356' 'error: The vrrp-id-offset must be in the range 0..255'
Contributor
This range is wrong.

Let's add a test for 255 to test for failure.

Contributor Author
The range is correct. The default is 0 and the actual input range is 0 to 255; when it is 0, the code sets it to 1.
You want to change the 356 to 256? OK

@pecameron
Contributor Author

@knobunc changes made, PTAL

if len(options.ServiceAccount) == 0 {
return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 255 {
return fmt.Errorf("The vrrp-id-offset must be in the range 0..255")
Contributor
I don't see that. In config-generators.sh line 270 it creates a counter variable that starts at 1. That is passed into generate_vrrpd_instance_config at line 290 and is put into $iid (although the default of 0 is probably wrong and we probably should change the $iid default at line 209 to 1).

Then we add that to the offset, so the first ID allocated will be offset + 1. That means the range you can set the offset to only goes up to 254.

@pecameron
Contributor Author

@knobunc the script is not all that straightforward. counter starts at 1, which goes to the vrrp instance (iid) as 1. Line 203, virtual_router_id $((vrrpidoffset + iid)), expects the offset to start at 0, which makes the first vrrp-id 1. So the range does need to stop at 254.
Changed.

@pecameron pecameron force-pushed the bz1411501 branch 2 times, most recently from 8923218 to 034e447 Compare January 25, 2017 16:27
@pecameron
Contributor Author

@knobunc 254 range and "1" default are in.

@@ -81,7 +81,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=256' 'error: The vrrp-id-offset must be in the range 0..255'
Contributor
Please make this offset = 255 and change the comment

Contributor

@ramr ramr left a comment
A few questions on edge values but otherwise LGTM

if len(options.ServiceAccount) == 0 {
return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 254 {
return fmt.Errorf("The vrrp-id-offset must be in the range 0..254")
Contributor
Hmm, if we set iid to 0 above then it can be an offset from 0 to 255 - this way we lose 1 vrrp id right?

Contributor Author
@ramr The VRRP protocol spec limits the valid VID values to 1..255; 0 is specifically invalid, so there are 255 possible valid values. The doc reference is wrong.
For some reason this was coded so that the input vrrp-id-offset is 0..254, which is converted to 1..255 internally.

Contributor
ok cool then this looks lgtm

@@ -178,7 +178,7 @@ function generate_vip_section() {
#
function generate_vrrpd_instance_config() {
local servicename=$1
local iid=${2:-"0"}
local iid=${2:-"1"}
Contributor
Any reason to set it to 1 instead? That changes the value of the virtual router id below to always start from 1 -
virtual_router_id $((vrrpidoffset + iid))

Contributor

@ramr ramr left a comment
LGTM

@pecameron
Contributor Author

@ramr is there anything more to discuss before dropping the changes requested flag?

@pecameron
Contributor Author

@knobunc anything left before dropping the changes requested flag?

@ramr
Contributor

ramr commented Jan 25, 2017

@pecameron I approved it about half an hour back. Maybe you need to refresh the page. thx

@@ -78,7 +78,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=256' 'error: The vrrp-id-offset must be in the range 0..254'
Contributor
--vrrp-id-offset=256

I want that to test the 255 case.

Contributor Author
Changed

The ipfailover pods for a given configuration must run on
different nodes. We are using the ServicePort as a mechanism
to prevent multiple pods for the same configuration from starting
on the same node. Since pods for different configurations can
run on the same node, a different ServicePort is used for each
configuration.

In the future, this may be changed to pod anti-affinity.

bug 1411501
https://bugzilla.redhat.com/show_bug.cgi?id=1411501

Signed-off-by: Phil Cameron <[email protected]>
@pecameron
Contributor Author

[TEST]

@knobunc
Contributor

knobunc commented Jan 26, 2017

@pecameron find the flake and link to it, and I'll re-run the tests.

@pecameron
Contributor Author

@knobunc your last change request is in the image. Anything else needed?

created issue for test failure:
test/end-to-end/core.sh:363: executing 'cat '/tmp/openshift/test-end-to-end-docker//logs/cli-with-token.log'' expecting success and text 'Using in-cluster configuration': the output content test failed
#12678 opened 11 minutes ago by pecameron

@pecameron
Contributor Author

@knobunc @ncdc This seems to be (could be?) the problem. What do I need to change??

Running test/end-to-end/core.sh:422: executing 'oc get builds --namespace test -o jsonpath='{.items[0].status.phase}'' expecting any result and text 'Running'; re-trying every 0.2s until completion or 600.000s...
FAILURE after 599.388s: test/end-to-end/core.sh:422: executing 'oc get builds --namespace test -o jsonpath='{.items[0].status.phase}'' expecting any result and text 'Running'; re-trying every 0.2s until completion or 600.000s: the command timed out
Standard output from the command:
Pending
... repeated 33 times
Failed
... repeated 1434 times
Standard error from the command:
[ERROR] PID 32716: hack/lib/cmd.sh:618: return "${return_code}" exited with status 1.
[INFO] Stack Trace:
[INFO] 1: hack/lib/cmd.sh:618: return "${return_code}"
[INFO] 2: test/end-to-end/core.sh:422: os::cmd::try_until_text
[INFO] Exiting with code 1.
[ERROR] PID 31461: hack/test-end-to-end-docker.sh:91: ${OS_ROOT}/test/end-to-end/core.sh exited with status 1.
[INFO] Stack Trace:
[INFO] 1: hack/test-end-to-end-docker.sh:91: ${OS_ROOT}/test/end-to-end/core.sh

@ncdc
Contributor

ncdc commented Jan 26, 2017

It appears to be a flake. It's happened to several PRs. I'll open a new issue and rekick the test. Just a minute.

@ncdc
Contributor

ncdc commented Jan 26, 2017

Actually, my grep was wrong. It's only happened to this PR in the last week. We can try to test again and see how it does. If it fails the same way, we'll create a flake issue. [test]

@openshift-bot
Contributor

Evaluated for origin test up to 959a4bf

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test Running (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13360/) (Base Commit: a4b2294)

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS

@pecameron
Contributor Author

@knobunc is this ready to merge?

@knobunc
Contributor

knobunc commented Jan 27, 2017

[merge]

@openshift-bot
Contributor

Evaluated for origin merge up to 959a4bf

@openshift-bot
Contributor

openshift-bot commented Jan 27, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13366/) (Base Commit: 6ee94d0) (Image: devenv-rhel7_5794)

@openshift-bot openshift-bot merged commit 83e2ef6 into openshift:master Jan 27, 2017
@pecameron pecameron deleted the bz1411501 branch January 31, 2017 18:41