Only one ipfailover container can be run on a node #12472
Conversation
@knobunc PTAL
Good start, but we need the pod anti-affinity stuff at https://kubernetes.io/docs/user-guide/node-selection/ so that the same spec doesn't land on the same node twice.
Before, the port was a cheap way to tell the scheduler to schedule only one (global) instance of the container on a single node... but that is overly limiting. We need to say "only one ipfailover instance of this configuration can land on a single node".
Please change the title too... it sounds like we are adding the restriction.
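For reference, a minimal sketch of what the suggested pod anti-affinity could look like, written against today's k8s.io/api Go types (which postdate this PR); the `ipfailover` label key and the `antiAffinityFor` helper are made up for illustration, not taken from the ipfailover code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// antiAffinityFor builds a rule that keeps two pods of the same ipfailover
// configuration off the same node: no pod may be scheduled onto a node
// (topology key kubernetes.io/hostname) that already runs a pod carrying
// the same hypothetical "ipfailover" configuration label.
func antiAffinityFor(configName string) *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{
				{
					LabelSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{"ipfailover": configName},
					},
					TopologyKey: "kubernetes.io/hostname",
				},
			},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", antiAffinityFor("ipf-1"))
}
```

Different configurations would use different label values, so their pods could still share a node; only replicas of the same configuration repel each other.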
@openshift/networking PTAL
[test]
@knobunc I looked at the anti-affinity and I think this happens by default. It will try to start replicas on different nodes. I likely read that wrong. Do you know what really happens?
@pecameron: If you have two nodes and scale up to three instances, what happens? As I read the anti-affinity docs, it will forbid the third from scaling up.
Is there another PR that has the anti-affinity changes? Because if we just remove the port here, Kubernetes could schedule 'n' pods on the same node. The easiest way to check is to mark all the nodes in the cluster except a single one non-schedulable and make sure that two or more pods don't end up on that one schedulable node.
Ram: We have several needs that seem to be in conflict with one another. In one case we need to be able to schedule multiple failover pods so that we can fail over. But in the other, we need to ensure that only one of each type of configuration lands on each node. And we need to allow different configurations to coexist on the same node. So allocating a port per configuration seems to be a reasonable compromise until we get on a release with a stable anti-affinity API.
@knobunc yeah, just saw the message from davidopp re: node affinity/anti-affinity APIs changing in 1.6 - so ports might still be the best solution for the short term. I think, though, that to allow for multiple ipfailover installations we can change this a little bit - maybe use the port number (or 63000 if we don't want to use up port numbers near 1985) + the vrrp-id-offset to do the equivalent of pod anti-affinity for an ipfailover deployment (reserving it so that only a single instance of the same ipfailover deployment runs on a given node).
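For illustration, a small Go sketch of the scheme the thread settles on (the `BasePort` constant and `hostPortFor` helper are names invented for this example): each configuration claims host port 63000 plus its vrrp-id-offset, so two pods of the same configuration ask for the same hostPort and cannot land on one node, while pods of different configurations use different ports and can coexist.

```go
package main

import "fmt"

// BasePort is the base host port used for ipfailover configurations in this
// PR (63000, chosen to stay clear of the old default service port 1985).
const BasePort = 63000

// hostPortFor returns the per-configuration host port. Two pods requesting
// the same hostPort cannot be scheduled onto the same node, which gives the
// "one pod of this configuration per node" guarantee without anti-affinity.
func hostPortFor(vrrpIDOffset int) int {
	return BasePort + vrrpIDOffset
}

func main() {
	fmt.Println(hostPortFor(56)) // 63056, matching the hostPort asserted in the tests below
}
```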
Force-pushed from 6c84200 to e67a33e
@knobunc changed to use port 63000 + vrrp-id, added tests, changed commit title and comment. PTAL
if len(options.ServiceAccount) == 0 {
	return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 255 {
	return fmt.Errorf("The vrrp-id-offset must be in the range 0..255")
0 to 254, right?
No. The real range is 1..255. The code changes 0 to 1, so the valid input range is 0..255.
I don't see that. In config-generators.sh line 270 it creates a counter variable that starts at 1. That is passed into generate_vrrpd_instance_config at line 290 and is put into $iid (although the default of 0 is probably wrong and we probably should change the $iid default at line 209 to 1).
Then we add that to the offset. So the first allocated will be offset + 1. So the range you can set the offset to only goes to 254.
I still see 255 here
// same configuration starting on the same node. Since pods for different
// configurations can run on the same node a different ServicePort is used
// for each configuration.
// In the future, this may be changed to pod-affinity.
pod anti-affinity
changed
@@ -116,7 +116,7 @@ func (p *KeepalivedPlugin) Generate() (*kapi.List, error) {
 	}

 	if len(p.Options.VirtualIPs) == 0 {
-		return nil, fmt.Errorf("you must specify at least one virtual IP address for keepalived to expose")
+		return nil, fmt.Errorf("you must specify at least one virtual IP address (--virtual-ips=)for keepalived to expose")
space after )
Changed
@@ -12,8 +12,9 @@ const (
	// DefaultType is the default IP Failover type.
	DefaultType = "keepalived"

	// DefaultServicePort is the default service port.
	DefaultServicePort = 1985
	// DefaultServicePort is the port assiciated with the ipfailover config.
associated
Changed
	// DefaultServicePort is the default service port.
	DefaultServicePort = 1985
	// DefaultServicePort is the port assiciated with the ipfailover config.
	// each ipfailover config has a different ServicePort.
Each
Changed.
@@ -81,7 +81,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=356' 'error: The vrrp-id-offset must be in the range 0..255'
This range is wrong.
Let's add a test for 255 to test for failure.
The range is correct. The default is 0. The actual input range is 0 to 255. When 0 the code sets it to 1.
You want to change the 356 to 256? OK
@knobunc changes made, PTAL
if len(options.ServiceAccount) == 0 {
	return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 255 {
	return fmt.Errorf("The vrrp-id-offset must be in the range 0..255")
@knobunc the script is not all that straightforward. The counter starts at 1, which goes to vrrp_instance (iid) as 1. Line 203: virtual_router_id $((vrrpidoffset + iid))
Force-pushed from 8923218 to 034e447
@knobunc 254 range and "1" default are in.
@@ -81,7 +81,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=256' 'error: The vrrp-id-offset must be in the range 0..255'
Please make this offset = 255 and change the comment
A few questions on edge values but otherwise LGTM
if len(options.ServiceAccount) == 0 {
	return fmt.Errorf("you must specify a service account for the ipfailover pod with --service-account, it cannot be blank")
if options.VRRPIDOffset < 0 || options.VRRPIDOffset > 254 {
	return fmt.Errorf("The vrrp-id-offset must be in the range 0..254")
Hmm, if we set iid to 0 above then it can be an offset from 0 to 255 - this way we lose 1 vrrp id, right?
@ramr The VRRP protocol spec limits the valid vid values to 1..255; 0 is specifically invalid, so there are 255 possible valid values. The doc reference is wrong.
For some reason this was coded so that the input vrrp-id-offset is 0..254; it is converted to 1..255 internally.
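Pulling the thread together, a short Go sketch of the arithmetic being agreed on here (`validateOffset` and `virtualRouterID` are illustrative names, not the actual functions): the flag accepts 0..254, the generator's instance counter now starts at 1, and the resulting virtual_router_id stays inside the VRRP-legal 1..255.

```go
package main

import (
	"errors"
	"fmt"
)

// validateOffset mirrors the final CLI check: --vrrp-id-offset must be 0..254.
func validateOffset(offset int) error {
	if offset < 0 || offset > 254 {
		return errors.New("The vrrp-id-offset must be in the range 0..254")
	}
	return nil
}

// virtualRouterID mirrors config-generators.sh line 203: the instance counter
// (iid) now defaults to 1, so the first ID allocated is offset+1, and the
// largest offset (254) with the first instance yields 255 -- the top of the
// VRRP-legal range.
func virtualRouterID(offset, iid int) int {
	return offset + iid
}

func main() {
	fmt.Println(validateOffset(255))     // out of range: prints the error
	fmt.Println(validateOffset(254))     // ok: prints <nil>
	fmt.Println(virtualRouterID(254, 1)) // 255
}
```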
ok cool then this looks lgtm
@@ -178,7 +178,7 @@ function generate_vip_section() {
 #
 function generate_vrrpd_instance_config() {
   local servicename=$1
-  local iid=${2:-"0"}
+  local iid=${2:-"1"}
Any reason to set it to 1 instead - that changes the value of virtual router id below to always start from 1 -
virtual_router_id $((vrrpidoffset + iid))
LGTM
@ramr is there anything more to discuss before dropping the changes requested flag?
@knobunc anything left before dropping the changes requested flag?
@pecameron approved it about half an hour back. Maybe you need to refresh the page. thx
@@ -78,7 +78,10 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --ipta
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: "1177"'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-script="ChkScript.sh" --dry-run -o yaml' 'value: ChkScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --notify-script="NotScript.sh" --dry-run -o yaml' 'value: NotScript.sh'
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=56' 'hostPort: 63056'
os::cmd::expect_failure_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml --vrrp-id-offset=256' 'error: The vrrp-id-offset must be in the range 0..254'
--vrrp-id-offset=256
I want that to test the 255 case.
Changed
[TEST]
@pecameron find the flake and link to it, and I'll re-run the tests.
@knobunc your last change request is in the image. Anything else needed? Created issue for test failure:
@knobunc @ncdc This seems to be (could be?) the problem. What do I need to change? Running test/end-to-end/core.sh:422: executing 'oc get builds --namespace test -o jsonpath='{.items[0].status.phase}'' expecting any result and text 'Running'; re-trying every 0.2s until completion or 600.000s...
It appears to be a flake. It's happened to several PRs. I'll open a new issue and rekick the test. Just a minute.
Actually, my grep was wrong. It's only happened to this PR in the last week. We can try to test again and see how it does. If it fails the same way, we'll create a flake issue. [test]
Evaluated for origin test up to 959a4bf
continuous-integration/openshift-jenkins/test Running (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13360/) (Base Commit: a4b2294)
continuous-integration/openshift-jenkins/test SUCCESS
@knobunc is this ready to merge?
[merge]
Evaluated for origin merge up to 959a4bf
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13366/) (Base Commit: 6ee94d0) (Image: devenv-rhel7_5794)
The ipfailover pods for a given configuration must run on
different nodes. We are using the ServicePort as a mechanism
to prevent multiple pods for same configuration from starting
on the same node. Since pods for different configurations can
run on the same node a different ServicePort is used for each
configuration.
In the future, this may be changed to pod anti-affinity.
bug 1411501
https://bugzilla.redhat.com/show_bug.cgi?id=1411501
Signed-off-by: Phil Cameron [email protected]