-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ipfailover configurability #11644
Conversation
Is this a check script for the other container, or for the processes in the keepalived container itself? |
@knobunc Here is the first draft. It covers the wants in the trello card. PTAL |
The descriptions in the PR aren't enough for me to understand what the scripts are supposed to do. More detail in the doc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is too late for this, but generating a script with a script ... there's got to be a better way.
# When the DC supplies an (non null) iptables chain | ||
# (OPENSHIFT_HA_IPTABLES_CHAIN) make sure the rule to pass keepalived | ||
# multicast (224.0.0.18) traffic is in the table. | ||
HA_IPTABLES_CHAIN=${OPENSHIFT_HA_IPTABLES_CHAIN:-""} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this line and all the others:
HA_IPTABLES_CHAIN="${OPENSHIFT_HA_IPTABLES_CHAIN:-""}"
^ ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevekuznetsov I shifted OPENSHIFT_HA_IPTABLES_CHAIN to here to be more consistent.
I didn't create this file and am just adding a few fields. I am not sure how far to go with this.
This is work in progress. Thanks for your comments and I will modify the code as suggested. I am not sure how far I should go beyond adding this feature. Three is discussion on converting this to a go program and using the go template to generate the file. However, that is future work (if we decide to do it.)
The scripting here generates the config file for keepalived.
@@ -10,7 +10,7 @@ source "$(dirname "${BASH_SOURCE[0]}")/utils.sh" | |||
|
|||
# Constants. | |||
readonly CHECK_SCRIPT_NAME="chk_${HA_CONFIG_NAME//-/_}" | |||
readonly CHECK_INTERVAL_SECS=2 | |||
readonly CHECK_INTERVAL_SECS=${HA_CHECK_INTERVAL} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readonly CHECK_INTERVAL_SECS="${HA_CHECK_INTERVAL}"
^ ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -59,6 +59,11 @@ function generate_script_config() { | |||
if [ "$port" = "0" ]; then | |||
echo " script \"true\"" | |||
else | |||
if [[ -n $HA_CHECK_SCRIPT ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if [[ -n "${HA_CHECK_SCRIPT}" ]]; then
^ ^ ^^
@@ -59,6 +59,11 @@ function generate_script_config() { | |||
if [ "$port" = "0" ]; then | |||
echo " script \"true\"" | |||
else | |||
if [[ -n $HA_CHECK_SCRIPT ]]; then | |||
echo " if [[ -f $HA_CHECK_SCRIPT ]]; then" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you generating a script here? This is pretty insane. This expansion of $HA_CHECK_SCRIPT
will happen at echo
time, not at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevekuznetsov This is creating a config file for keepalived. One of the parameters to keepalived is the name of a check script that keepalived should call. The expansion must happen when the config file is created.
@@ -1,5 +1,6 @@ | |||
#!/bin/bash -e | |||
|
|||
# set -x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
@smarterclayton - I added more to the description. See the trelo card and bug |
@stevekuznetsov Is there more information you need here? Are there any further changes needed? |
@@ -65,7 +89,7 @@ ADMIN_EMAILS=(${OPENSHIFT_HA_ADMIN_EMAILS:-"root@localhost"}) | |||
EMAIL_FROM="[email protected]" | |||
|
|||
# IP address of the SMTP server. | |||
SMTP_SERVER=${OPENSHIFT_HA_SMTP_SERVER:-"127.0.0.1"} | |||
SMTP_SERVER="${OPENSHIFT_HA_SMTP_SERVER:-"127.0.0.A1"}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevekuznetsov Oops, I missed that. Its now fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One typo, the rest of the Bash looks ... ok ... to me. I think it would be reasonable to open a card or an issue to investigate if we can achieve the same features without generating a Bash script from within a Bash script.
@stevekuznetsov You have "Changes requested". What additional changes would you like to have made? |
@pecameron There is one more comment about an IP address #11644 (review) I would also like to see if we can open an issue to investigate an alternative approach to this than generating a script from within a script. |
@stevekuznetsov Yeah, @pecameron already talked to me about the possibility of changing over to a template file, or something less horrific. We opened the card https://trello.com/c/9w6pIkIU |
@stevekuznetsov @knobunc This code is generating the keepalived config file. keepalived reads this file to configure itself. It may look like a script but it is the config file. Even if we move to a template, the resulting file will still look the same as this file. It will look like a script and it will still be the same config file. OK, I am missing something here. Do we want to change keepalived to accept its configuration in a different format? |
@pecameron Longer-term, we probably want to move it to go with a template. But that will be done as a feature later. |
@stevekuznetsov I made the changes. PTAL |
I do admit I'm not well versed on this but diffs like this are clearly writing Bash code into a file somewhere. Whether that's a Bash script to be run later or a config file that allows for Bash inline... we're generating executable Bash code dynamically. I feel like that is a very fragile approach, and, yes, it should be something that we think about later, it does not block this PR. |
@knobunc There is 1 approved on this, Is there something to do before you can approve it as well? Is this ready to merge? |
@pecameron it is ready to merge. I'm giving master until Monday, then I have a few to put in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[test] |
Failed:
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_check/8687/consoleText |
@@ -72,6 +72,9 @@ os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry- | |||
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o name' 'deploymentconfig/ipfailover' | |||
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml' '1.2.3.4' | |||
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --iptables-chain="MY_CHAIN" --dry-run -o yaml' 'value: MY_CHAIN' | |||
os::cmd::expect_success_and_text 'oadm ipfailover --virtual-ips="1.2.3.4" --check-interval=1177 --dry-run -o yaml' 'value: 1177' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stronger checks here, if possible, would look like:
$ oadm ipfailver .... # create an API object
$ oc get <object> -o jsonpath='{.path.to.check-interval}' # expecting 1177
Not sure if this is possible or how the ipfailover configuration is serialized in etcd, but that sort of check is more declarative and less fragile than grep
ping through some YAML.
I don't get this. The yaml does have 1177 and it didn't match. Do you
|
Yes, it must be exactly that -- you are looking for the literal text |
In addition, you're not checking that it's the check interval value that is set to 1177, nor that it isn't set to 11775, or that not-value is set to 1177... etc |
Steve, What do you think? phil === BEGIN TEST CASE ===
On 11/22/2016 08:56 AM, Steve Kuznetsov wrote:
|
Yes, the following would work:
But, for all of the reasons above I still think it is much better to not do |
[test] |
[test] yum failed |
@knobunc Still need help with this... |
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_check/8905/consoleText |
That's a spurious error. Doesn't actually cause failure. |
So, what do we do now?
…On 12/08/2016 04:08 PM, Clayton Coleman wrote:
That's a spurious error. Doesn't actually cause failure.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#11644 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ANUgelizNLrMN0_xmwGO_RpJq-rP8yobks5rGHHmgaJpZM4Kjh5W>.
|
[test] |
[test] |
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_check/9400/consoleText |
Add options to 'oadm ipfailover' to configure the check script and notify scripts and also control the period the check script runs. Keepalived periodically checks whether the application is running properly. In the default case the test is a simple verification that something is listening on the watch port. This PR permits the user to supply an additional check script that is run in the ipfailover container context to verify that the application is operating properly. For example, a web server can be tested by accessing the watch port and verifying the response. Whenever a node changes state to MASTER, BACKUP, or FAULT a notify script can be called. This script has 3 parameters filled in by keepalived: $1 - "GROUP"|"INSTANCE" $2 - name of group or instance $3 - target state of transition ("MASTER"|"BACKUP"|"FAULT") --check-script="check_script" The check script is a script in the keepalived container that verifies the service is running properly. These checks are in addition to verifying that the watch port is listening. --notify-script="notify_script" The notify script is a script in the keepalived container that is called whenever the keepalived state transitions to (MASTER|BACKUP|FAULT) --check-interval=<n> The check script is run every <n> seconds. Default is 2. Note: the scripts name is the full path to the script. Resolves 1362163 Signed-off-by: Phil Cameron <[email protected]>
Evaluated for origin test up to bce0038 |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12335/) (Base Commit: fb0bbde) |
@knobunc are we ready to merge this? |
[merge] |
1 similar comment
[merge] |
@knobunc This failed again, passed once before you did the merge and failed after. Maybe another merge will work... |
@knobunc https://bugzilla.redhat.com/show_bug.cgi?id=1362163 comment #6 is for the doc changes. The code is not yet in... |
[merge] last flaked with #11016 (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12459/) |
[merge] last flaked with #10228 (https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_conformance/9749/consoleText) |
Evaluated for origin merge up to bce0038 |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12567/) (Base Commit: 05e4f0e) (Image: devenv-rhel7_5584) |
Openshift 3.5 feature.
Add options to 'oadm ipfailover' to configure the check script and
notify scripts and also control the period the check script runs.
Keepalived periodically checks whether the application is running properly.
In the default case the test is a simple verification that something is listening
on the watch port. This PR permits the user to supply an additional check script
that is run in the ipfailover container context to verify that the application is
operating properly. For example, a web server can be tested by accessing the
watch port and verifying the response.
Whenever a node changes state to MASTER, BACKUP, or FAULT a notify
script can be called. This script has 3 parameters filled in by keepalived:
$1 - "GROUP"|"INSTANCE"
$2 - name of group or instance
$3 - target state of transition ("MASTER"|"BACKUP"|"FAULT")
--check-script="check_script"
The check script is a script in the keepalived container that verifies
the service is running properly. The script must return 0 for OK and 1 for FAIL.
These checks are in addition to verifying that the watch port is listening.
--notify-script="notify_script"
The notify script is a script in the keepalived container that is
called whenever the keepalived state transitions to (MASTER|BACKUP|FAULT)
--check-interval=
The check script is run every seconds. Default is 2.
Note: the scripts name is the full path to the script.
Fixes bug 1362163
https://trello.com/c/228zu7Br/267-5-improve-the-configurability-of-the-ipfailover-container-operations
Signed-off-by: Phil Cameron [email protected]