Support for DNS names in egress routes #15409

Merged

Conversation

pravisankar

@pravisankar pravisankar commented Jul 21, 2017

Introduced dns-proxy egress router mode that allows specifying DNS name for EGRESS_DESTINATION.
Currently, dns-proxy egress mode implementation is based on HAProxy.
HAProxy 1.6+ version is used to leverage DNS resolution at runtime.

Trello Card: https://trello.com/c/407uoUFz

@pravisankar
Author

@openshift/networking PTAL

@pravisankar
Author

[test]

@openshift-bot
Contributor

Evaluated for origin test up to caae214

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/3408/) (Base Commit: e97ba37) (PR Branch Commit: caae214)

@openshift-merge-robot openshift-merge-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 24, 2017
@bparees bparees removed their assignment Jul 25, 2017
@pravisankar
Author

/unassign kargakis
/assign danwinship

@openshift-merge-robot openshift-merge-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 28, 2017
@pravisankar pravisankar force-pushed the egress-haproxy-mode branch from caae214 to 77c1008 Compare July 28, 2017 23:05
tar xvzf haproxy-1.7.5.tar.gz && \
groupadd haproxy && \
useradd -g haproxy haproxy && \
cd haproxy-* && make TARGET=linux2628 CPU=native USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 && make install && \
Contributor

eek... Not sure about this. We don't do anything like that anywhere else...

What exactly is different if we don't have the new haproxy?

Author

The dynamic DNS resolution feature is only available in HAProxy 1.6+ (we currently run haproxy 1.5). I don't know whether this approach is acceptable, but we did something similar in release-1.3 (https://github.com/openshift/origin/blob/release-1.3/images/router/haproxy/Dockerfile)

Contributor

"Dynamic DNS resolution" meaning that if we used haproxy 1.5, it would only do the DNS resolution at startup and not notice any changes after that?

Contributor

I feel like we need some sort of higher-up (ie, not just networking team) approval for actually building binaries from source in the images like this...

(Another possibility maybe is to pre-build the package into an RPM and make it available in a COPR, like we do for the openvswitch packages for the dind image.)

Author

Yes, haproxy 1.5.x resolves DNS at startup and will fail if the IP changes.
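For illustration, a check like the following could gate the dns-proxy mode on the HAProxy version actually installed. It is a minimal sketch: both function names are hypothetical, and it assumes the version string is the third field of the first line of `haproxy -v` output.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given HAProxy version (e.g. "1.7.5")
# is 1.6 or newer, i.e. new enough to re-resolve DNS names at runtime.
supports_runtime_dns() {
    local version="$1" major minor
    IFS=. read -r major minor _ <<< "${version}"
    (( major > 1 || (major == 1 && minor >= 6) ))
}

# Hypothetical helper: assumes the first line of `haproxy -v` looks like
# "HA-Proxy version 1.7.5 ..." and prints the third field.
haproxy_version() {
    haproxy -v 2>/dev/null | awk 'NR==1 {print $3}'
}
```

Usage would be along the lines of `supports_runtime_dns "$(haproxy_version)" || die "haproxy 1.6+ required"`.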

Author

@eparis @smarterclayton is building haproxy from source acceptable? If not, how should we proceed?
Having haproxy 1.6+ will also solve this card: https://trello.com/c/J1ODldZK/516-move-to-a-version-of-haproxy-with-lua-capability

function generate_dns_resolvers() {
echo "resolvers dns-resolver"
# Fetch nameservers from /etc/resolv.conf
declare -a nameservers=$(cat /etc/resolv.conf |grep nameserver|awk -F" " '{print $2}')
Contributor

origin style is to put declaration and initialization on separate lines. (IIRC because if you write it the way you did here, and there's an error, the error gets ignored)

also, you don't need cat and grep: $(awk '/nameserver/ {print $2}' /etc/resolv.conf)

also, actually you need a ^ before nameserver so it doesn't get tripped up by the comments NetworkManager sometimes adds:

# NOTE: the libc resolver may not support more than 3 nameservers.
# The nameservers listed below may not be recognized.
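Putting these three suggestions together, generate_dns_resolvers might look roughly like the sketch below. It is not the merged code: the RESOLV_CONF override (added so it can be tested) and the default NS_PORT of 53 are assumptions.

```shell
#!/bin/bash
# Sketch of the reviewer's suggestions. RESOLV_CONF and the NS_PORT default
# of 53 are assumptions, not taken from the script under review.
generate_dns_resolvers() {
    local resolv_conf="${RESOLV_CONF:-/etc/resolv.conf}"
    local ns_port="${NS_PORT:-53}"
    local nameservers
    # Declare and assign separately: with `local x=$(...)` the exit status of
    # the command substitution would be lost.
    # Anchor on ^nameserver so NetworkManager comments are not matched.
    nameservers=$(awk '/^nameserver/ {print $2}' "${resolv_conf}") || return 1

    echo "resolvers dns-resolver"
    local n=0 ns
    # Intentionally unquoted: split the awk output into one word per server.
    for ns in ${nameservers}; do
        n=$((n + 1))
        echo "    nameserver ns${n} ${ns}:${ns_port}"
    done
}
```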

echo " nameserver ns$n ${ns}:${NS_PORT}"
done

# Add google DNS servers as fallback
Contributor

oh no, don't do that. The user may not want to leak their DNS requests to google.

Author

Ok, I will remove.

generate_haproxy_frontends_backends
}

function setup_haproxy_syslog() {
Contributor

Ugh. There's no way to get it to just log to stdout/stderr?

We might want to make it not bother doing this unless the pod spec sets some sort of "debug" variable...

Author

By default it does not log to stdout/stderr; I will only set up the syslog forwarding when a 'debug' variable is set.
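A minimal sketch of that gating, assuming the switch is an environment variable named EGRESS_DNS_PROXY_DEBUG (the name is an assumption here) and using a placeholder in place of the real setup_haproxy_syslog:

```shell
#!/bin/bash
# Assumed variable name: EGRESS_DNS_PROXY_DEBUG.
debug_enabled() {
    case "${EGRESS_DNS_PROXY_DEBUG:-}" in
        1|true|yes) return 0 ;;
        *) return 1 ;;
    esac
}

# Placeholder standing in for the script's real rsyslog setup.
setup_haproxy_syslog() {
    echo "syslog enabled"
}

# Only pay for the syslog setup when debugging was requested.
maybe_setup_syslog() {
    if debug_enabled; then
        setup_haproxy_syslog
    fi
}
```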


check_prereqs

rm -f ${CONF}
Contributor

"${CONF}" (everywhere)
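The reason for the quoting: an unquoted ${CONF} undergoes word splitting and globbing, so with CONF="/tmp/my conf.cfg" the command `rm -f ${CONF}` would try to remove /tmp/my and conf.cfg instead. A small sketch, with a hypothetical write_conf helper standing in for the script's config generation:

```shell
#!/bin/bash
# Hypothetical helper: regenerate a config file at a path that may contain spaces.
write_conf() {
    local conf="$1"
    rm -f "${conf}"
    # Quoted expansions keep the path a single argument for rm and for >>.
    echo "global" >> "${conf}"
    echo "defaults" >> "${conf}"
}
```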

die "Failed to fetch Pod IP"
fi

echo -A PREROUTING -p tcp -d "${pod_ip}" -j DNAT --to-destination "${EGRESS_SOURCE}"
Contributor

Why do you need the pod_ip match here? We don't do that in the standard egress-router.

Author

In the standard egress-router, the iptables rules are set up in the pod, so packets sent from the service IP to the pod IP hit the pod's iptables rules, which do the actual work.
In the dns-proxy case, haproxy is listening on the macvlan interface, but packets from the service IP only reach the pod IP. This additional rule forwards all TCP packets arriving at the pod IP to the macvlan interface, which has the egress source IP.

Author

Actually haproxy doesn't need to listen on macvlan interface. It can listen on pod/all interfaces and we don't need this additional iptables forward rule.
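Listening on all interfaces just means binding the frontend to a port with no address. Below is a rough sketch of generating one frontend/backend pair that way, where the backend server is a DNS name re-resolved at runtime through the dns-resolver section. The fe_/be_ naming and the argument order are hypothetical. As far as the 1.6/1.7 configuration manual goes, runtime resolution is driven by health checks, hence `check` on the server line.

```shell
#!/bin/bash
# Hypothetical generator for one TCP frontend/backend pair (HAProxy 1.6+ syntax).
generate_frontend_backend() {
    local listen_port="$1" dest_host="$2" dest_port="$3"
    # "bind :PORT" listens on every interface, so no DNAT to the macvlan IP is needed.
    cat <<EOF
frontend fe_${listen_port}
    mode tcp
    bind :${listen_port}
    default_backend be_${listen_port}

backend be_${listen_port}
    mode tcp
    server dest ${dest_host}:${dest_port} check resolvers dns-resolver
EOF
}
```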

}
out := string(outBytes)
for _, frontend := range test.frontends {
if !strings.Contains(out, frontend) {
Contributor

This doesn't verify that there are no unexpected additional frontend/backend rules

Author

I will add this test

Contributor

Did you fix this? Basically I was suggesting you should parse the output into static config, resolvers, and frontends sections, so you could then just check for equality rather than just "contains". (This might be easier if you had multiple testing modes; eg, one that just runs generate_dns_resolvers and nothing else, and one that just runs generate_haproxy_frontends_backends and nothing else.)

Author

We may have multiple frontends and backends, and an equality check only works if their order is fixed. Rather than enforcing an order when generating the frontends/backends, the test checks for the exact number of frontends and backends expected. Lines 137 to 140 and 147 to 150 validate this case.
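One way to get both properties at once (no ordering constraint, yet a failure on any missing or unexpected frontend) is to compare sorted lists instead of testing with Contains. The tests in this PR are written in Go; the same idea is sketched here in shell, assuming each frontend begins with a line starting with `frontend `:

```shell
#!/bin/bash
# Sketch: order-independent but exact comparison of frontend header lines.
# Args: config file, then the expected "frontend ..." lines in any order.
same_frontends() {
    local conf="$1"; shift
    diff <(grep '^frontend ' "${conf}" | sort) \
         <(printf '%s\n' "$@" | sort) >/dev/null
}
```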

}
}

func TestHAProxyDefaults(t *testing.T) {
Contributor

I don't think this is really testing anything useful. OTOH, it might be good to test the resolver code

Author

I was trying to catch the case where the config file gets overwritten instead of appended to ('>' instead of '>>' in the script). I will add the resolver test.
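A check along these lines would catch an accidental '>' without depending on the exact defaults text: every top-level section should still appear exactly once. The section names passed in are assumptions about the generated config.

```shell
#!/bin/bash
# Sketch: fail unless each named top-level section occurs exactly once.
check_sections() {
    local conf="$1"; shift
    local section count
    for section in "$@"; do
        # grep -c prints 0 (and exits 1) when nothing matches.
        count=$(grep -cE "^${section}( |$)" "${conf}")
        if [ "${count}" != 1 ]; then
            echo "expected exactly one '${section}' section, found ${count}" >&2
            return 1
        fi
    done
}
```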

@pravisankar pravisankar force-pushed the egress-haproxy-mode branch 3 times, most recently from 39825e5 to 6cdaa67 Compare August 2, 2017 01:27
@openshift-merge-robot openshift-merge-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 2, 2017
@pravisankar
Author

@danwinship updated, ptal

@dcbw
Contributor

dcbw commented Aug 4, 2017

@pravisankar maybe you need to rebase to pick up an update to contrib/completions/OWNERS? The verify job is failing with: FAILURE: Generated completions out of date. Please run hack/update-generated-completions.sh

@pravisankar pravisankar force-pushed the egress-haproxy-mode branch from 6cdaa67 to 9e4afb1 Compare August 7, 2017 18:52
RUN INSTALL_PKGS="rsyslog gcc make openssl-devel pcre-devel tar wget socat" && \
yum install -y $INSTALL_PKGS && \
rpm -V $INSTALL_PKGS && \
wget http://www.haproxy.org/download/1.7/src/haproxy-1.7.5.tar.gz && \
Contributor

if we are doing this, can we verify the md5sum as well before building?
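For example, the build could compare the tarball against a checksum pinned in the Dockerfile and abort on mismatch. A sketch with a hypothetical verify_md5 helper; the real expected value would be the one published for haproxy-1.7.5.tar.gz and is deliberately not filled in here.

```shell
#!/bin/bash
# Hypothetical helper: exit non-zero unless FILE's MD5 equals EXPECTED.
verify_md5() {
    local file="$1" expected="$2"
    # md5sum -c reads "<sum>  <file>" lines from stdin ("-") and fails on mismatch.
    echo "${expected}  ${file}" | md5sum -c --quiet -
}
```

In the Dockerfile this would be one more `&&` step between the wget and the `tar xvzf`.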

function generate_haproxy_defaults() {
echo "
global
log 127.0.0.1 local2
Contributor

will this cause problems if the debug logging isn't enabled?

Author

Tested with and without debug logging and found no issues.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2017
@ghyde
Contributor

ghyde commented Jan 18, 2018

Now that the router has been updated to HAProxy 1.8 (#18053), can the egress router be updated to support dynamic DNS resolution?

Ravi Sankar Penta added 3 commits February 13, 2018 22:20
Introduced dns-proxy egress router that allows specifying DNS name for EGRESS_DESTINATION.
Currently, dns-proxy egress mode implementation is based on HAProxy.
HAProxy 1.6+ version is used to leverage DNS resolution at runtime.
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 14, 2018
@pravisankar pravisankar requested a review from knobunc February 15, 2018 19:08
@pravisankar
Author

@openshift/sig-networking @danwinship @knobunc ready for re-review, PTAL

@pravisankar
Author

/retest

@danwinship
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 19, 2018
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, pravisankar

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2018
@eparis
Member

eparis commented Feb 19, 2018

/hold
looks like a 3.10 thing...

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 19, 2018
@eparis eparis added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Feb 19, 2018
@pravisankar
Author

/hold cancel
master is open for 3.10

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 1, 2018
@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@pravisankar
Author

/retest

@openshift-merge-robot
Contributor

Automatic merge from submit-queue (batch tested with PRs 15409, 18763).

@openshift-merge-robot openshift-merge-robot merged commit 65dc8e1 into openshift:master Mar 2, 2018
@openshift-ci-robot

openshift-ci-robot commented Mar 2, 2018

@pravisankar: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/extended_templates 9e4afb1 link /test extended_templates
ci/openshift-jenkins/extended_conformance_install_update 9e4afb1 link /test extended_conformance_install_update
ci/openshift-jenkins/extended_networking_minimal 5a94143 link /test extended_networking_minimal

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking lgtm Indicates that a PR is ready to be merged. sig/networking size/L Denotes a PR that changes 100-499 lines, ignoring generated files.