Change haproxy router to use a certificate list/map file. #11217

Merged: 1 commit, Oct 14, 2016

Contributor

ramr commented Oct 4, 2016

As discussed in #11201, this pulls out the haproxy router changes that use a map file to present certs via SNI, in lieu of serving all certificates from a directory. The map file is reverse-keyed on the host: each line lists the certificate path first, followed by the host.
Example entries:

/var/lib/haproxy/router/certs/default_header-test-reencrypt.pem reencrypt.header.test  
/var/lib/haproxy/router/certs/u1p1_route-reencrypt.pem reen.example.com  
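
A minimal sketch of how entries in this shape could be produced, assuming a hypothetical helper `certMapLine` and a route key of the form `<namespace>_<route name>` (as the file names above suggest). This is for illustration only and is not the router's actual code:

```go
package main

import (
	"fmt"
	"path"
)

// certMapLine formats one line of the certificate map file: the PEM path
// first, then the host it serves. routeKey is an assumed identifier of the
// form "<namespace>_<route name>"; the helper itself is hypothetical.
func certMapLine(certDir, routeKey, host string) string {
	return path.Join(certDir, routeKey+".pem") + " " + host
}

func main() {
	const certDir = "/var/lib/haproxy/router/certs"
	fmt.Println(certMapLine(certDir, "default_header-test-reencrypt", "reencrypt.header.test"))
	fmt.Println(certMapLine(certDir, "u1p1_route-reencrypt", "reen.example.com"))
}
```

Putting the path first and the host second matches the example entries above, which is what "reverse-keyed on the host" refers to.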

@liggitt @rajatchopra @knobunc PTAL thx

Contributor Author

ramr commented Oct 4, 2016

[test]

Contributor Author

ramr commented Oct 5, 2016

[test]

*/}}
{{ define "/var/lib/haproxy/conf/cert_config.map" }}
{{ $workingDir := .WorkingDir }}
{{ range $idx, $cfg := .State }}
Contributor

is idx stable? When are the files on disk updated relative to the config being written and reloaded?

Contributor Author

"idx stable": I didn't get what you mean by that. The files on disk are created/updated as routes get processed.

Contributor

we're mapping {{$idx}}.pem to a particular hostname. as routes get added/removed, will $idx refer to a different route, and will {{$idx}}.pem be overwritten with a different route's cert?

if that happens asynchronously to the way the config is generated and reloaded, what does haproxy do in the meantime? does it ignore changes to the certs in the dir? does it pick up file changes?

Contributor Author

So that's the internally generated name of the route namespace+name, so it is unique for routes. So it is "stable".

Regarding haproxy loading the cert: it would now only do so if there is an entry in the mapping file it loads at startup. New entries in the mapping file would only be used after the next reload. The flip side is that on a route delete, the cert would be deleted before the reload.

Contributor

> that's the internally generated name of the route namespace+name, so it is unique for routes

Unique is good. How does the map behave if multiple routes have the same host but specify different certs?

> on a route delete, the cert would be deleted before the reload.

do we know if that causes errors or falls back to the default cert or something else? Do we need to move cert deletions to occur after the template is generated and haproxy is reloaded?

@@ -464,7 +464,6 @@ echo "[INFO] Validating routed app response..."
# will be reachable via the ip of its pod.
router_ip=$(oc get pod "${router_pod}" --template='{{.status.podIP}}')
CONTAINER_ACCESSIBLE_API_HOST="${CONTAINER_ACCESSIBLE_API_HOST:-${router_ip}}"
validate_response "-s -k --resolve www.example.com:443:${CONTAINER_ACCESSIBLE_API_HOST} https://www.example.com" "Hello from OpenShift" 0.2 50
Contributor

What was this testing before?

Contributor Author

Have no idea, but it looked like it was expecting some remnant of a route to exist. The right test would be to create the route and see that it works, which is what the code that follows does.

Contributor

that's testing the route that was created earlier from the stibuild template. not seeing why we should delete this

Contributor Author

I'll take a look and see. It might be that the route created is not valid (TLS config).

Contributor

did the map change break the "default cert" behavior of the router? I really don't want to remove this test if it was previously passing and suddenly failed with this PR

Contributor Author

sorry I thought this was the extended validation PR. I have to check this and see what's amiss here.

Contributor Author

ramr commented Oct 5, 2016

No, the default cert behavior remains similar. If the host is not in the map (earlier: not in the directory), then the default cert would be used.

ramr force-pushed the router-cert-list branch from f1d3132 to 5f61617 on October 5, 2016 19:33
Contributor Author

ramr commented Oct 5, 2016

[test] to see failure

Contributor Author

ramr commented Oct 6, 2016

ssh failure [test]

Contributor Author

ramr commented Oct 6, 2016

@liggitt looks like the endpoints for the frontend service are not available - added some debug statements to see what's happening.

ramr force-pushed the router-cert-list branch from 41e4760 to 2a4e01f on October 6, 2016 05:58
Contributor Author

ramr commented Oct 6, 2016

Looks like this is the error:

[ALERT] 279/071932 (113) : parsing [/var/lib/haproxy/conf/haproxy.config:129] : 'bind 127.0.0.1:10444' : 'crt-list' : error processing line 1 in file '/var/lib/haproxy/conf/cert_config.map' : unable to load SSL private key from PEM file '/var/lib/haproxy/router/certs/test_route-edge.pem'.  

Will look at where the route is created later today. And that's the reason why deleting the route and re-adding it works. I suspect it was using the default certificate previously which is why we didn't see errors - though that was probably not the intention for the test.

So the downside is checks will be stricter with this change. @liggitt any thoughts/comments?
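
One way to surface this class of failure before a reload is to check that every certificate path listed in the map file actually exists. The following is only a sketch under stated assumptions: `missingCerts` and `fileExists` are hypothetical helpers, the host in the sample entry is illustrative, and a file that exists but lacks a valid private key would still be rejected by haproxy.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// missingCerts scans map-file text ("<pem path> <host>" per line) and returns
// the certificate paths for which exists() reports false.
// Hypothetical helper, not part of the router.
func missingCerts(mapText string, exists func(string) bool) []string {
	var missing []string
	scanner := bufio.NewScanner(strings.NewReader(mapText))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if !exists(fields[0]) {
			missing = append(missing, fields[0])
		}
	}
	return missing
}

// fileExists reports whether a file can be stat'ed at the given path.
func fileExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	// Illustrative entry; the host shown here is an assumption.
	mapText := "/var/lib/haproxy/router/certs/test_route-edge.pem www.example.com\n"
	for _, p := range missingCerts(mapText, fileExists) {
		fmt.Println("missing certificate file:", p)
	}
}
```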

ramr force-pushed the router-cert-list branch from 2a4e01f to 5ce393c on October 6, 2016 20:08
Contributor Author

ramr commented Oct 6, 2016

ok this should fix the issue with the test failing.
[test]

@openshift-bot
Contributor

Evaluated for origin test up to 5ce393c

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9718/)

Contributor

liggitt commented Oct 7, 2016

> ok this should fix the issue with the test failing.

what was the change? I'm missing the diff

Contributor

liggitt commented Oct 7, 2016

> the downside is checks will be stricter with this change. @liggitt any thoughts/comments?

not sure what you mean by that... does that mean that previously, invalid certs were tolerated in the directory, and they are not tolerated in the map?

Contributor Author

ramr commented Oct 9, 2016

@liggitt ignore the second comment on the stricter checks. The real failure was that the cert file didn't exist but was listed in the map. We don't write cert files if the field doesn't exist, and the template code checks if there is a cert before adding an entry to the map.

Edited typo field doesn't exist
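
The guard described here can be sketched with Go's `text/template`: an entry is emitted only when the route carries a certificate. The `routeConfig` type, its `CertFile` field, and the `certs/` sub-path are assumptions made for illustration; only `.WorkingDir` and `.State` appear in the template under review, and the real router state is shaped differently.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// routeConfig is a hypothetical, simplified stand-in for the router's
// per-route state.
type routeConfig struct {
	Host     string
	CertFile string // empty when the route has no certificate
}

// certMapTemplate writes "<pem path> <host>" lines, skipping routes
// without a certificate so the map never names a file that was not written.
const certMapTemplate = "{{ range $key, $cfg := .State }}" +
	"{{ if $cfg.CertFile }}{{ $.WorkingDir }}/certs/{{ $cfg.CertFile }} {{ $cfg.Host }}\n{{ end }}" +
	"{{ end }}"

// renderCertMap executes the template over the given state.
func renderCertMap(workingDir string, state map[string]routeConfig) (string, error) {
	tmpl, err := template.New("cert_config.map").Parse(certMapTemplate)
	if err != nil {
		return "", err
	}
	data := struct {
		WorkingDir string
		State      map[string]routeConfig
	}{WorkingDir: workingDir, State: state}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderCertMap("/var/lib/haproxy/router", map[string]routeConfig{
		"test_route-edge":      {Host: "www.example.com"}, // no cert: skipped
		"u1p1_route-reencrypt": {Host: "reen.example.com", CertFile: "u1p1_route-reencrypt.pem"},
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```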

Contributor

liggitt commented Oct 9, 2016

Ah, sounds good. LGTM

Contributor

liggitt commented Oct 9, 2016

Oh, how does the map handle multiple routes (presumably in the same namespace) pointing to the same host?

Contributor Author

ramr commented Oct 10, 2016

@liggitt similar to the previous version, the certs would be picked up from the alphabetically first route in the same namespace. Using the first loaded certificate is how haproxy crt (and crt-list) works: http://cbonte.github.io/haproxy-dconv/1.5/configuration.html#5.1-crt
That said, I realized we should also sort the map - will do that as we do for other maps.
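
The "first entry wins" behavior described in this comment can be modeled with a short sketch. It only mirrors the description given here (sort the lines, keep the first certificate seen for each host); it is not haproxy's implementation, and the `ns1_route-a` / `ns1_route-b` entries are made-up examples.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// firstCertPerHost sorts the map lines and returns, for each host, the
// certificate path from the first line that names that host.
func firstCertPerHost(lines []string) map[string]string {
	sorted := append([]string(nil), lines...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	chosen := make(map[string]string)
	for _, line := range sorted {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // ignore lines that are not "<pem path> <host>"
		}
		cert, host := fields[0], fields[1]
		if _, seen := chosen[host]; !seen {
			chosen[host] = cert
		}
	}
	return chosen
}

func main() {
	// Two hypothetical routes in the same namespace claiming the same host.
	lines := []string{
		"/var/lib/haproxy/router/certs/ns1_route-b.pem www.example.com",
		"/var/lib/haproxy/router/certs/ns1_route-a.pem www.example.com",
	}
	fmt.Println(firstCertPerHost(lines)["www.example.com"])
}
```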

Contributor

liggitt commented Oct 10, 2016

you can't sort go maps... you'll need to switch it to an array, which is not compatible with existing templates, right?

Contributor Author

ramr commented Oct 10, 2016

the map file in question here is the cert_config.map file not a golang map. The sort I was referring to was similar to the other maps:
https://github.com/openshift/origin/blob/master/images/router/haproxy/reload-haproxy#L52

Contributor Author

ramr commented Oct 10, 2016

Actually, I just realized we already sort all the maps thanks to *.map, so this one is good. Got 'em Monday afternoon blues - need food!!
@liggitt let me know if this is good. Thx

Contributor

liggitt commented Oct 10, 2016

this looks fine, can you make sure we have a test case with same host, different certs, and make sure it behaves as we expect? can be in a follow-up

Contributor Author

ramr commented Oct 10, 2016

@liggitt will do in a follow up (or as part of the wildcard domain one as it would need something as well). Thx

Contributor

knobunc commented Oct 11, 2016

[merge]

Contributor

knobunc commented Oct 12, 2016

re-[merge] network seems to have failed on AWS

Contributor Author

ramr commented Oct 13, 2016

du failed du: cannot access ‘/tmp/openshift/test-extended/core/logs/openshift.log’: No such file or directory. @knobunc remerge please. Thx

Contributor

knobunc commented Oct 14, 2016

[merge] previous error was flake #11094

Contributor

openshift-bot commented Oct 14, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9718/) (Image: devenv-rhel7_5179)

@openshift-bot
Contributor

Evaluated for origin merge up to 5ce393c

openshift-bot merged commit 0994cee into openshift:master on Oct 14, 2016
ramr mentioned this pull request on Oct 15, 2016
ramr deleted the router-cert-list branch on February 3, 2017 00:09