Network component should refresh certificates if they expire #17135
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarterclayton The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
/unassign
/unassign
@deads2k as discussed
22d681d to 08d8a59
/retest
9740cf7 to 64084b5
Tested on a bootstrap cluster with cert rotation. This is GTG
/retest
}

lastCert := manager.Current()
go wait.Until(func() {
So this closes all connections some indeterminate amount of time after tlsConfig.GetClientCertificate has started returning a different value. However, since there is no synchronization between client users and this, it is possible that an old client certificate was returned, the gofunc is suspended, the gofunc is unsuspended much later, and the old certificate is used for a dial after the connections were closed. Will the connection fail properly and re-establish in that case?
It's not very likely, so as long as it eventually recovers (on some reasonable timescale) it seems ok, but it does seem like a risk.
Thinking here:
1. tlsConfig.GetClientCertificate is called after a connection has been accepted, so any connection that receives a cert from this method is already registered in the connTracker before the cert is returned
2. The monitor loop is guaranteed to close any connections registered before the new cert shows up (once the monitor loop sees a new cert, it can only ever close all old-cert connections and some new-cert connections)
3. A connection opened (and thus registered) which hasn't yet asked for a cert because it is delayed is still cleaned up whenever a new cert shows up
So I think we're ok, but we may want to talk more tomorrow.
Point 1 is not true in this case. Going to do the thing we discussed in person: guarantee a minimum window between cert rotation and refresh.
}, period, stopCh)

clientConfig.Transport = utilnet.SetTransportDefaults(&http.Transport{
	Proxy: http.ProxyFromEnvironment,
I thought we used a custom proxy function to handle CIDRs.
> I thought we used a custom proxy function to handle CIDRs.

Yeah, apimachinery/pkg/util/net.NewProxierWithNoProxyCIDR
Actually, why not SetTransportDefaults?
This is a pure copy of the upstream - agree it should be fixed.
Updated, I'll follow up upstream for the other place.
if now.After(cert.Leaf.NotAfter) {
	if now.Sub(m.lastCheck) > m.minimumRefresh {
		glog.V(2).Infof("Current client certificate is expired, checking from disk")
		cert = nil
After you've set this, you're setting yourself up to hotloop on this, aren't you? I'm not seeing another check against lastCheck after the cert is nil.
This sets the function-local cert, not m.cert. We always return the last cert we loaded.
64084b5 to 898e5ae
Symlinks must be to absolute paths, or relative to the target. Absolute is easier here.
898e5ae to b13b725
A single process openshift start node has both kubelet and network. Kubelet rotates its client certs - network does not. Until we split out network we need to do something minimal to ensure the cert is refreshed. Use the same code as the kubelet, but when the cert expires check disk and see if it was refreshed. That should work for bootstrapped environments because the file is updated.
b13b725 to 097350e
All comments addressed, going to test in a real cluster soon and then label
/retest Real bootstrapped cluster test successful, all comments addressed, labelling
Note the 10s gap between detection and refresh
/retest
Automatic merge from submit-queue.
@smarterclayton
The merged change is using SetTransportDefaults which sets that.
On Mon, Nov 6, 2017 at 8:57 AM, David Eads wrote:
> @smarterclayton SetTransportDefaults might screw you in a real environment. We make use of that CIDR respect in ansible.