Deal with auto-egress-ip mark conflicting with kube-proxy's masqueradeBit #18121
/retest |
LGTM, thanks Dan
/lgtm |
@dcbw PTAL |
721dd58 to c682166
c682166 to cceabf8
/lgtm |
/approve |
/assign @smarterclayton Clayton, PTAL... we need sign-off for cmd again. Thanks! |
/lgtm |
@smarterclayton can you approve us for the /pkg/cmd change please? |
/approve |
/retest Please review the full test history for this PR and help us cut down flakes. |
If the iptables rules get reordered after a flush, then it's possible that kube-proxy rules that look at MasqueradeBit will accidentally match a packet that was intended for an auto-egress-ip rule. To avoid that, change the code to avoid using that bit of pkt_mark. This means we now only have 31 bits to work with instead of 32, so make the mark be based on the (24-bit) VNID rather than the (32-bit) egress IP address.
cceabf8 to d8ec67e
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, dcbw, knobunc, smarterclayton The full list of commands accepted by this bot can be found here.
Automatic merge from submit-queue. |
Automatic merge from submit-queue. Drop auto-egress-IP rules when egress IP is removed from NetNamespace (Previously we were only doing it when the NetNamespace was deleted.) Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1540846 (does not conflict with #18121 so it doesn't matter which merges first)
Automatic merge from submit-queue. Fix reassignment of egress IP after removal When dropping an egress IP from eth0, we weren't updating our internal state to reflect that we had done that, so if you added it back again it wouldn't do everything it needed to do. (Introduced in #18121 but not discoverable until after #18547 was fixed.) Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1547899
auto-egress-ip uses the skb mark (OVS pkt_mark, iptables mark) to communicate information from OVS rules to iptables rules. Normally this works fine, but if the iptables rules get flushed and recreated, it's possible that the OpenShift and kube-proxy rules will get added back in the "wrong" order, and then the kube-proxy rules will start matching some of the auto-egress-ip packets, causing the wrong thing to happen.
Forcibly fixing the iptables rule order is difficult and fragile. Changing the implementation to not use the mark would also not be easy (and would be a lot of code to backport to 3.7). So I fixed it by just changing the code to use mark values that don't conflict with the bit that kube-proxy is using.
This means we only have 31 bits of mark to work with, so we can't reliably use the IP address as the mark value any more. So I switched to using the namespace's VNID instead, since those are only 24 bits. But this required reorganizing the code a bit because previously we were setting up the iptables rules for egress IPs before we knew what namespaces they were associated with.
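One way to derive a mark from a 24-bit VNID while never setting the masquerade bit could look like the sketch below. It is an illustration under stated assumptions, not necessarily what this PR does: markForVNID is a hypothetical name, masqueradeMark is assumed to be a single bit (or 0 if kube-proxy's bit is unknown), and the spare bit 24 just above the VNID range is used to keep marks unique.

```go
package main

import "fmt"

// vnidBits is the width of a VNID, per the description above.
const vnidBits = 24

// markForVNID returns a packet mark for the given VNID that never has the
// masquerade bit set. If the VNID itself contains that bit, the bit is
// cleared and bit 24 (outside the VNID range) is set instead, so two
// different VNIDs never map to the same mark.
// Assumption: masqueradeMark is a single bit, or 0 to disable the remapping.
func markForVNID(vnid, masqueradeMark uint32) uint32 {
	if masqueradeMark != 0 && vnid&masqueradeMark != 0 {
		return (vnid &^ masqueradeMark) | (1 << vnidBits)
	}
	return vnid
}

func main() {
	const masqueradeMark = uint32(1) << 14 // assumed kube-proxy default bit
	for _, vnid := range []uint32{0x00002a, 0x004001} {
		fmt.Printf("VNID 0x%06x -> mark 0x%08x\n",
			vnid, markForVNID(vnid, masqueradeMark))
	}
}
```

If the masquerade bit lies at or above bit 24, no 24-bit VNID can contain it, so the function returns the VNID unchanged in that case.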
First commit ("Make sure oc.tunMAC gets set even if AlreadySetUp()") is from #18049
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1527642
@openshift/sig-networking PTAL