Basic high availability for auto egress IPs #20620

Merged

danwinship
Contributor

Backport of #19578 to 3.10.z

If a namespace has multiple egress IPs, monitor egress traffic and
switch to an alternate egress IP if the currently selected one appears
dead.

Most dump-flows calls are part of health checks and don't normally
need to be logged unless they fail.
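
As a rough illustration of the failover described above, here is a minimal sketch, not the code from this PR. The function name `pickEgressIP` and the `offline` map are hypothetical, and how an egress IP comes to be marked offline (the traffic monitoring itself) is deliberately left out.

```go
package main

import "fmt"

// pickEgressIP returns the first requested egress IP that is not marked
// offline, or "" if every candidate appears dead.
// Hypothetical helper: both the name and the offline map are assumptions.
func pickEgressIP(requested []string, offline map[string]bool) string {
	for _, ip := range requested {
		if !offline[ip] {
			return ip
		}
	}
	return ""
}

func main() {
	requested := []string{"192.0.2.10", "192.0.2.11"}
	offline := map[string]bool{}

	fmt.Println(pickEgressIP(requested, offline)) // prints 192.0.2.10

	// Pretend a health check decided the first IP is dead.
	offline["192.0.2.10"] = true
	fmt.Println(pickEgressIP(requested, offline)) // prints 192.0.2.11
}
```

Keeping the candidates in their requested order means the namespace would return to its first choice once that IP is no longer marked offline; whether the real implementation behaves that way is not stated in this description.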
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 13, 2018
@liggitt
Contributor

liggitt commented Aug 13, 2018

/retest


  nodesByNodeIP    map[string]*nodeEgress
  namespacesByVNID map[uint32]*namespaceEgress
  egressIPs        map[string]*egressIPInfo

- changedEgressIPs  []*egressIPInfo
- changedNamespaces []*namespaceEgress
+ changedEgressIPs  map[*egressIPInfo]bool
Contributor Author

@liggitt had asked:

> what was the intent of making these map keys pointers? are the same instances of egressIPInfo received on every change notification? if a remove/add is enqueued, how do we ensure we process them in the right order?

The maps are only non-empty during the processing of a single event. Previously, if, say, a namespace with 3 egress IPs was deleted, then we'd end up adding the corresponding *namespaceEgress object to changedNamespaces 3 times, and then at the end in syncEgressIPs we'd have to de-dup that list so we only processed the changed namespace once. This commit is just changing things to use a map instead, with the namespaces/IPs as keys, so that we don't need a separate de-duping step at the end.
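
The de-duplication described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's code: the `tracker` type and the `markChanged` and `sync` methods are hypothetical names, and `namespaceEgress` is reduced to a single field.

```go
package main

import "fmt"

// namespaceEgress is a simplified stand-in for the real type.
type namespaceEgress struct {
	vnid uint32
}

// tracker is a hypothetical holder for the registry and the change set.
type tracker struct {
	namespacesByVNID  map[uint32]*namespaceEgress
	changedNamespaces map[*namespaceEgress]bool
}

// markChanged records ns as changed. Marking the same pointer again
// overwrites the existing entry, so no separate de-dup pass is needed.
func (t *tracker) markChanged(ns *namespaceEgress) {
	t.changedNamespaces[ns] = true
}

// sync reports how many distinct namespaces changed and resets the set,
// so the map is only non-empty while a single event is being processed.
func (t *tracker) sync() int {
	processed := len(t.changedNamespaces)
	t.changedNamespaces = make(map[*namespaceEgress]bool)
	return processed
}

func main() {
	ns := &namespaceEgress{vnid: 42}
	t := &tracker{
		namespacesByVNID:  map[uint32]*namespaceEgress{42: ns},
		changedNamespaces: make(map[*namespaceEgress]bool),
	}

	// A namespace with 3 egress IPs is deleted: it gets marked 3 times.
	for i := 0; i < 3; i++ {
		t.markChanged(t.namespacesByVNID[42])
	}

	fmt.Println(t.sync()) // prints 1
}
```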

Contributor

just to be clear, keying a map using a pointer to a struct containing data does not dedupe if you get pointers to two different instances of the struct, even if those two instances contain the same data.
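
A small sketch of that caveat, using a hypothetical one-field `egressIPInfo` and a hypothetical `countDistinct` helper: Go compares pointer map keys by address, so two separately allocated structs with identical contents occupy two entries.

```go
package main

import "fmt"

// egressIPInfo is a simplified stand-in with a single field.
type egressIPInfo struct {
	ip string
}

// countDistinct returns how many entries a pointer-keyed map ends up with.
// Keys are compared by address, not by the data they point to.
func countDistinct(items []*egressIPInfo) int {
	seen := make(map[*egressIPInfo]bool)
	for _, item := range items {
		seen[item] = true
	}
	return len(seen)
}

func main() {
	a := &egressIPInfo{ip: "192.0.2.10"}
	b := &egressIPInfo{ip: "192.0.2.10"} // same data, different instance

	fmt.Println(countDistinct([]*egressIPInfo{a, a})) // prints 1
	fmt.Println(countDistinct([]*egressIPInfo{a, b})) // prints 2
}
```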

Contributor Author

Right. They're the same pointers. (They're all current elements of either the namespacesByVNID or egressIPs maps above.)

Contributor

do the namespacesByVNID and egressIPs maps store the same value under multiple keys?

Contributor Author

no

Contributor Author

the code was already de-duping them by pointer equality before, it was just doing it in a different part of the code

Contributor

> do the namespacesByVNID and egressIPs maps store the same value under multiple keys?
>
> no

seems like changedEgressIPs and changedNamespaces could maybe store the keys from the egressIPs and namespacesByVNID maps instead. it's too easy to introduce changes that accidentally get a map wrong when using pointers as map keys.

> the code was already de-duping them by pointer equality before, it was just doing it in a different part of the code

ok. this lgtm then, but a follow-up to stop using pointers as map keys would be ideal.
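
One possible shape for that follow-up, offered as a sketch under stated assumptions rather than a planned change: the change set stores the VNID key instead of the pointer, so de-duplication no longer depends on every caller passing the same instance. The names `changedVNIDs`, `markChanged`, and `drainChanged` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// namespaceEgress is a simplified stand-in for the real type.
type namespaceEgress struct {
	vnid uint32
}

// tracker is a hypothetical holder; the change set is keyed by VNID.
type tracker struct {
	namespacesByVNID map[uint32]*namespaceEgress
	changedVNIDs     map[uint32]bool
}

// markChanged records a namespace as changed by its registry key.
func (t *tracker) markChanged(vnid uint32) {
	t.changedVNIDs[vnid] = true
}

// drainChanged returns the changed VNIDs in ascending order and resets
// the change set. Callers look each VNID up in namespacesByVNID and must
// handle a missing entry, which means the namespace was removed.
func (t *tracker) drainChanged() []uint32 {
	vnids := make([]uint32, 0, len(t.changedVNIDs))
	for vnid := range t.changedVNIDs {
		vnids = append(vnids, vnid)
	}
	sort.Slice(vnids, func(i, j int) bool { return vnids[i] < vnids[j] })
	t.changedVNIDs = make(map[uint32]bool)
	return vnids
}

func main() {
	t := &tracker{
		namespacesByVNID: map[uint32]*namespaceEgress{42: {vnid: 42}},
		changedVNIDs:     make(map[uint32]bool),
	}

	t.markChanged(42)
	// Replacing the stored instance does not create a second entry,
	// because the change set never sees the pointer.
	t.namespacesByVNID[42] = &namespaceEgress{vnid: 42}
	t.markChanged(42)

	fmt.Println(t.drainChanged()) // prints [42]
}
```

The trade-off is an extra map lookup when processing each change, in exchange for keys whose equality is defined by value.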

@liggitt
Contributor

liggitt commented Aug 14, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 14, 2018
@openshift-merge-robot openshift-merge-robot merged commit 1584b2d into openshift:release-3.10 Aug 14, 2018
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, liggitt

@danwinship danwinship deleted the auto-egress-ip-ha branch January 31, 2019 14:37
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking lgtm Indicates that a PR is ready to be merged. sig/networking size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.