Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change the router eventqueue key function #13494

Merged
merged 1 commit into from
Mar 24, 2017

Conversation

JacobTanenbaum
Copy link
Contributor

changing the router eventqueue key function so that there is a higher chance that
each item will have a unique key so the router does not panic.

originally the thought was to add the creation timestamp because it was not user
editable but the accessor function meta.CreationTimestamp() only gives the timestamp
to the second and since these actions need to occur quickly a second is too long. Only
adding creation timestamp I was able to observe the panic with the test script. I
decided to use UID because it is much more likely that the UID is unique.

Bug: 1429823 Link

@JacobTanenbaum
Copy link
Contributor Author

[test]

@JacobTanenbaum
Copy link
Contributor Author

@knobunc @openshift/networking PTAL

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 22, 2017
@knobunc
Copy link
Contributor

knobunc commented Mar 22, 2017

Hm. But can't I crash the router as an unprivileged user by repeatedly re-sending the same route object on a create with the UID set?

@ncdc
Copy link
Contributor

ncdc commented Mar 22, 2017

@smarterclayton the networking team is concerned that if they make the key namespace/name/uid that someone could delete and recreate a route/endpoint/whatever with the same key and still end up with a panic (if the event queue sees the delete and the add before the router calls pop). Do you think this is sufficient or is there something else you can think of (that doesn't involve replacing event queue with an informer)?

@knobunc
Copy link
Contributor

knobunc commented Mar 22, 2017

@ncdc @smarterclayton @danwinship @marun @dcbw -- I just tried the test of:

$ oc export route > /tmp/r
  [edit the name to make it unique, but leave all other fields alone]
$ oc create -f /tmp/r
$ oc get route  -o yaml | grep uid
    uid: d0a5e624-0eff-11e7-8a2d-02427a236c99
    uid: 055e1ad7-0f00-11e7-8a2d-02427a236c99

So, with routes at least, the uids are being updated. I know Andy, Dan (Winship), and I worked on debugging a networking problem that ended up being caused by the namespaces having the same uid when the user dumped them and edited them to create new ones. So perhaps the problem is just with certain objects? Or perhaps the problem is now solved somewhere sytstemically.

Anyway, this LGTM now. I have to revert the other patch from @ramr to crash the router (#13502).

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides the nit. Thanks!

@@ -56,25 +57,39 @@ func NewDefaultRouterControllerFactory(oc osclient.RoutesNamespacer, kc kclients
}
}

func routerKeyFn(obj interface{}) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just put a comment in here explaining where this came from and why we pulled it in here.

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@smarterclayton
Copy link
Contributor

Looks ok, as long as we can reproduce reliably before and ensure it's fixed after

@knobunc
Copy link
Contributor

knobunc commented Mar 23, 2017

[test]

@knobunc
Copy link
Contributor

knobunc commented Mar 23, 2017

@JacobTanenbaum failed with:

./pkg/router/controller/factory/factory.go
!!! gofmt needs to be run on the following files: 
Try running 'gofmt -s -d [path]'
Or autocorrect with 'hack/verify-gofmt.sh | xargs -n 1 gofmt -s -w'

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 23, 2017
changing the router eventqueue key function so that there is a higher chance that
each item will have a unique key so the router does not panic.

originally the thought was to add the creation timestamp because it was not user
editable but the accessor function meta.CreationTimestamp() only gives the timestamp
to the second and since these actions need to occur quickly a second is too long. Only
adding creation timestamp I was able to observe the panic with the test script. I
decided to use UID because it is much more likely that the UID is unique.

Bug: 1429823

changelog:

added a note explaining why routerKeyFn was added
@openshift-bot
Copy link
Contributor

Evaluated for origin test up to fd723fd

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/448/) (Base Commit: 367a8d3)

@knobunc
Copy link
Contributor

knobunc commented Mar 24, 2017

[merge]

@smarterclayton
Copy link
Contributor

smarterclayton commented Mar 24, 2017 via email

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to fd723fd

@JacobTanenbaum
Copy link
Contributor Author

JacobTanenbaum commented Mar 24, 2017

@smarterclayton yes there is a reproducer
https://gist.github.com/ramr/58dbdc3c5982db7b3c3154eb4bca60c8
$ ./reproduce-eq-panic.sh [<route-json-yaml-file>]

The caveat being our routers recover from the panic while the customers is not able too... After applying this PR the router no longer panics

@smarterclayton
Copy link
Contributor

smarterclayton commented Mar 24, 2017 via email

@openshift-bot
Copy link
Contributor

openshift-bot commented Mar 24, 2017

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/199/) (Base Commit: ac3b692) (Image: devenv-rhel7_6100)

@openshift-bot openshift-bot merged commit 37777d2 into openshift:master Mar 24, 2017
@knobunc
Copy link
Contributor

knobunc commented Mar 24, 2017

@smarterclayton Ok, we'll provide that. Does the test case need to be back-ported too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants