
1.3 crash picks #10526

Closed
wants to merge 9 commits into from

Conversation

Andy Goldstein added 9 commits August 18, 2016 22:13
Protect access to the original writer. Panics if anything has written
into the original writer, or if the writer has been hijacked, when the request times out (sketched after the commit list).
The Pop method should allow a caller to requeue an item while under the
FIFO lock, to avoid races on deletes (see the queue sketch after the commit list).
…ed as read-only.

Send a node event when this happens and hint to the administrator
about the remediation.
Previously it tried to use a nil pod variable if an error was returned
from the pod update call (the fixed pattern is sketched after the commit list).

(note that the rest of 26680 is unit test related, and not picked here)
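
A minimal sketch of the writer protection the first commit describes, assuming a mutex-guarded wrapper around the original http.ResponseWriter; guardedWriter and its timeout method are illustrative names, not the actual apiserver types:

```go
// A sketch, not the actual apiserver timeout handler: all access to the
// original http.ResponseWriter goes through one mutex, and the timeout
// path panics if the response was already written or hijacked.
package timeoutsketch

import (
	"net/http"
	"sync"
)

type guardedWriter struct {
	mu       sync.Mutex
	w        http.ResponseWriter
	timedOut bool // the deadline fired and a timeout response was sent
	wrote    bool // the wrapped handler already wrote something
	hijacked bool // the connection was taken over (e.g. for upgrades)
}

func (g *guardedWriter) Header() http.Header { return g.w.Header() }

func (g *guardedWriter) Write(p []byte) (int, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.timedOut {
		// A write after the timeout response would corrupt the stream,
		// so fail loudly instead of racing.
		panic(http.ErrHandlerTimeout)
	}
	g.wrote = true
	return g.w.Write(p)
}

func (g *guardedWriter) WriteHeader(code int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.timedOut {
		panic(http.ErrHandlerTimeout)
	}
	g.wrote = true
	g.w.WriteHeader(code)
}

// timeout runs when the deadline expires. It panics if the original
// writer was already written to or hijacked, because a clean timeout
// response can no longer be produced.
func (g *guardedWriter) timeout() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.wrote || g.hijacked {
		panic("handler wrote to or hijacked the response before the timeout fired")
	}
	g.timedOut = true
	g.w.WriteHeader(http.StatusGatewayTimeout)
}
```

Every path that touches the original writer takes the same lock, so a late handler write and the timeout response can never interleave silently; one side panics instead.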
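For the Pop commit, a simplified stand-in for the queue (fifosketch, ErrRequeue, and NewFIFO are invented names; the vendored pkg/client/cache code differs) showing why processing the item under the queue lock makes the requeue atomic with respect to concurrent deletes:

```go
// A simplified stand-in for the cache FIFO: Pop hands the item to a
// callback while the lock is still held, so a requeue happens before
// any concurrent delete can observe the item as missing.
package fifosketch

import "sync"

// ErrRequeue tells Pop to put the item back on the queue.
type ErrRequeue struct{ Err error }

func (e ErrRequeue) Error() string {
	if e.Err == nil {
		return "the popped item should be requeued"
	}
	return e.Err.Error()
}

type FIFO struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []interface{}
}

func NewFIFO() *FIFO {
	f := &FIFO{}
	f.cond = sync.NewCond(&f.mu)
	return f
}

func (f *FIFO) Add(obj interface{}) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.items = append(f.items, obj)
	f.cond.Broadcast()
}

// Pop blocks until an item is available and processes it under the lock.
// If process returns ErrRequeue, the item goes back on the queue before
// the lock is released.
func (f *FIFO) Pop(process func(obj interface{}) error) (interface{}, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for len(f.items) == 0 {
		f.cond.Wait()
	}
	item := f.items[0]
	f.items = f.items[1:]
	if err := process(item); err != nil {
		if rq, ok := err.(ErrRequeue); ok {
			f.items = append(f.items, item)
			return item, rq.Err
		}
		return item, err
	}
	return item, nil
}
```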
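And the shape of the nil-pod fix, with Pod, podUpdater, and syncStatus as hypothetical stand-ins for the real types: check the error from the update call before using the returned pod, because on error the returned pointer may be nil:

```go
// Stand-in types to show only the control flow of the fix; this is not
// the real kubelet code.
package nilpodfix

import "fmt"

type Pod struct{ Namespace, Name string }

// podUpdater abstracts the pod status update call.
type podUpdater interface {
	UpdateStatus(pod *Pod) (*Pod, error)
}

// syncStatus checks the error from the update call before touching the
// returned pod; on error the returned pointer may be nil, which is what
// caused the original panic.
func syncStatus(c podUpdater, pod *Pod) error {
	updated, err := c.UpdateStatus(pod)
	if err != nil {
		return fmt.Errorf("failed to update pod %s/%s: %v", pod.Namespace, pod.Name, err)
	}
	fmt.Printf("updated pod %s/%s\n", updated.Namespace, updated.Name)
	return nil
}
```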
@ncdc
Contributor Author

ncdc commented Aug 19, 2016

[test]

@eparis mentioned this pull request Aug 19, 2016
@ingvagabund
Member

ingvagabund commented Aug 19, 2016

With respect to the origin master HEAD (e5178ec):

Only this one has not been cherry-picked yet; the rest of Eric's list has been:

@openshift-bot
Contributor

Evaluated for origin test up to b6e8f68

@ncdc
Contributor Author

ncdc commented Aug 19, 2016

25308 isn't a clean cherry-pick because the history command changed after 1.3, but we could probably massage it into our code.

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/8220/)

@derekwaynecarr
Member

All the picks look good to me. The pick for 29672 is good, but I could not see what underlying issue it fixed; it seemed like a pure code-hygiene change, since there was no reference to a material panic it prevented.

@ingvagabund
Member

```
$ GOPATH=/root go test -race -cover -covermode atomic -timeout 120s github.com/openshift/origin/pkg/cmd/server/origin
I0819 14:39:24.137802 7405 :1] &{dev [wheel group-impersonater] map[]} is acting as &{system:admin [some-group] map[]}
I0819 14:39:24.138378 7405 :1] &{dev [wheel system-group-impersonater] map[]} is acting as &{system:admin [some-system:group] map[]}
I0819 14:39:24.138841 7405 :1] &{dev [wheel] map[]} is acting as &{system:admin [system:authenticated] map[]}
I0819 14:39:24.139295 7405 :1] &{dev [regular-impersonater] map[]} is acting as &{tester [system:authenticated system:authenticated:oauth] map[]}
I0819 14:39:24.141462 7405 :1] &{dev [sa-impersonater] map[]} is acting as &{system:serviceaccount:foo:default [system:serviceaccounts system:serviceaccounts:foo system:authenticated] map[]}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x91fc39]

goroutine 111 [running]:
panic(0x392bca0, 0xc82000e0f0)
	/usr/lib/golang/src/runtime/panic.go:481 +0x3ff
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/unversioned.(*limitRanges).List(0xc82039d220, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/unversioned/limit_ranges.go:56 +0x109
github.com/openshift/origin/pkg/image/admission.NewLimitVerifier.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3f7e318, ...)
	/root/src/github.com/openshift/origin/pkg/image/admission/imagestream_limits.go:23 +0xbb
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*ListWatch).List(0xc82019ebe0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/listwatch.go:80 +0x86
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch(0xc82069c680, 0xc820058d20, 0x0, 0x0)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:258 +0x3d0
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).Run.func1()
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:202 +0x54
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc82019ec20)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:84 +0x2b
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc82019ec20, 0x3b9aca00, 0x0, 0x445101, 0xc820058d20)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:85 +0xca
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.Until(0xc82019ec20, 0x3b9aca00, 0xc820058d20)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:47 +0x51
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).Run
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:205 +0x327
FAIL github.com/openshift/origin/pkg/cmd/server/origin 0.287s
```

@ingvagabund
Member

So "UPSTREAM: 29743: Fix race condition found in JitterUntil" is the culprit

@ingvagabund
Member

Putting the runtime.HandleCrash() call back gets rid of the panic. It was removed in kubernetes/kubernetes#29743 (diff): "Remove this function. We are switching to a world where it's safe for apiserver to panic, since it will be restarted by kubelet."

That works for a containerized world. So either put the call back or skip the cherry-pick, as @derekwaynecarr suggests.
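
For reference, a hedged sketch of what keeping the HandleCrash call buys: each iteration of the periodic loop runs under a deferred recover, so a panic like the nil limitRanges dereference above is logged rather than taking the process down. jitterUntil here is a simplified stand-in, not the vendored wait.JitterUntil:

```go
// A simplified stand-in for JitterUntil plus defer runtime.HandleCrash():
// a panic in one iteration of f is recovered and logged, and the loop
// keeps running.
package crashsketch

import (
	"log"
	"time"
)

// jitterUntil calls f every period until stopCh is closed.
func jitterUntil(f func(), period time.Duration, stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		default:
		}

		func() {
			// The moral equivalent of defer runtime.HandleCrash():
			// recover the panic and keep the loop alive.
			defer func() {
				if r := recover(); r != nil {
					log.Printf("recovered from panic: %v", r)
				}
			}()
			f()
		}()

		select {
		case <-stopCh:
			return
		case <-time.After(period):
		}
	}
}
```

Whether to keep recovering or to let the process die and be restarted (the containerized-world assumption quoted above) is exactly the trade-off being discussed.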

@timothysc

timothysc commented Aug 19, 2016

Isn't this actually exposing a real issue though? /cc @liggitt

@timothysc

@ingvagabund Given where we are in the release cycle, I think putting it back in makes sense, but IMHO we need to re-evaluate in the next cycle which issues we are actually glossing over.

@derekwaynecarr
Member

@timothysc - I agree it could be covering a real issue. I have decided to carry a patch that keeps HandleCrash, and we can open an issue to evaluate that carry after the release.

see #10541

@derekwaynecarr
Member

closing in favor of #10541
