
1.3 crash picks #10526

Closed
wants to merge 9 commits into from

Conversation

Andy Goldstein added 9 commits August 18, 2016 22:13
Protect access to the original writer. Panics if anything has written
into the original writer, or if the writer has been hijacked, when the request times out (sketched after the commit list).
The Pop method should allow a caller to requeue an item while under the
FIFO lock, to avoid races on deletes (see the queue sketch after the commit list).
…ed as read-only.

Send a node event when this happens and hint to the administrator
about the remediation.
Previously it tried to use a nil pod variable if an error was returned
from the pod update call (the fixed pattern is sketched after the commit list).

(note that the rest of 26680 is unit test related, and not picked here)
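
A minimal sketch of the writer protection the first commit describes, assuming a mutex-guarded wrapper around the original http.ResponseWriter; guardedWriter and its timeout method are illustrative names, not the actual apiserver types:

```go
// A sketch, not the actual apiserver timeout handler: all access to the
// original http.ResponseWriter goes through one mutex, and the timeout
// path panics if the response was already written or hijacked.
package timeoutsketch

import (
	"net/http"
	"sync"
)

type guardedWriter struct {
	mu       sync.Mutex
	w        http.ResponseWriter
	timedOut bool // the deadline fired and a timeout response was sent
	wrote    bool // the wrapped handler already wrote something
	hijacked bool // the connection was taken over (e.g. for upgrades)
}

func (g *guardedWriter) Header() http.Header { return g.w.Header() }

func (g *guardedWriter) Write(p []byte) (int, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.timedOut {
		// A write after the timeout response would corrupt the stream,
		// so fail loudly instead of racing.
		panic(http.ErrHandlerTimeout)
	}
	g.wrote = true
	return g.w.Write(p)
}

func (g *guardedWriter) WriteHeader(code int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.timedOut {
		panic(http.ErrHandlerTimeout)
	}
	g.wrote = true
	g.w.WriteHeader(code)
}

// timeout runs when the deadline expires. It panics if the original
// writer was already written to or hijacked, because a clean timeout
// response can no longer be produced.
func (g *guardedWriter) timeout() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.wrote || g.hijacked {
		panic("handler wrote to or hijacked the response before the timeout fired")
	}
	g.timedOut = true
	g.w.WriteHeader(http.StatusGatewayTimeout)
}
```

Every path that touches the original writer takes the same lock, so a late handler write and the timeout response can never interleave silently; one side panics instead.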
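For the Pop commit, a simplified stand-in for the queue (fifosketch, ErrRequeue, and NewFIFO are invented names; the vendored pkg/client/cache code differs) showing why processing the item under the queue lock makes the requeue atomic with respect to concurrent deletes:

```go
// A simplified stand-in for the cache FIFO: Pop hands the item to a
// callback while the lock is still held, so a requeue happens before
// any concurrent delete can observe the item as missing.
package fifosketch

import "sync"

// ErrRequeue tells Pop to put the item back on the queue.
type ErrRequeue struct{ Err error }

func (e ErrRequeue) Error() string {
	if e.Err == nil {
		return "the popped item should be requeued"
	}
	return e.Err.Error()
}

type FIFO struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []interface{}
}

func NewFIFO() *FIFO {
	f := &FIFO{}
	f.cond = sync.NewCond(&f.mu)
	return f
}

func (f *FIFO) Add(obj interface{}) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.items = append(f.items, obj)
	f.cond.Broadcast()
}

// Pop blocks until an item is available and processes it under the lock.
// If process returns ErrRequeue, the item goes back on the queue before
// the lock is released.
func (f *FIFO) Pop(process func(obj interface{}) error) (interface{}, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	for len(f.items) == 0 {
		f.cond.Wait()
	}
	item := f.items[0]
	f.items = f.items[1:]
	if err := process(item); err != nil {
		if rq, ok := err.(ErrRequeue); ok {
			f.items = append(f.items, item)
			return item, rq.Err
		}
		return item, err
	}
	return item, nil
}
```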
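And the shape of the nil-pod fix, with Pod, podUpdater, and syncStatus as hypothetical stand-ins for the real types: check the error from the update call before using the returned pod, because on error the returned pointer may be nil:

```go
// Stand-in types to show only the control flow of the fix; this is not
// the real kubelet code.
package nilpodfix

import "fmt"

type Pod struct{ Namespace, Name string }

// podUpdater abstracts the pod status update call.
type podUpdater interface {
	UpdateStatus(pod *Pod) (*Pod, error)
}

// syncStatus checks the error from the update call before touching the
// returned pod; on error the returned pointer may be nil, which is what
// caused the original panic.
func syncStatus(c podUpdater, pod *Pod) error {
	updated, err := c.UpdateStatus(pod)
	if err != nil {
		return fmt.Errorf("failed to update pod %s/%s: %v", pod.Namespace, pod.Name, err)
	}
	fmt.Printf("updated pod %s/%s\n", updated.Namespace, updated.Name)
	return nil
}
```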
@ncdc
Contributor Author

ncdc commented Aug 19, 2016

[test]

@eparis mentioned this pull request Aug 19, 2016
@ingvagabund
Member

ingvagabund commented Aug 19, 2016

With respect to the origin master HEAD (e5178ec):

Only this one has not been cherry-picked yet; the rest of Eric's list has been:

@openshift-bot
Contributor

Evaluated for origin test up to b6e8f68

@ncdc
Contributor Author

ncdc commented Aug 19, 2016

25308 isn't a clean cherry-pick because the history command changed after 1.3, but we could probably massage it into our code.

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/8220/)

@derekwaynecarr
Member

All the picks look good to me. The pick for 29672 is good, but I could not see what underlying issue it fixed; it seemed like a pure code-hygiene change, since there was no reference to a material panic it prevented.

@ingvagabund
Member

```
$ GOPATH=/root go test -race -cover -covermode atomic -timeout 120s github.com/openshift/origin/pkg/cmd/server/origin
I0819 14:39:24.137802 7405 :1] &{dev [wheel group-impersonater] map[]} is acting as &{system:admin [some-group] map[]}
I0819 14:39:24.138378 7405 :1] &{dev [wheel system-group-impersonater] map[]} is acting as &{system:admin [some-system:group] map[]}
I0819 14:39:24.138841 7405 :1] &{dev [wheel] map[]} is acting as &{system:admin [system:authenticated] map[]}
I0819 14:39:24.139295 7405 :1] &{dev [regular-impersonater] map[]} is acting as &{tester [system:authenticated system:authenticated:oauth] map[]}
I0819 14:39:24.141462 7405 :1] &{dev [sa-impersonater] map[]} is acting as &{system:serviceaccount:foo:default [system:serviceaccounts system:serviceaccounts:foo system:authenticated] map[]}
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x91fc39]

goroutine 111 [running]:
panic(0x392bca0, 0xc82000e0f0)
	/usr/lib/golang/src/runtime/panic.go:481 +0x3ff
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/unversioned.(*limitRanges).List(0xc82039d220, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/unversioned/limit_ranges.go:56 +0x109
github.com/openshift/origin/pkg/image/admission.NewLimitVerifier.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3f7e318, ...)
	/root/src/github.com/openshift/origin/pkg/image/admission/imagestream_limits.go:23 +0xbb
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*ListWatch).List(0xc82019ebe0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/listwatch.go:80 +0x86
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).ListAndWatch(0xc82069c680, 0xc820058d20, 0x0, 0x0)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:258 +0x3d0
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).Run.func1()
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:202 +0x54
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc82019ec20)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:84 +0x2b
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc82019ec20, 0x3b9aca00, 0x0, 0x445101, 0xc820058d20)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:85 +0xca
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait.Until(0xc82019ec20, 0x3b9aca00, 0xc820058d20)
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/wait/wait.go:47 +0x51
created by github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache.(*Reflector).Run
	/root/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/client/cache/reflector.go:205 +0x327
FAIL github.com/openshift/origin/pkg/cmd/server/origin 0.287s
```

@ingvagabund
Member

So "UPSTREAM: 29743: Fix race condition found in JitterUntil" is the culprit

@ingvagabund
Member

Putting the runtime.HandleCrash() call back gets rid of the panic. It was removed in kubernetes/kubernetes#29743 (diff): "Remove this function. We are switching to a world where it's safe for apiserver to panic, since it will be restarted by kubelet."

That works for a containerized world. So either put the call back or skip the cherry-pick, as @derekwaynecarr suggests.
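
For reference, a hedged sketch of what keeping the HandleCrash call buys: each iteration of the periodic loop runs under a deferred recover, so a panic like the nil limitRanges dereference above is logged rather than taking the process down. jitterUntil here is a simplified stand-in, not the vendored wait.JitterUntil:

```go
// A simplified stand-in for JitterUntil plus defer runtime.HandleCrash():
// a panic in one iteration of f is recovered and logged, and the loop
// keeps running.
package crashsketch

import (
	"log"
	"time"
)

// jitterUntil calls f every period until stopCh is closed.
func jitterUntil(f func(), period time.Duration, stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		default:
		}

		func() {
			// The moral equivalent of defer runtime.HandleCrash():
			// recover the panic and keep the loop alive.
			defer func() {
				if r := recover(); r != nil {
					log.Printf("recovered from panic: %v", r)
				}
			}()
			f()
		}()

		select {
		case <-stopCh:
			return
		case <-time.After(period):
		}
	}
}
```

Whether to keep recovering or to let the process die and be restarted (the containerized-world assumption quoted above) is exactly the trade-off being discussed.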

@timothysc

timothysc commented Aug 19, 2016

Isn't this actually exposing a real issue though? /cc @liggitt

@timothysc

@ingvagabund Given where we are in the release cycle, I think putting it back in makes sense, but IMHO we need to re-evaluate in the next cycle which issues we are actually glossing over.

@derekwaynecarr
Member

@timothysc - I agree it could be covering a real issue. I have decided to carry a patch that keeps HandleCrash, and we can open an issue to evaluate that carry after the release.

see #10541

@derekwaynecarr
Member

closing in favor of #10541
