enable etcd watch cache for k8s types #8395
Conversation
[test] [extended:core]
@@ -46,6 +46,7 @@ func NewServerRunOptions() *ServerRunOptions {
 		InsecureBindAddress:  net.ParseIP("127.0.0.1"),
 		InsecurePort:         8080,
 		LongRunningRequestRE: defaultLongRunningRequestRE,
+		MaxRequestsInFlight:  400,
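(For background: a max-in-flight limit like this is conventionally enforced with a counting semaphore around the request handler. A minimal Go sketch of the pattern follows; it is not the actual apiserver filter, which among other things exempts long-running requests.)

package main

import (
	"fmt"
	"log"
	"net/http"
)

// maxInFlight rejects requests once `limit` of them are already being served.
// A buffered channel acts as a counting semaphore; this is a simplified
// sketch of the idea behind MaxRequestsInFlight, not the real filter.
func maxInFlight(limit int, next http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // acquired a slot
			defer func() { <-sem }() // release it when the request finishes
			next.ServeHTTP(w, r)
		default: // at the limit: shed load instead of queueing
			http.Error(w, "too many requests in flight", http.StatusTooManyRequests)
		}
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", maxInFlight(400, ok)))
}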
imo we can safely up this to 600, b/c we plan to override it at scale today. Smaller installations would be unaffected.
The legacy value of 400 was chosen back in the 1.0 release.
This is per master
Also, Kube doesn't always do their options "right", so we don't get the tap on the shoulder in the code where the new options are obviously missing.
Question out of my ignorance for the process... when we rebase kube into origin, wouldn't these things be included?
We get the capability for k8s resources, but we drive server startup from our config, not from command line flags. We continually have to verify that new kubernetes options we want are enabled, and new kubernetes options we don't want are not enabled.
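(A rough sketch of that wiring pattern; the types and field names below are hypothetical, not origin's actual config. Every upstream knob has to be carried across by hand, which is where the verification burden comes from.)

package main

import "fmt"

// Hypothetical stand-ins: origin's master config on one side, the upstream
// kubernetes options struct on the other.
type MasterConfig struct {
	MaxRequestsInFlight int
	EnableWatchCache    bool
}

type ServerRunOptions struct {
	MaxRequestsInFlight int
	EnableWatchCache    bool
}

// buildRunOptions starts from upstream defaults and overrides them from our
// config. Any new upstream option not wired here silently keeps its default,
// which is why each rebase requires re-auditing this mapping.
func buildRunOptions(cfg MasterConfig) *ServerRunOptions {
	opts := &ServerRunOptions{MaxRequestsInFlight: 400} // upstream default
	opts.MaxRequestsInFlight = cfg.MaxRequestsInFlight
	opts.EnableWatchCache = cfg.EnableWatchCache
	return opts
}

func main() {
	fmt.Printf("%+v\n", buildRunOptions(MasterConfig{MaxRequestsInFlight: 600, EnableWatchCache: true}))
}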
A configdump option seems like it's going from a nice-to-have to essential... pretty fast.
Not sure what you mean by "config dump option"
The main idea was to dump all the values of all the knobs. kubernetes/kubernetes#14916
Ah. Unfortunately, at the point where the config, defaults, and flags have been transformed into the actual options structs used to run the components, about a third of the fields are no longer serializable values, but are instantiated runtime objects.
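(To illustrate with a minimal, hypothetical sketch, not origin's actual types: once option fields hold compiled or live runtime objects rather than plain values, serializing them back out loses information or fails outright.)

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"regexp"
)

func main() {
	// A compiled regexp is a runtime object: it marshals as "{}",
	// silently losing the pattern it was built from.
	re := regexp.MustCompile("(/watch/|/proxy/)")
	b, err := json.Marshal(re)
	fmt.Println(string(b), err) // {} <nil>

	// An http.Client contains a func-typed field (CheckRedirect),
	// so marshaling it fails outright.
	_, err = json.Marshal(&http.Client{})
	fmt.Println(err) // json: unsupported type: func(*http.Request, []*http.Request) error
}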
I am in favor of enabling this change.
This is probably something we should turn on, considering it's a big performance win in Kube 1.2. My only hesitation is how late in the release cycle we are, but I think we're better off enabling it and keeping an eye out for weird/unexpected issues. And then make supporting the watch cache for origin resources a P0, but maybe not strictly required for 3.2.
@derekwaynecarr thoughts on this being related to web UI scaling issues, as well?
It's definitely related. Multiple identical watches should not all be hitting the etcd backend with this cache turned on.
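(Conceptually, the watch cache opens one watch against etcd per resource and fans the events out to every API watcher. A toy Go model of that fan-out, with hypothetical names, nothing like the real cacher internals:)

package main

import "fmt"

// broadcaster models the watch cache's fan-out: one upstream watch against
// etcd feeds any number of client watches, so identical watches no longer
// multiply load on the etcd backend.
type broadcaster struct {
	subs []chan string
}

func (b *broadcaster) subscribe() <-chan string {
	ch := make(chan string, 16)
	b.subs = append(b.subs, ch)
	return ch
}

func (b *broadcaster) publish(ev string) {
	for _, ch := range b.subs {
		ch <- ev // one etcd event, delivered to every subscriber
	}
}

func main() {
	b := &broadcaster{}
	a, c := b.subscribe(), b.subscribe()
	b.publish("MODIFIED pod/web-1") // single upstream event
	fmt.Println(<-a, "|", <-c)      // both watchers receive it
}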
Even more reason to enable it, and add support for the Origin resources too.
@abhgupta this one should be on your list.
Totally agree, just a question of risk. I'm 65% in favor of enabling what we have now (the k8s resource watch cache).
Let's enumerate the possible known risks?
I'm testing by explicitly setting it right now.
I've also asked @rflorenc to re-run his webUI tests with it.
You need this PR to make the setting effective.
Why wouldn't that be enabled? It's on the APIServer struct that we would pass down?
... we don't use the APIServer object we create anywhere later?
only to populate the genericapiserver config struct at https://github.com/openshift/origin/pull/8395/files#diff-05523003a782d7b3b61c2608a29dfb39R255
The main risk is that the watch cache behaves differently than a watch made directly against etcd. Looks like one of the test runs failed with this:
which means we got an Added event on a watch started from the resource version of a newly created object. That sounds like different behavior to me.
Just seeing this makes me say this is too risky for 1.2. We'll have to get...
Could we default off, and vet for openshift?
sure
What is weird to me is storage is configured per-resource, and each resource sets the boundary for its cache.
So in digging into the reason for the error, it looks like openshift is triggering on implied watch event semantics, where upstream does not. Upstream tests on the cacher clearly show the semantics. The code in https://github.com/openshift/origin/blob/master/pkg/cmd/cli/sa/newtoken.go#L219 could probably just read as:
@liggitt are there other failures besides this one, that I'm not seeing?
hadn't dug yet, and that failure blocked later tests from running
without the cache, watching from resourceVersion N, the first delivered event is the first resourceVersion past N.
with the cache, watching from resourceVersion N, the first delivered event is resourceVersion N.
that means the cache is not a transparent change... we'll need to fix that before we can enable it, and that behavior just shipped in 1.2 upstream :(
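(A toy model of the difference just described; this is hypothetical code, not the actual cacher. The unfixed cache effectively used an inclusive bound where a direct etcd watch uses an exclusive one.)

package main

import "fmt"

// event is a minimal stand-in for a watch event.
type event struct {
	typ string
	rv  int // resourceVersion
}

// watchFrom models correct etcd-direct semantics: deliver only events
// with resourceVersion strictly greater than rv.
func watchFrom(log []event, rv int) []event {
	var out []event
	for _, e := range log {
		if e.rv > rv {
			out = append(out, e)
		}
	}
	return out
}

// cacherWatchFrom models the unfixed 1.2 watch cache: it re-delivered the
// event at rv itself (>= instead of >), so a client that created an object
// at rv and then watched from rv saw its own creation again as ADDED.
func cacherWatchFrom(log []event, rv int) []event {
	var out []event
	for _, e := range log {
		if e.rv >= rv {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	log := []event{{"ADDED", 5}, {"MODIFIED", 6}}
	fmt.Println(watchFrom(log, 5))       // [{MODIFIED 6}]           — expected
	fmt.Println(cacherWatchFrom(log, 5)) // [{ADDED 5} {MODIFIED 6}] — surprise ADDED
}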
We definitely need to fix upstream - that's a breaking, non-backwards-compatible change, and it's horrifying no one noticed.
opened kubernetes/kubernetes#24004
updated with upstream watch cache fix, rerunning tests
@liggitt so ... with that should we re-test at this point?
won't really affect performance numbers, it'll just let our tests and controllers that really care about exact resourceVersion starting points work correctly
ah. ok. thank you.
[test]
Evaluated for origin test up to 3639830
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2850/) (Extended Tests: core)
clean origin test runs. 2 failures in extended tests:
Summarizing 2 Failures:
[Fail] deployments: parallel: test deployment test deployment [It] should ...
[Fail] Kubectl client Update Demo [It] should scale a replication ...
Those are known flakes
Are we in the clear then? We're fixing the dep issues on the e2es upstream, btw.
Approved
LGTM [merge]
[merge] now that Sam's fix is in the queue
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5566/) (Image: devenv-rhel7_3952)
Wonderful:
Spawned kubernetes/kubernetes#24125 [merge]
Not new to this PR
Evaluated for origin merge up to 3639830
related to #8392