bump(*): update etcd to 3.2.16 and grpc to 1.7.5 #18731
Conversation
Upstream kubernetes/kubernetes#60299
Force-pushed from f350412 to 736404c
@deads2k: @juanvallejo this is highlighting shortcomings in the dep set detection. bolt really shouldn't be considered "ours". We'll never pin it. We have to think of a different rule. @sttts thanks for the list
still timing out:
Which Go version do we use? 1.9? I saw a panic in Go 1.10 deep inside etcd that caused the test to time out.
I also dropped @mfojtik's wip timeout commit, so that might have an influence here, independently of my commit.
yes, go 1.9
sure, though before this PR that package took ~9 seconds, and with this PR it takes ~450s (timing out at 120s by default)
Ouch, has anybody looked into the reason? Now I understand @mfojtik's 600 in that commit.
got as far as isolating which tests jumped in time, hadn't dug beyond that - #18660 (comment)
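For context, a minimal sketch of what a per-package budget along the lines of that 600 could look like; the package name, the placement in a `TestMain`, and the exact 600s value are assumptions here, not the contents of the dropped wip commit:

```go
package integration

import (
	"os"
	"testing"
	"time"
)

// TestMain is a hypothetical guard: if the etcd-backed tests in this package
// exceed a 600s budget, panic with a clear message instead of silently
// hanging until the harness timeout kills the run.
func TestMain(m *testing.M) {
	timer := time.AfterFunc(600*time.Second, func() {
		panic("integration tests exceeded the 600s budget")
	})
	code := m.Run()
	timer.Stop()
	os.Exit(code)
}
```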
I cannot reproduce the long runtimes locally. What I found in …
/retest
Running a test locally in an infinite loop. Maybe this early termination is just a flake.
does not help.
Force-pushed from bb11a99 to 420502c
@sttts I swear I saw a comment in kubernetes claiming that all these etcd messages are "log spam".
Latest finding: the termination of the cluster is the normal … Continuing digging.
This blocks during termination:
/retest
@sttts the …
@mfojtik something is definitely blocked on shutdown, leading to those messages. But the messages themselves are not the issue.
update: I added …
That means the cause is not Terminate, and this needs more debugging :-(
@sttts +1 the messages seem to be just log spam, but what is interesting is that I would expect these messages to show the status of the connection as "shutdown" and not "connecting"... IOW, when we terminate the etcd server, we also terminate all connections, and the etcd server should terminate all connections to grpc... In code that means the connection status is updated to "shutdown", and we should not see those log messages... The fact that we see them might indicate that something is not right in the termination code. However, even if we make that termination non-blocking in the defer, the time improves by barely ~10s, which also indicates that the problem might be somewhere else.
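As a rough illustration of that expectation (not code from this PR), grpc-go exposes the client connection state, so a teardown helper could assert that it reaches Shutdown after Close; the helper name and its place in the test teardown are assumptions:

```go
package integration

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// closeAndCheck is a hypothetical teardown helper: close the client
// connection to the test etcd server and report if the connection does not
// reach Shutdown, which would match the suspicious "connecting" log spam
// described above.
func closeAndCheck(conn *grpc.ClientConn) {
	if err := conn.Close(); err != nil {
		log.Printf("closing etcd client connection: %v", err)
	}
	if st := conn.GetState(); st != connectivity.Shutdown {
		log.Printf("unexpected connection state after Close: %v", st)
	}
}
```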
It appears to be the …
Try pulling in kubernetes/kubernetes#60430
@deads2k tested, verified this makes the test run in 87s vs. 167s before.
/approve no-issue
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mfojtik, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing …
Force-pushed from dc67aaa to 373bdee
all tests are green. /lgtm
@smarterclayton FYI
/test all [submit-queue is verifying that this PR is safe to merge]
/retest
/retest
that which is not closed can eternal flake
/retest
@sttts: The following tests failed, say …
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Replaces #18660 by fixing an upstream bug with the testing etcd.
Copied from #18660:
Update of the etcd level from 3.2.8 to 3.2.16 and of gRPC to 1.7.5 (matching the etcd version).
Fixes: #18496
etcd:
grpc:
List of interesting changes or changes related to gRPC:
3.2.10: etcd-io/etcd@6d40628: update grpc, grpc-gateway (1.4.2 -> 1.7.3)
3.2.10: etcd-io/etcd@a8c84ff: clientv3: fix client balancer with gRPC v1.7
3.2.10: etcd-io/etcd@939337f: add max requests bytes, keepalive to server, blackhole methods to integration
3.2.10: etcd-io/etcd@8de0c04: Switch from boltdb v1.3.0 to coreos/bbolt v1.3.1-coreos.3 (<- concerning?)
3.2.11: etcd-io/etcd@5921b2c: log grpc stream send/recv errors in server-side
3.2.11: etcd-io/etcd@ff1f08c: upgrade grpc/grpc-go to v1.7.4
3.2.12: etcd-io/etcd@e82f055: clientv3: configure gRPC message limits in Config
3.2.12: etcd-io/etcd@c67e6d5: clientv3: call KV/Txn APIs with default gRPC call options
3.2.12: etcd-io/etcd@348b25f: clientv3: call other APIs with default gRPC call options
3.2.13: etcd-io/etcd@288ef7d: embed: fix gRPC server panic on GracefulStop
3.2.16: etcd-io/etcd@e08abbe: mvcc: restore unsynced watchers
@smarterclayton @deads2k @liggitt maybe too late in the 3.9 cycle, but I don't see any huge-risk change, and this is a minor version bump that contains plenty of bug fixes.
This was a clean bump, no build errors or panics during server start. There were 0 picks/carries on grpc or etcd.
@deads2k I wonder if we need to add grpc to our glide.yaml... If I don't, and just bump etcd, there are no changes in grpc, just etcd. I was worried that whatever higher-level client we have that uses grpc would use a different version than etcd?
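A minimal sketch of what pinning both could look like in glide.yaml, assuming the standard upstream import paths; the surrounding entries of the real file are omitted:

```yaml
# Hypothetical glide.yaml excerpt: pin gRPC alongside etcd so any higher-level
# client that uses gRPC resolves the same version etcd v3.2.16 was built against.
import:
- package: github.com/coreos/etcd
  version: v3.2.16
- package: google.golang.org/grpc
  version: v1.7.5
```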