-
Notifications
You must be signed in to change notification settings - Fork 40.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix to avoid REST API calls at log level 2. #40933
Fix to avoid REST API calls at log level 2. #40933
Conversation
was this a change since 1.4? |
@smarterclayton looks like you originally picked v(2) way back in 4e56daf Who would need to approve moving to V(3) |
@lavalamp because we talked about this being an operational tool for normal function, and I suspect people rely on it in GKE and others at v2 today to get logging. Changing to v3 would break anyone who scrapes at v2. That said, I agree this is very verbose. |
on a cluster of any size doing work it seems to be swampingly verbose :) whereas the other v(2) stuff isn't so bad... |
My gut would be that it should be easy to opt out of this and that most people most of the time don't need this (which is what 3 was for). But because it has operational impact... |
cc @wojtek-t I'm leaning against this PR. Those are probably most useful logs in the whole system - I look at them pretty much every time I'm debugging anything. I agree that they are verbose, but they're not for direct human consumption, not enhanced by the power for grep. @aveshagarwal - why do you want to make this change? Is the size of logs problem? |
@gmarek I agree these are useful logs, but the question is do we want these logs at --v=2? |
I think that we do. Logs are there to help figure out what's wrong, and those logs are extremely important for that. If someone want no-logging (aka 'I don't care') there's always V=0. |
v(2) is defined as "every user wants them". If we can make a case that
every kube admin wants these logs by default, v=2 is appropriate.
…On Mon, Feb 6, 2017 at 10:51 AM, Marek Grabowski ***@***.***> wrote:
I think that we do. Logs are there to help figure out what's wrong, and
those logs are extremely important for that. If someone want no-logging
(aka 'I don't care') there's always V=0.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#40933 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p0q3YhuPvpKn9Q-tn0NYzehitjMjks5rZ0FugaJpZM4L2nwV>
.
|
I do think that 'every (sane) user wants them', but that's not really a formal definition:) |
In other words - I think we should close this PR. |
Level 2 usually provides enough info for devs to start debug process. Having that level sane really helps when difficult to repro problems are encountered. |
We've caught a fair number of panics, crashes, bad behaviors (eaten
errors), and otherwise in test suites by simple observation of the log.
Agree it's not quite the level of volume I would expect at V(2), but it's
still possible to tune it down with --vmodule.
…On Wed, Feb 8, 2017 at 1:37 PM, Mike Fiedler ***@***.***> wrote:
Level 2 usually provides enough info for devs to start debug process.
Having that level sane really helps when difficult to repro problems are
encountered.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#40933 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_pyfbbzR2zyaCaRc1OrXriOgOBgdtks5ragtigaJpZM4L2nwV>
.
|
I'm closing this PR. Ping me if you disagree. |
@gmarek we have been discussing this again today. I agree with you that these logs are useful when debugging issues. However, they have a negative performance impact when running large clusters. Would you think it's reasonable to proceed with this, with the expectation that for production clusters, admins would increase the loglevel from 2 to 3 or 4 if they need to debug issues? Otherwise, 2 would be an expected default for informational log messages but not spammy ones. Also, for PR jobs, we should be running at 3 or 4 if we're not already. WDYT? |
No, level 3 is already too verbose. I understand the performance implications, but maybe we could get past them by dealing with Also, this is just my (admittedly very strong) opinion, so if enough important people agree that we should do that, then I won't be able to block this (though honestly - I'm pretty convinced that it'll cost you more debugging time, that you'll save on CPU). Can't you run cluster with |
@gmarek if we run the cluster at My suggestion is that we move this to 3, run clusters at 2, and if there is an issue that needs debugging, you restart the components at 3 and try to reproduce the issue, if that's acceptable. |
The thing is that you never know when issue will surface, and when to do this. |
For the most part I feel like v(2) has informational changes about a cluster that an admin cares about. v(3) has engineering debug stuff. This is CLEARLY engineering debug stuff. Practically no admin wants to be spammed with these. Real life example on a small cluster (7 nodes with 190 pods) we see over 82,000 messages per hour from this. And less than 4000 messages per hour for not this. This one spam accounts for x20 more messages than the rest of V(2) put together. |
@gmarek it's a balance, and it's one that's hard to win :-) You either spam the cluster with logs that you don't need 99% of the time, or you don't and it's harder to debug the past. |
This appears to be somewhat active. Removing the lifecycle/frozen state. /remove-lifecycle frozen |
audit lets you select some resources and not others, and allows for async/batch processing to avoid bottlenecks I'm in favor of this change |
/refresh |
/test all |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aveshagarwal, brendandburns, sjenning The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Review the full test history for this PR. Silence the bot with an |
1 similar comment
/retest Review the full test history for this PR. Silence the bot with an |
/test all [submit-queue is verifying that this PR is safe to merge] |
@aveshagarwal: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Automatic merge from submit-queue (batch tested with PRs 40933, 62188). If you want to cherry-pick this change to another branch, please follow the instructions here. |
....I was writing a comment, but the bot merged this, apparently on a phantom approval. I think changing this deserved publication on sigs for people who run clusters. [edit: assuming this didn't happen already] |
@shyamjvs @kubernetes/sig-scalability-bugs I don't think we should have merged that - this will significantly affect our debugging ability (at least in the context of performance tests, where we can't really run tests on higher verbosity, since the amount of logs there is overloading apiserver). So I think we may consider reverting it, and making verbosity for those logs a flag (I'm fine with having a default still at 3, but I would like to be able to modify that for the purpose of scalability tests). |
I've opened #62468 to not forget about it. |
Sorry, I missed this earlier. Definitely agree with @wojtek-t on how this can greatly hurt debuggability, specially wrt performance/scalability issues. |
I think Andy is right and this PR is right, -vmodule can make these logs available but cannot be used to hide them. vmodule only goes in 1 direction. |
I've sent a PR for enabling the logs through -vmodule for tests. |
By default, logging REST API calls at log level 2 is too verbose (see below), if log level 2 happens to be default. So increasing its default to 3.
xref: https://bugzilla.redhat.com/show_bug.cgi?id=1414813
@kubernetes/rh-cluster-infra
Fix #47916