-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sdn: make pod operation metrics more useful and collectable #17250
sdn: make pod operation metrics more useful and collectable #17250
Conversation
LGTM, but while double-checking the
should be |
LGTM |
e68908c
to
cca6de3
Compare
@DirectXMan12 might be interesting to you too; the problem we had with NaN was the breakdown of metrics by labels; see the first comment for more info. |
@danwinship fixed in a separate commit here |
LGTM... holding for 3.9 |
7fafabf
to
9f031d3
Compare
/test cmd |
/test cmd issue #16317 |
The pod operation error metrics were in the wrong place to capture the overall pod setup/teardown operation. Move them to capture everything. Next, the labels of the Latency metric meant that every observation was a unique metric and no statistics could be determined from them in aggregate. Change that (and pod errors) to follow the Kubelet dockershim DockerOperations[Latency|Errors] metric pattern with a label for the operation instead of the sandbox.
9f031d3
to
caf5f73
Compare
/test unit |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dcbw, knobunc The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue. |
@smarterclayton It is supposed to be fixed here, can you remove the drop for these metrics in https://github.com/openshift/origin/blob/master/examples/prometheus/prometheus.yaml#L556 |
We don’t use that anymore for real installs. Changes need to be made to
cluster monitoring operator.
On Aug 28, 2018, at 4:03 AM, Akram Ben Aissi <[email protected]> wrote:
@smarterclayton <https://github.com/smarterclayton> It is supposed to be
fixed here, can you remove the drop for these metrics in
https://github.com/openshift/origin/blob/master/examples/prometheus/prometheus.yaml#L556
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#17250 (comment)>,
or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p6LV84H7cLRrIUbxngMIJuc6Yppzks5uVPlggaJpZM4QYhgk>
.
|
The pod operation error metrics were in the wrong place to capture the
overall pod setup/teardown operation. Move them to capture everything.
Next, the labels of the Latency metric meant that every observation was
a unique metric and no statistics could be determined from them in
aggregate. Change that (and pod errors) to follow the Kubelet dockershim
DockerOperations[Latency|Errors] metric pattern with a label for the
operation instead of the sandbox.
Fixes: #17494
@danwinship @eparis @openshift/networking @knobunc