Machine ID: Add Prometheus metrics for loop tasks #52410
This adds a number of Prometheus metrics to help track success, failure, and timing for loop iterations. The loop helper is used across tbot services, so these metrics universally cover identity and output renewals, among other tasks. It also renames `service_heatbeat.go`, whose filename was misspelled.
One issue here is that all of our output services use `output-renewal` as their task name, so they'll be grouped together. Do we want to make that more specific? I'd suggest either appending something more specific to the name (e.g. `output-renewal/application`) or adding a subtype field + Prometheus label.
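To make the two options above concrete, here is a minimal sketch. The `taskID` type, its methods, and the label keys `task` and `subtype` are hypothetical names chosen for illustration; they are not taken from the tbot code.

```go
package main

import "fmt"

// taskID is a hypothetical identifier for a loop task. Name is the shared
// task name (e.g. "output-renewal"); Subtype distinguishes the services
// that currently share that name.
type taskID struct {
	Name    string
	Subtype string
}

// qualifiedName illustrates option 1: append the subtype to the task name.
func (t taskID) qualifiedName() string {
	if t.Subtype == "" {
		return t.Name
	}
	return t.Name + "/" + t.Subtype
}

// labels illustrates option 2: keep the name stable and expose the subtype
// as a separate label. The label keys are assumptions.
func (t taskID) labels() map[string]string {
	return map[string]string{"task": t.Name, "subtype": t.Subtype}
}

func main() {
	id := taskID{Name: "output-renewal", Subtype: "application"}
	fmt.Println(id.qualifiedName())
	fmt.Println(id.labels())
}
```

The second option keeps the task name stable, so existing queries grouped by task would keep working while still allowing a breakdown by subtype.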
This adds a number of Prometheus metrics to help track success, failure, and timing for loop iterations. The loop helper is used across tbot services, so these metrics universally cover identity and output renewals, among other tasks. It also includes the Teleport build collector.
New metrics include:

- `tbot_task_iteration_duration_seconds`: histogram of iteration time, including all retries
- `tbot_task_iterations_successful`: histogram of the number of attempts needed for a particular iteration to succeed
- `tbot_task_iterations_failed`: count of failures by task
- `tbot_task_iterations`: simple counter of iterations attempted per task, regardless of outcome

This additionally renames `service_heatbeat.go`, which was misspelled.

changelog: Machine ID: Added new Prometheus metrics to track success and failure of renewal loops