Allow Prometheus to get metrics from the router #19318
@smarterclayton FYI |
All routers should be scraped; the instance and namespace are enough to disambiguate. We should already have a role for an endpoint that can do SAR (SubjectAccessReview); it should be a system role from upstream. |
Since this worked before, I wonder what changed. It was working out of the box when I deployed before the rebase. @liggitt, did anything maybe get lost with roles? |
Don't think so… we have unit tests that make visible any changes in bootstrap permissions, and nothing related to this showed up |
Does it make sense to extend the Prometheus e2e tests to check that the router's metrics are collected, then?
If you're talking about multiple replicas of the default origin/examples/prometheus/prometheus.yaml Lines 717 to 725 in 5d07751
Since I'm not familiar with routers, it might be a non-concern.
Do you mean |
02328c9 to 5ad9ba5
In my tests, I'm using |
5ad9ba5 to 6a90871
@smarterclayton @liggitt I'd appreciate your guidance regarding this PR. Is it going in the right direction? Or did I miss something? |
Why the heck is cluster-reader being given to the router? (It's probably due to an old hack, @openshift/networking.)
We already have an e2e test that verifies that the metrics are produced, but not that they are scraped. I think yes, but it doesn't have to block this.
Yes.
Ignore for now. |
@smarterclayton I've updated the PR to bind the |
/retest |
/retest |
/retest |
61d68da to cf7804e
Instead of adding more rules to the system:router role, this change reuses the existing system:auth-delegator role.
This reverts commit 5d7f483.
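For readers following the commit message above, here is a minimal sketch of why binding `system:auth-delegator` is sufficient for token validation. It assumes, as upstream Kubernetes documents it, that this cluster role grants `create` on `tokenreviews` and `subjectaccessreviews`. The `PolicyRule` class and the `allows` and `can_validate_bearer_token` helpers are hypothetical names used only for illustration; this is a simplified model, not the real RBAC evaluator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical, simplified stand-in for an RBAC policy rule."""
    api_group: str
    resource: str
    verbs: frozenset


# Assumption: the rules carried by the upstream system:auth-delegator role.
AUTH_DELEGATOR_RULES = [
    PolicyRule("authentication.k8s.io", "tokenreviews", frozenset({"create"})),
    PolicyRule("authorization.k8s.io", "subjectaccessreviews", frozenset({"create"})),
]


def allows(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    return any(
        r.api_group == api_group and r.resource == resource and verb in r.verbs
        for r in rules
    )


def can_validate_bearer_token(rules):
    """Delegated auth needs both a TokenReview and a SubjectAccessReview."""
    return (
        allows(rules, "authentication.k8s.io", "tokenreviews", "create")
        and allows(rules, "authorization.k8s.io", "subjectaccessreviews", "create")
    )
```

With only one of the two rules, `can_validate_bearer_token` returns `False`, which matches the symptom described in this PR: the router cannot check the token that Prometheus presents.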
IIUC the router's service account is assigned the The Now when I assign the
The router's logs indicate that it is an authorization problem, but I fail to understand how switching from a single
|
/retest flake #19058 |
@smarterclayton Can you please take a look at this? For reference: #19318 (comment) |
/lgtm Sorry for the delay |
@smarterclayton anything else required to move the PR forward? |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: simonpasquier, smarterclayton |
/retest |
/retest flake #19058 |
/test gcp |
/retest |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest |
I think this is incomplete. On OpenShift Enterprise 3.10, Prometheus cannot reach the router metrics because the connection is blocked by firewalld. Routers are not opening port 1936 on their host systems. |
@Klaas- the firewall issue is tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1552235 |
@simonpasquier thanks, I'll follow that bz :) |
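To tell the firewall symptom reported above apart from an authorization failure, a quick TCP probe of the metrics port from the Prometheus side can help. This is only a sketch: `port_is_reachable` is a hypothetical helper, the default port 1936 is taken from the comment above, and a successful connection says nothing about whether the token is accepted.

```python
import socket


def port_is_reachable(host, port=1936, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    A False result covers refused connections, timeouts (for example
    packets dropped by a firewall) and name-resolution failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `port_is_reachable("<router-node-address>")` from the Prometheus pod's network should return `False` while the port is blocked and `True` once it is opened; the address is a placeholder to fill in for your cluster.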
Fix for #17685. Without this PR, the router can't validate the Prometheus token because it lacks the following permission:
It also removes the `prometheus.io/...` and `prometheus.openshift.io/...` annotations on the router's service since they are unused and create targets that Prometheus can't scrape.

I'm not clear on what happens when several routers are deployed. Most likely only the first one associated with the `router` service will be scraped but not the others.
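On the multiple-routers question, the thread's answer is that every router should be scraped and that the instance and namespace labels are enough to tell them apart. A rough sketch of that idea, assuming one scrape target per endpoint address rather than one per service: `targets_from_endpoints` and the addresses used with it are hypothetical, and the port 1936 default comes from the discussion.

```python
def targets_from_endpoints(namespace, addresses, port=1936):
    """Build one scrape target per router endpoint address.

    Each target carries the two labels that the thread says are enough
    to disambiguate routers: namespace and instance.
    """
    return [
        {"namespace": namespace, "instance": f"{address}:{port}"}
        for address in addresses
    ]


def distinct_series_keys(targets):
    """Return the set of (namespace, instance) pairs across all targets."""
    return {(t["namespace"], t["instance"]) for t in targets}
```

Two routers in the same namespace then yield two distinct `(namespace, instance)` keys, whereas scraping through the single service address would yield only one.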