-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean Prometheus example #17992
Clean Prometheus example #17992
Conversation
Tested this in my lab and I can see alerts being generated in alertmanager and pushed to both receivers. |
@zgalor @smarterclayton Can you please review this. Its just cosmetic name changes and adding a proxy for the alert manager so it could be used in "Enterprise" clients now as it is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/retest |
examples/prometheus/prometheus.yaml
Outdated
namespace: "${NAMESPACE}" | ||
data: | ||
alertmanager.yml: | | ||
global: | ||
|
||
#smtp mail configuration for mail alerts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work, so please don't put it in the example. The alert manager should work out of the box, even if it does nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw we need to set this up in api.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kargakis point me where, and I can do this.
examples/prometheus/prometheus.yaml
Outdated
receivers: | ||
- name: alert-buffer-wh | ||
webhook_configs: | ||
- url: http://localhost:9099/topics/alerts | ||
- name: mail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove, this won't work out of the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am. It did when I tested. I can retest and confirm. Might did it wrong somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ou. Got you, sr. Will remov
examples/prometheus/prometheus.yaml
Outdated
|
||
restartPolicy: Always | ||
volumes: | ||
#prometheus mounts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove these, use newlines to separate the sections if necessary.
examples/prometheus/prometheus.yaml
Outdated
secretName: alerts-proxy | ||
- name: alerts-tls | ||
secretName: prometheus-alertmanager-tls | ||
- name: prometheus-alertmanager-proxy-secret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the prometheus- prefix on these - prometheus-alertmanager-proxy
is redundant, alertmanager-proxy
is sufficient
You need to run hack/update-generated-bindata.sh |
796272a
to
83b0ea9
Compare
83b0ea9
to
c1563d1
Compare
Add proxy for alertmanager
c1563d1
to
45eb81a
Compare
@smarterclayton done.
Now only Prometheus has prefix Prometheus. Should we start looking into splitting this to use |
This looks pretty good, moderate concern about how this would break when someone tried to upgrade. Agree it's a cleaner setup |
@smarterclayton as much I know we don't have upgrade pattern yet? + as technology preview we are not deploying this actively to the customers so I dont think that were is a big need to seamless upgrade path (considering we still missing HA) I think next pending changes should be (in priority order):
So changes will need to be done either way, and I think faster we can do them, less impact we will have on future deployments. |
There is an HA design doc Simon is proposing, we are debating whether we actually want to split right now. I think I'm ok with updating this, it'll just be more work in the short term to fix the credentials in online. |
@openshift/sig-networking testing panic
|
@smarterclayton it looks like there were 0 schedulable nodes at that point, which also caused a bunch of other failures, but that particular test only checks for
and assumes that if there isn't 1 node, there must be more than 1. We can fix that, but anyway the real problem here is that there were no functioning nodes at that point for some reason. |
I don't think you have to check for that, that is indeed a totally broken cluster. Hrm. Bootstrapping could have regressed, or have a flake. |
@aweiteka can you look over it |
Anything I can help with this? Got in touch with Simon, so will start looking to his HA design and what we need to change. Whats the plan for Prometheus GA? 3.9? |
/retest |
/lgtm |
Thanks @smarterclayton . Will change ansible now too. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aweiteka, mjudeikis, smarterclayton The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue (batch tested with PRs 17992, 18091, 18118). |
@mjudeikis: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Main changes:
If/When this gets merged will update ansible deployment too to represent this change.
Did this for client deployment for their monitoring solution evaluation.