Add instaslice custom metrics #353
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mamy-CS
@mamy-CS the API has changed. We should move to the new API.
I would recommend squashing the commits. Also, please report whether e2e passes in both modes.
PR is rebased and ready.
Not sure if OpenShift is in scope for this yet. https://rhobs-handbook.netlify.app/products/openshiftmonitoring/collecting_metrics.md/ There are some requirements for having this work with OpenShift monitoring. We also do not want to disable TLS as our default. As of right now this seems to work with a custom-deployed Prometheus, but I don't think it would work with OpenShift.
@kannon92 I think we should make this work on Kubernetes (maybe KinD) first and then modify it per the OpenShift platform requirements, if you agree.
Yes, this custom metrics work currently runs on KinD; we can modify it to meet the OpenShift requirements, as @asm582 mentioned.
a503cd4 to dd48022
update metrics
nit
update metrics
update
update deployed pod total and total processed slices metrics
updateMetricsAllSlotsFree
nits
update promethues
update deployed pod total metrics call
remove fake capacity file
update profile map extraction automation
update
Track total fit across all GPUs correctly
add unit tests
update metrics url
nit
nit
adjust unit tests
nit
update
update manifests
update test file
update compatible profiles
nit
address reviews
address comments
nit
Signed-off-by: MohammedAbdi <[email protected]>
@mamy-CS: all tests passed!
Addresses #53
Enable Kubernetes metrics from controllers and add custom metrics:
instaslice_gpu_slices_total: Total number of GPU slices allocated per node
instaslice_current_deployed_pod_total: Pods that are currently deployed on slices
instaslice_pending_gpu_slice_requests: Number of pending GPU slice requests
instaslice_current_gpu_compatible_profiles: Profiles compatible with the remaining GPU slices
instaslice_total_processed_gpu_slices: Total number of processed GPU slices
Unit tests added for the metrics reconcile functions and calls in instaslice_controller and prometheus_manager
Updated: successful run of make test, make lint, make test-e2e, and make test-e2e-kind-emulated