Running argocd app sync --dry-run commands affects metrics #21899

Open
3 tasks done
jsolana opened this issue Feb 18, 2025 · 0 comments
Labels
bug Something isn't working

jsolana (Contributor) commented Feb 18, 2025

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Related to #21661

Currently, running argocd app sync --dry-run affects both the application's state and the internal metrics exposed by ArgoCD (e.g., argocd_app_info).

This is a problem when alerts are built on these metrics. If the dry run detects an error (for example, a change that violates a Kyverno policy or an invalid CRD schema), the application state changes to SyncErr. The metrics are updated accordingly, which can trigger alerts based on them.

For example:

```yaml
# Prometheus alerting rule (notifications routed through Alertmanager)
- alert: ArgoCDApplicationUnknown
  expr: sum by (cluster) (argocd_app_info{sync_status="Unknown"}) > 0
  for: 15m
...
```
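To illustrate why a single dry run can trip this rule, here is a rough Python model of how its expression evaluates. It is a sketch under stated assumptions, not Prometheus itself: samples are hypothetical (cluster, app, sync_status) tuples, each argocd_app_info series has the value 1, the cluster label is taken as written in the rule, and we assume for illustration that a failed dry run leaves the application reported with sync_status="Unknown".

```python
from collections import defaultdict


def unknown_by_cluster(samples):
    """Mimic sum by (cluster) (argocd_app_info{sync_status="Unknown"}).

    Each sample is a hypothetical (cluster, app, sync_status) tuple; info
    metrics have the value 1, so the sum counts matching applications.
    Clusters with no matching series are absent, as in PromQL.
    """
    sums = defaultdict(float)
    for cluster, _app, sync_status in samples:
        if sync_status == "Unknown":
            sums[cluster] += 1.0
    return dict(sums)


def firing_clusters(samples):
    """Return the clusters for which the `> 0` comparison holds.

    The `for: 15m` clause is not modeled: Prometheus would also require
    the condition to stay true for 15 minutes before the alert fires.
    """
    return sorted(c for c, total in unknown_by_cluster(samples).items() if total > 0)


# Hypothetical data: both apps reported as Synced before the dry run.
before = [("prod", "payments", "Synced"), ("prod", "checkout", "Synced")]
# Assumed effect of a failed dry run on "payments" (illustration only).
after = [("prod", "payments", "Unknown"), ("prod", "checkout", "Synced")]

print(firing_clusters(before))  # []
print(firing_clusters(after))   # ['prod']
```

One application flipping state is enough for the whole cluster to match, so if a dry run alone causes the flip and the state persists, the alert fires even though nothing was actually deployed.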

Since a dry run is, by definition, meant to execute requests without persisting anything, I wonder whether it makes sense to handle it in a way that guarantees neither the application state nor the metrics are changed.

Alternatively, a dry_run label could be added to differentiate real operations from dry-run requests (a similar change has been proposed for Kyverno: link).

To Reproduce

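Run argocd app sync --dry-run against an application whose pending changes fail the dry run (for example, because of a Kyverno policy violation), then check the application status and the argocd_app_info metric.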

Expected behavior

There are two possible approaches:

  1. Add a dry_run label to distinguish dry-run activity from real syncs.
  2. Ignore dry-run executions, so they do not affect metrics.

Initially I favor option 1, because dry runs have a cost in terms of resource consumption and performance. Ignoring dry-run activity entirely would make it extremely hard to identify the cause of performance issues.
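The two options can be contrasted with a small Python sketch. Everything in it is hypothetical: SyncMetrics, record_sync, the mode values and the phase strings are illustrative names, not Argo CD code, and dry_run is the proposed label rather than an existing one.

```python
from collections import Counter


class SyncMetrics:
    """Hypothetical sync-metrics recorder; not Argo CD's implementation.

    mode="label":  option 1, record dry runs under a separate dry_run label.
    mode="ignore": option 2, skip dry runs entirely.
    Series are keyed by a tuple of label values.
    """

    def __init__(self, mode):
        if mode not in ("label", "ignore"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.series = Counter()

    def record_sync(self, app, phase, dry_run):
        if self.mode == "ignore":
            if dry_run:
                return  # option 2: dry runs never touch the metrics
            self.series[(app, phase)] += 1
        else:
            # option 1: dry runs are still counted, but on their own series
            dry_run_label = "true" if dry_run else "false"
            self.series[(app, phase, dry_run_label)] += 1


labeled = SyncMetrics("label")
ignored = SyncMetrics("ignore")
for metrics in (labeled, ignored):
    metrics.record_sync("payments", "Failed", dry_run=True)      # a failed dry run
    metrics.record_sync("payments", "Succeeded", dry_run=False)  # a real sync
    print(metrics.mode, dict(metrics.series))
```

With option 1, an alert expression can match only dry_run="false" series while dry-run load stays visible for troubleshooting; option 2 keeps alerts clean but discards that signal.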

Version

This affects all versions, because dry-run executions are currently not distinguished from real ones.

jsolana added the bug label on Feb 18, 2025