Performance issue with k8s proxy #52360

Open
creack opened this issue Feb 20, 2025 · 0 comments

creack commented Feb 20, 2025

Using the k8s proxy incurs a significant performance penalty.

Here are some benchmarks:

Context: a simple role granting access to everything.

Getting resources from a brand new empty kind cluster:

  • no proxy:
    kubectl get all -A 0.06s user 0.02s system 119% cpu 0.066 total
  • direct proxy:
    kubectl get all -A 0.11s user 0.06s system 7% cpu 2.466 total
  • joined proxy:
    kubectl get all -A 0.12s user 0.06s system 1% cpu 18.366 total

Creating 6 namespaces:

  • no proxy:
    kubectl apply -f foo2.yaml 0.17s user 0.04s system 35% cpu 0.582 total
  • direct proxy:
    kubectl apply -f ns.yaml 0.27s user 0.08s system 11% cpu 2.926 total
  • joined proxy:
    kubectl apply -f foo2.yaml 0.25s user 0.09s system 1% cpu 25.428 total

Creating 6 pods:

  • no proxy:
    kubectl apply -f foo3.yaml 0.18s user 0.03s system 31% cpu 0.667 total
  • direct proxy:
    kubectl apply -f foo3.yaml 0.21s user 0.11s system 8% cpu 3.873 total
  • joined proxy:
    kubectl apply -f foo3.yaml 0.26s user 0.08s system 0% cpu 37.165 total

Getting resources from a populated cluster (6 namespaces, 200 pods per namespace):

  • no proxy:
    kubectl get all -A 0.16s user 0.05s system 108% cpu 0.189 total
  • direct proxy:
    kubectl get all -A 0.29s user 0.10s system 12% cpu 3.168 total
  • joined proxy:
    kubectl get all -A 0.29s user 0.13s system 1% cpu 22.754 total

A more complex role with pattern matching yields similar results, so the complexity of the role doesn't seem to be the cause.
The number of resources slows down the lookup slightly, but not significantly.

After digging further, I found that the slowdown increases with the number of resources.

I suspect the cause is one or more of the following:

  • we force an upgrade to HTTP/2 pretty much everywhere
  • we don't reuse sockets (see the transport sketch after this list)
  • missing cache around permission checks
  • double work checking permissions when hopping from agent to agent
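
To illustrate the socket-reuse point, here is a minimal Go sketch using only the standard library (this is not the proxy's actual code; it just shows the mechanism): a fresh http.Transport per proxied request gets a fresh connection pool, so every request dials a new socket, while a single shared transport reuses keep-alive connections.

    package main

    import (
    	"fmt"
    	"io"
    	"net"
    	"net/http"
    	"net/http/httptest"
    	"sync/atomic"
    	"time"
    )

    func main() {
    	// Backend that counts how many TCP connections are opened to it.
    	var newConns int64
    	backend := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		fmt.Fprint(w, "ok")
    	}))
    	backend.Config.ConnState = func(c net.Conn, s http.ConnState) {
    		if s == http.StateNew {
    			atomic.AddInt64(&newConns, 1)
    		}
    	}
    	backend.Start()
    	defer backend.Close()

    	get := func(client *http.Client) {
    		resp, err := client.Get(backend.URL)
    		if err != nil {
    			panic(err)
    		}
    		// Draining and closing the body is required for the connection
    		// to go back into the pool and be reused.
    		io.Copy(io.Discard, resp.Body)
    		resp.Body.Close()
    	}

    	// Anti-pattern: a fresh Transport per request means a fresh connection
    	// pool per request, so every request dials a new socket.
    	for i := 0; i < 10; i++ {
    		get(&http.Client{Transport: &http.Transport{}})
    	}
    	fmt.Println("per-request transports, sockets opened:", atomic.LoadInt64(&newConns))

    	// Shared transport: keep-alive connections are pooled and reused.
    	atomic.StoreInt64(&newConns, 0)
    	shared := &http.Client{Transport: &http.Transport{
    		MaxIdleConnsPerHost: 10,
    		IdleConnTimeout:     30 * time.Second,
    		// Over TLS, HTTP/2 would additionally multiplex many requests on one socket.
    		ForceAttemptHTTP2: true,
    	}}
    	for i := 0; i < 10; i++ {
    		get(shared)
    	}
    	fmt.Println("shared transport, sockets opened:", atomic.LoadInt64(&newConns))
    }

Run with go run: the first loop should report roughly 10 new sockets, the shared transport only 1. If each hop in the proxied path behaves like the first loop, the per-request dial (and any TLS handshake) cost would be paid again on every hop, which would line up with the joined-proxy numbers above.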