Sequential new-app commands cause possibly exponential and latent CPU consumption #5737
Sorry, I don't have a better place to put this right now, but I use this script to start profiling: https://drive.google.com/a/redhat.com/file/d/0BxiBh6KZZAOWQ0NMYXNuRnZjbWc/view?usp=sharing I then use the script in the issue description to create 300 projects. The profiler script captures a new sample every 30 seconds.
It was going too slow by the time it got to project 260, so I stopped it, and the last several samples have nothing talking to the master. I'm actually still collecting, and the CPU is still very high but very slowly scaling down. I'll continue to collect for a while.
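The linked profiling script isn't reproduced here; purely as an illustrative sketch (the master URL, token handling, and output directory are all assumptions about the local setup), a collection loop along those lines might be:

```bash
#!/bin/bash
# Collect CPU and heap profiles from the master's pprof endpoints in a loop.
# MASTER_URL, the token handling, and OUT_DIR are illustrative; adjust for
# the local setup, and the pprof endpoints must be reachable with this token.
MASTER_URL="https://localhost:8443"
TOKEN="$(oc whoami -t)"
OUT_DIR="./profiles"
mkdir -p "$OUT_DIR"

while true; do
  ts="$(date +%s)"
  # A 30-second CPU profile; the request blocks until sampling completes,
  # so the loop naturally produces one sample roughly every 30 seconds.
  curl -sk -H "Authorization: Bearer $TOKEN" \
    "$MASTER_URL/debug/pprof/profile?seconds=30" \
    -o "$OUT_DIR/cpu-$ts.pb.gz"
  # Heap snapshot taken right after each CPU sample.
  curl -sk -H "Authorization: Bearer $TOKEN" \
    "$MASTER_URL/debug/pprof/heap" \
    -o "$OUT_DIR/heap-$ts.pb.gz"
done
```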
Detailed analysis, including tarballs containing tons of useful stats: https://docs.google.com/a/redhat.com/spreadsheets/d/1pjTbc6HQaRdybzScXy2NGwg3vIO2qHR7VOChl6-vuYY/edit?usp=sharing
A few things we discovered:
The oc discovery combined with #5760 is enough to satisfy my main concerns for this issue, and I'll open follow-ups once I've assessed the impact of item 3.
cc @smarterclayton @liggitt, let's call #5760 a fix for this.
@ironcladlou I liked the chart of requests. Any chance you ended up getting that fully automated?
The slowdown is unexpectedly large at the scales here: #5737 (comment). It's a little better for me on my machine, but I was able to get over 10 seconds at about 300-400 projects. We should try to pin this down. It might just be a matter of generating conversions and codecs.
I'm seeing a ton of garbage created - will look at it with your changes.
I want to leave this open even after we merge #5760, but will reduce the priority.
For 1.1.1 I'd like to verify we don't have O(N^2) behavior on new project creation above 1k projects.
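A cheap way to sanity-check that, once per-project creation times are being logged, is to compare the early and late parts of a run; a sketch assuming a times.csv log of "index,seconds" lines (the file name and format are assumptions, not something from this issue):

```bash
#!/bin/bash
# Compare mean creation time for the first and last 10% of projects in a
# times.csv log of "index,seconds" lines. A ratio that keeps growing as the
# total project count grows points at super-linear (e.g. N^2) behavior.
awk -F, '
  { t[NR] = $2 }
  END {
    n = NR; k = int(n / 10); if (k < 1) k = 1
    for (i = 1; i <= k; i++)         head += t[i]
    for (i = n - k + 1; i <= n; i++) tail += t[i]
    printf "first 10%%: %.2fs avg  last 10%%: %.2fs avg  ratio: %.1fx\n",
           head / k, tail / k, (tail / k) / (head / k)
  }
' times.csv
```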
[pprof attachments: alloc_space after creating 200 projects; alloc_objects]
CPU is saturated with GC.
[pprof attachments: alloc_objects at 500 projects; alloc_objects at 700 projects; tree FieldByNameFunc; tree meta.Accessor]
I'm going to say this is not fixed (better, but not fixed) due to significant GC issues. We'll deal with it in 1.1.1 with algorithmic and other memory fixes.
Regarding the profiles above: I'm pretty sure I can tweak the authorizer to avoid calling into that path.
That seems like a likely optimization target to me as well. Were you thinking of a check to see if expansion was even necessary, or a single expansion when populating the authorizer cache (or both)?
Whether it's necessary. That's easy to do without significant refactoring. The list of permissions is also relatively small, so it may be worthwhile to trade O(1) for O(n) when n < 50 compared to the GC cost.
What do you know, already a set. I'll put it together this morning.
FWIW, despite the fix @deads2k made for the request count inflation on the auth path, we still have no explanation yet (that I've seen) for the massively disproportionate amount of request handling/auth handling in both idle states and after project creation.
Take a look at the cachegrind data during the creation of just 100 projects (this includes #5760 and uses curl for the project creation): https://drive.google.com/a/redhat.com/file/d/0BxiBh6KZZAOWN212UFRLWmlYN28/view
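For reference, a hedged sketch of what curl-based project creation looks like; the oapi/v1 projectrequests path and payload shape are my recollection of the Origin v1.x API (and the host, port, and project name are illustrative), so verify against the running master before relying on it:

```bash
# Create a project straight against the API rather than through oc.
# Endpoint path and payload reflect my understanding of Origin v1.x (oapi/v1);
# host, port, token handling, and the project name are illustrative.
TOKEN="$(oc whoami -t)"
curl -sk -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"kind":"ProjectRequest","apiVersion":"v1","metadata":{"name":"perf-test-1"}}' \
  https://localhost:8443/oapi/v1/projectrequests
```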
Can you add the pprof PNGs to your script?
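Folding that in is mostly a matter of running go tool pprof over each saved profile; a sketch, assuming the profiles were saved as in the earlier collection sketch and that graphviz is installed for PNG rendering (both assumptions):

```bash
#!/bin/bash
# Render call-graph PNGs for every captured profile with go tool pprof.
# Assumes profiles were saved by the collection sketch above and that the
# openshift binary the master runs is on the PATH.
OUT_DIR="./profiles"
BINARY="$(command -v openshift)"

for prof in "$OUT_DIR"/cpu-*.pb.gz "$OUT_DIR"/heap-*.pb.gz; do
  [ -e "$prof" ] || continue
  go tool pprof -png "$BINARY" "$prof" > "${prof%.pb.gz}.png"
done
```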
I think this was resolved due to multiple changes (build controller O(N^2), authorization cache flush). Closing due to age. |
I started an etcd instance, openshift master, and openshift node on Fedora 21. I have a bash script which makes 500 new projects:
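The script itself isn't reproduced in this copy of the issue; as an illustrative sketch only (the count, project naming scheme, and timing log are assumptions, and oc new-project is the same operation as openshift cli new-project), it could look something like this:

```bash
#!/bin/bash
# Create COUNT projects one after another and log how long each creation
# takes, so the per-project slowdown can be graphed afterwards.
# COUNT, the project names, and the log file name are illustrative.
COUNT=500
LOG="times.csv"
: > "$LOG"

for i in $(seq 1 "$COUNT"); do
  start="$(date +%s.%N)"
  oc new-project "perf-test-$i" > /dev/null
  end="$(date +%s.%N)"
  echo "$i,$(echo "$end - $start" | bc)" >> "$LOG"
done
```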
The result is that CPU usage climbs and the master starts grinding down. Even long after I cut the script off at ~160 projects, master CPU consumption is still at like 60% but seems to be falling very, very gradually. I haven't yet graphed CPU consumption.
I'll attach the openshift cli new-project times and some pprof traces captured during the test and afterward, during the weird "idle" period.