Streamline cluster up output #13636
Conversation
Output when image present:
Output when having to pull image:
Old output is still available with |
@bparees @smarterclayton ptal
lgtm but better let @smarterclayton have the final say since he instigated these changes.
@csrwng I see a difference between a cluster up with and without pulling images, but I don't see a difference from cluster start if you keep the config for a second boot. In that case, showing information on users is superfluous and might not be correct, as the user is not really logged in. I would not show this info in that case. Also, I would move the layers percentage in the pull to loglevel=1.
@jorgemoralespou the reason the layers percentage is displayed is that if you have a particularly slow connection, you wouldn't see anything happening for a good while and you'd think that the command is just stuck. Would this be ok for when you're reusing existing config/data?
@csrwng yes, it would be great. There was also some input from the CDK team, so adding them here.
cc/ @hferentschik @LalatenduMohanty @praveenkumar I can't look up the issue, but you were also interested in this. Please comment.
Regarding the log for layers: is there a chance that it could be on a one-liner? If not, it's fine. Not really a big issue.
Force-pushed d1ee956 to 0ebd8b7
@jorgemoralespou updated the display for when you're reusing config/data.
@csrwng sounds good.
I'd prefer not to get clever on the progress writer. If you have to wait, as long as we don't go past 5-6 lines it's not really a problem. Clever is bad.
Network check tends to be very slow for me - that's another spot where some output is useful.
Nm, saw your comment
@smarterclayton does it mean you approve? You're the only one that matters.
How often does the container network test fail? Can we only run it if something else fails first?
@smarterclayton so it likely fails when you first run 'cluster up' on a machine that doesn't have the right firewall rules set. Unfortunately, it's not something that you notice in the initial setup of things. Everything will succeed, but then you either won't be able to push to the registry or your DNS lookups will fail. So the issue is that you pay this cost every time you start cluster up, even though after the first successful run you likely won't need the check any more.
Something that would be nice would be to start the test asynchronously and then notify you that things are not right as you try to use openshift. But there's not a single interaction entry point, so that's hard.
Is there a reason the test takes so long? Putting the test at loglevel 1 means no one ever runs it. I'm just looking for a way to have the value of the test without the large cost.
I would consider these options:
- have a flag to execute the tests; in case of failure, print a message to run with the flag.
- have a command "oc cluster diagnose" (or test/validate) that could run these tests and tell you if the host is ready for running oc cluster. I would move here the checks for the insecure registry and the network (and anything else we test).
- always run the test on new clusters, but avoid the tests if you have kept the config.
I'd like to just make the test faster. A flag or another command won't help new users. New clusters are annoyingly slow with it.
If the tests can be made faster I'm up for it. But if improvements can't be made, I wouldn't trade off the speed of creating a cluster for the annoyance that introducing a new flag or command would have, as users will eventually learn. And in any case, are tests really required on a node a second time? On my laptop, once I've set up the insecure registry and firewall, the chances I might hit that problem again are very low. The same goes for a non-local environment.
I'm investigating what's making the test slow. In theory, it should not take that long. I am hitting the master API endpoint from a container after the healthz endpoint is returning ok. The DNS server would maybe take a little longer to come up, but I wouldn't expect it to take as long as the 20 seconds I've seen sometimes. If for whatever reason that test can't be made faster, an alternate test could be done with a pair of containers, one using the pod network and the other using the host network.
Force-pushed 0ebd8b7 to 0d63ba6
So the container networking test is much faster now that I've fixed a very embarrassing bug (the first part of the test was not working at all and only failing after 40 tries). So now it will run every time no matter what. If the firewall is set up correctly, it doesn't add much time to startup.
[test]
Could this embarrassing fix be backported to the 1.5 branch?
@jorgemoralespou I'll submit a fix for that branch
Force-pushed 0d63ba6 to 40fc3ed
@jorgemoralespou actually in 1.5 it's not broken in the same way
Changes look good to me beyond that - thanks for digging in on the network bug.
Force-pushed 40fc3ed to ae9c5c4
integration test seems to have gotten stuck... retesting
#12007
Outputs previous messages when either an error occurs during startup or loglevel > 0. Now only executes container network test if loglevel > 0 to speed up startup time.
Force-pushed ae9c5c4 to 55dc4ef
Evaluated for origin test up to 55dc4ef
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/646/) (Base Commit: 44d4f23)
[merge]
Evaluated for origin merge up to 55dc4ef
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/282/) (Base Commit: 1ea122b) (Image: devenv-rhel7_6125)
@csrwng sorry, late to the game. +1 for improving on the output. What is the best approach to get an oc version containing this change? Any chance you are building and hosting binaries as part of a pull request build? Or do I need to rebuild latest origin master myself?
@hferentschik this will be included in the next release for origin. In the meantime, you can build master locally. If you have a working openshift environment, this is easy to do with a template:
Outputs previous messages when either an error occurs during startup or loglevel > 0.
Now only executes container network test if loglevel > 0 to speed up startup time.
Fixes #12715
#12531