Log troubleshooting info when console install fails #7132

Merged

@spadgett spadgett commented Feb 13, 2018

Show the results of the following commands for troubleshooting console
errors when install fails:

  • oc status -n openshift-web-console
  • oc get pods -n openshift-web-console -o wide
  • oc get events -n openshift-web-console
  • oc logs deployment/webconsole --tail=50 -n openshift-web-console
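
For illustration, the commands listed above could be wired into the role roughly as follows. This is a minimal sketch, not the code from this PR: the task names and the `console_events` variable are assumptions (only `console_pods` appears in the diff excerpts), and the real role may invoke the client through a configured binary path rather than a bare `oc`.

```yaml
# Sketch only: task names and console_events are illustrative.
- name: Get console pods
  command: oc get pods -n openshift-web-console -o wide
  register: console_pods
  # Ignore errors so the remaining diagnostics still run.
  ignore_errors: true

- debug:
    # default([]) guards against a result with no stdout at all.
    msg: "{{ console_pods.stdout_lines | default([]) }}"

- name: Get console events
  command: oc get events -n openshift-web-console
  register: console_events
  ignore_errors: true

- debug:
    msg: "{{ console_events.stdout_lines | default([]) }}"
```

Each diagnostic is registered and printed separately so that one failing command does not hide the output of the others.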

/assign @sdodson
@jwforres FYI

@spadgett
Member Author

@sdodson Opinion on this change?

# Ignore errors so we can log troubleshooting info on failures.
ignore_errors: yes

# Log the reuslt of `oc status`, `oc get pods`, and `oc logs deployment/webconsole` for troubleshooting failures.
Member

nitpick: comment typo

    ignore_errors: true
  - debug:
      msg: "{{ console_pods.stdout_lines }}"
  - name: Get console pod logs
Member

The pods might not be running if the image cannot be found; could oc get ev be added to show latest events?

Member Author

> could oc get ev be added to show latest events?

Sure.

Show the results of the following commands for troubleshooting console
errors when install fails:

* `oc status -n openshift-web-console`
* `oc get pods -n openshift-web-console -o wide`
* `oc get events -n openshift-web-console`
* `oc logs deployment/webconsole --tail=50 -n openshift-web-console`

@spadgett spadgett force-pushed the console-troubleshooting branch from 38883bf to d0263a5 on February 19, 2018 13:49

@spadgett
Member Author

@vrutkovs Thanks, updated.

@spadgett
Member Author

spadgett commented Feb 19, 2018

FWIW, it's possible to see image pull errors even without oc get events. oc get pods will show the error.

TASK [openshift_web_console : debug] ***************************************************************************************************
ok: [...] => {
    "msg": [
        "NAME                          READY     STATUS             RESTARTS   AGE       IP            NODE",
        "webconsole-7f754f749b-2xpnb   0/1       ImagePullBackOff   0          6m [...]"
    ]
}

I don't think it's bad to list events anyway, though.

@sdodson
Member

sdodson commented Feb 19, 2018

How do we feel about continuing the install/upgrade on failure and then reporting the problem at the end for 'leaf' components where no other components depend on the successful deployment of that component?

@ewolinetz @mtnbikenc The stats + callback implementation would facilitate that, would it not? Do we have an example for Sam to reference?

@ewolinetz
Contributor

Yeah, we can use something like the following:

- set_stats:
    data:
      installer_phase_web_console:
        message: "The web console failed to install, yadda yadda."
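
One way to combine this with a failing check is a `block`/`rescue` pair. The sketch below is an assumption about how the pieces could fit together, not the PR's actual tasks: the `oc get deployment` command is only a stand-in for whatever health check the role really performs.

```yaml
# Sketch only: the check inside the block is a hypothetical stand-in.
- block:
    - name: Verify that the console deployment exists
      command: oc get deployment webconsole -n openshift-web-console
  rescue:
    # Runs only if a task in the block failed; records a message
    # for the end-of-run installer status report.
    - set_stats:
        data:
          installer_phase_web_console:
            message: "The web console failed to install."
```

Note that a rescued failure does not, by itself, make `ansible-playbook` exit non-zero, so this pattern reports the problem without stopping the run.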

@spadgett
Member Author

This is what it looks like with @ewolinetz's suggestion:

INSTALLER STATUS ***********************************************************************************************************************
Initialization             : Complete (0:00:29)
Health Check               : Complete (0:00:19)
etcd Install               : Complete (0:01:02)
Master Install             : Complete (0:03:26)
Master Additional Install  : Complete (0:00:46)
Node Install               : Complete (0:01:58)
Hosted Install             : Complete (0:01:24)
Web Console Install        : Complete (0:07:38)
        The web console failed to install.

@spadgett
Member Author

I can add some more detail to the message if we're good with that.

With color:

[Screenshot: screen shot 2018-02-19 at 3 06 51 pm]

@spadgett
Member Author

@sdodson The exit status is 0 when I use set_stats.

@sdodson
Member

sdodson commented Feb 20, 2018

That's unfortunate. I thought ignore_errors still left the task in a failed state, triggering a non-zero exit code. @mtnbikenc is this as you'd expect?

@sdodson
Member

sdodson commented Feb 20, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm label Feb 20, 2018

@sdodson
Member

sdodson commented Feb 20, 2018

/lgtm cancel

@openshift-ci-robot openshift-ci-robot removed the lgtm label Feb 20, 2018

@spadgett spadgett force-pushed the console-troubleshooting branch from 1d236e6 to d0263a5 on February 20, 2018 19:03

@spadgett
Member Author

@sdodson I added the fail back in per our conversation.
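
A rough sketch of the "log diagnostics, then fail" pattern discussed here, under stated assumptions: the `console_check` and `console_log` names and the stand-in check command are hypothetical, and the `is failed` test needs Ansible 2.5 or later (older releases used the `| failed` filter).

```yaml
# Sketch only: variable names and the check command are illustrative.
- name: Check the console deployment
  command: oc get deployment webconsole -n openshift-web-console
  register: console_check
  # Keep going so troubleshooting info can be logged first.
  ignore_errors: true

- name: Get console pod logs
  command: oc logs deployment/webconsole --tail=50 -n openshift-web-console
  register: console_log
  ignore_errors: true
  when: console_check is failed

- debug:
    msg: "{{ console_log.stdout_lines | default([]) }}"
  when: console_check is failed

# Fail explicitly so the run ends with a non-zero exit status.
- fail:
    msg: "The web console failed to install. See the output above."
  when: console_check is failed
```

The explicit `fail` at the end restores the non-zero exit status that `ignore_errors` alone would otherwise mask.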

@sdodson
Member

sdodson commented Feb 20, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm label Feb 20, 2018

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@spadgett
Member Author

@sdodson BTW this is a more complete screenshot of what I saw using set_stats instead of fail:

[Screenshot: screen shot 2018-02-20 at 8 56 27 am]

@openshift-merge-robot
Contributor

Automatic merge from submit-queue.

@openshift-merge-robot openshift-merge-robot merged commit c6f2004 into openshift:master Feb 21, 2018

@openshift-ci-robot

@spadgett: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/openshift-jenkins/extended_conformance_install_crio | d0263a5 | link | /test crio |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Labels
lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.