/tmp/deprovision.sh failed #13559
@smarterclayton when was the JUnit + Ansible output supposed to bubble up to the image being used? We don't have this logged.
Uh...
Looks like an internal GCE error; not sure if we're able to do anything about it.
I opened an issue and they couldn't figure out how to help me.
…On Mon, Jun 5, 2017 at 9:28 AM, Maciej Szulik ***@***.***> wrote:
Deleted [https://www.googleapis.com/compute/v1/projects/openshift-gce-devel-ci/global/instanceTemplates/ci-prtest-5a37c28-2729-instance-template-master].
ERROR: (gcloud.compute.firewall-rules.delete) Some requests did not succeed:
- Internal Error
Looks like internal GCE error, not sure if we're able to do with it.
Not cool :/
@stevekuznetsov @smarterclayton would it hurt if we ignored errors at the de-provision stage? I mean, remove …
Today all shell tasks are generated from the template at …
@stevekuznetsov on second thought, I started wondering: if this is supposed to fix the flake, we need to enable it by default and only disable it on demand. With this option turned off, we don't get rid of the flake. Why do you think this should be off by default?
The …
The flake is that we leave things uncleaned up in GCE. Why don't you improve
deprovision to retry certain operations? Or just rerun if a failure
happens?
On Jun 20, 2017, at 3:37 PM, Steve Kuznetsov <[email protected]> wrote:
The named_shell_task handles all shell tasks with a name in those jobs --
so that is provisioning, building, testing, etc -- it's only the
deprovisioning and cleanup tasks where we want to be able to optionally
disable the -o errexit flag to allow those steps to fail silently while
doing best-effort work.
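For context on the errexit toggle described above, here is a minimal sketch of what a generated best-effort step could look like. The `BEST_EFFORT` parameter and the overall structure are assumptions for illustration, not the actual template:

```sh
#!/bin/bash
# Illustrative sketch only; the real generated task lives in the job template
# repository and may look different. BEST_EFFORT is a hypothetical parameter.
set -o nounset
set -o pipefail

# Strict steps keep errexit; deprovision/cleanup steps drop it so one failed
# command does not abort the remaining best-effort work.
if [[ "${BEST_EFFORT:-false}" != "true" ]]; then
    set -o errexit
fi

# ... generated step body would go here ...
```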
I think I'll go with that, although I'm not 100% sure which step failed; I don't have the logs anymore :( I'll dig more.
I'm not sure retry is always appropriate -- for instance, in many deprovision/cleanup stages we fail to grab something from the remote host for whatever reason; we can live without that artifact, but we'd rather continue on to actually deprovisioning the machine. Doing a retry loop might be simpler (and require only job stage edits), but the …
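A retry loop of the kind being discussed could look roughly like the following bash sketch; the helper and the firewall-rule name are made up for illustration:

```sh
#!/bin/bash
# Hypothetical retry helper; the gcloud call stands in for whichever cloud
# operation is flaking.
retry() {
    local attempts=$1; shift
    local i
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        echo "attempt ${i}/${attempts} failed: $*" >&2
        sleep $(( i * 10 ))
    done
    return 1
}

retry 3 gcloud compute firewall-rules delete ci-prtest-example --quiet
```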
I was originally thinking about retry logic for that particular use case. Adding the option you've mentioned is not a problem, either.
The retry should be inside deprovision and origin-gce, not in the job.
The deprovision playbook already has to be idempotent, and if some cloud
providers have bugs, that's just the way of the world.
…On Wed, Jun 21, 2017 at 11:30 AM, Maciej Szulik ***@***.***> wrote:
I was originally thinking about a retry logic for that particular
use-case. Adding that option you've mentioned is not a problem, either.
None of this should be something the job is aware of.
OK, sounds reasonable. The jobs would be better off if we could turn off early exit on error, as that makes our deprovision safer -- today, if a pre-deprovision step like grabbing logs fails, we don't run deprovision, since we could not describe "always run this step even on failure" in Jenkins. The …
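A minimal sketch of the "keep going to deprovision even if an earlier step fails" idea; the script names here are hypothetical:

```sh
#!/bin/bash
# Hypothetical pre-deprovision flow: a failed artifact grab is logged but
# does not stop the teardown from running.
set -o errexit

if ! ./gather-artifacts.sh; then
    echo "artifact collection failed; continuing to deprovision" >&2
fi

./deprovision.sh
```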
@mfojtik !? |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now, please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now, please do so with /close. /lifecycle rotten
This has actually been fixed by doing a trap to make those stages never fail. /close |
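For reference, a trap-based "never fail" cleanup stage can be sketched roughly as below; this illustrates the pattern, not the exact fix that landed:

```sh
#!/bin/bash
set -o errexit

# The EXIT trap forces a zero exit status, so the stage reports success even
# if one of the best-effort commands below fails under errexit.
trap 'exit 0' EXIT

# Best-effort cleanup; the firewall-rule name is hypothetical.
gcloud compute firewall-rules delete ci-prtest-example --quiet
```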
Seen in https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_conformance_gce/635/console. The direct error seems to be related to this particular GCE failure: