Binary source build timeout after 5 minutes in case of an error #9923

Closed
rhuss opened this issue Jul 19, 2016 · 6 comments · Fixed by #9945

Comments

@rhuss
Contributor

rhuss commented Jul 19, 2016

When starting a Docker build from a binary source with oc start-build, and the build has an error such as an invalid imagestream name, oc start-build blocks until it times out after 5 minutes instead of returning immediately.

Version
$ oc version
oc v1.3.0-alpha.2
kubernetes v1.3.0-alpha.1-331-g0522e63

$ openshift version
openshift v1.3.0-alpha.2
kubernetes v1.3.0-alpha.1-331-g0522e63
etcd 2.3.0
Steps To Reproduce
  • Create a BuildConfig and an ImageStream:
---
apiVersion: "v1"
items:
- apiVersion: "v1"
  kind: "ImageStream"
  metadata:
    name: "spring-boot-web"
- apiVersion: "v1"
  kind: "BuildConfig"
  metadata:
    name: "spring-boot-web-build"
  spec:
    output:
      to:
        kind: "ImageStreamTag"
        name: "fabric8/spring-boot-web:1.0.1"
    source:
      binary: {}
    strategy:
      dockerStrategy: {}
kind: "List"

Note that the output image stream name is invalid because it contains a /.

  • Start a build with a binary source
cat docker.tar | oc start-build --from-dir=- spring-boot-web-build
  • In another terminal, verify that the build has an error:
$ oc describe build

Name:       spring-boot-web-build-1
Created:    7 minutes ago
Labels:     buildconfig=spring-boot-web-build,openshift.io/build-config.name=spring-boot-web-build,openshift.io/build.start-policy=Serial
Annotations:    openshift.io/build-config.name=spring-boot-web-build
        openshift.io/build.number=1
Build Config:   spring-boot-web-build
Duration:   waiting for 7m10s
Build Pod:  spring-boot-web-build-1-build
Strategy:   Docker
Binary:     provided on build
Output to:  ImageStreamTag fabric8/spring-boot-web:snapshot-160719-084528-0494
Status:     New (The referenced output image stream default/fabric8/spring-boot-web could not be found by build default/spring-boot-web-build-1: invalid resource name "fabric8/spring-boot-web": name may not contain "/".)
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason      Message
  --------- --------    -----   ----            -------------   --------    ------      -------
  7m        7m      1   {build-controller }         Warning     HandleBuildError    Build has error: the referenced output image stream default/fabric8/spring-boot-web could not be found by build default/spring-boot-web-build-1: invalid resource name "fabric8/spring-boot-web": name may not contain "/"
Current Result

The original oc start-build times out after 5 minutes:

cat docker.tar | oc start-build --from-dir=- spring-boot-web-build
Uploading archive file from STDIN as binary input for the build ...
Error from server: Timeout: timed out waiting for build spring-boot-web-build-1 to start after 5m0s

If there is no error in the build config and the build succeeds, then this command returns within seconds.

Expected Result

An error should be printed immediately and the command should abort. I suspect that the HTTP connection over which the tar is sent is not closed properly after an error occurred on the server side.

@rhuss
Contributor Author

rhuss commented Jul 19, 2016

The very same happens when going directly via REST to the API (as we do for the fabric8-maven-plugin), so it is very likely a server issue, not a client issue.

@bparees
Contributor

bparees commented Jul 19, 2016

This is basically working as designed. For any build, if the output imagestream isn't available, we'll retry for up to 30 minutes waiting for the output imagestream to appear. This is necessary particularly in cases where you create multiple objects simultaneously (e.g. you create the output imagestream and the build at the same time), since they might appear in any order, so we retry to see if it appears.

If you had created the output imagestream within the 5 minute binary build timeout, I think you would have seen a successful build. As it is, the binary build times out sooner than the 30 minute "waiting for the imagestream" timeout, which is why you saw the behavior you did.

Can you attempt to create the imagestream after creating the binary build and confirm that the build does end up succeeding in that case?
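
As a rough illustration of the timing described above, the following Python sketch models which limit is hit first. The two constants come from this thread (the 5m0s in the client error and the 30 minute retry window); the function name `first_event` and the simplified model are assumptions for illustration only, not OpenShift code.

```python
from typing import Optional

# Values taken from this thread; the model itself is a simplification.
BINARY_BUILD_TIMEOUT_S = 5 * 60    # "timed out waiting for build ... after 5m0s"
IMAGESTREAM_RETRY_S = 30 * 60      # retry window for a missing output imagestream


def first_event(stream_appears_at_s: Optional[float]) -> str:
    """Return what the client observes first (hypothetical helper).

    stream_appears_at_s is the number of seconds after build creation at
    which the output imagestream becomes resolvable, or None if it never does.
    """
    if stream_appears_at_s is not None and stream_appears_at_s < BINARY_BUILD_TIMEOUT_S:
        return "build starts"
    # The shorter binary build timeout fires before the retry window ends.
    assert BINARY_BUILD_TIMEOUT_S < IMAGESTREAM_RETRY_S
    return "binary build times out"
```

In this model, an imagestream name that can never resolve corresponds to `first_event(None)`, which is the timeout reported in this issue.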

@rhuss
Contributor Author

rhuss commented Jul 19, 2016

My point is that the BuildConfig itself was invalid as it references an output ImageStreamTag with an invalid name (fabric8/spring-boot-web:1.0.1 contains a slash). So I don't think that the build can succeed even when the ImageStream comes late. (see also the error message of oc describe build above).

In cases where a build can never succeed, isn't it possible to fail early? But maybe I am missing something here ...

@rhuss
Contributor Author

rhuss commented Jul 19, 2016

Maybe in this case it would be sufficient to throw an error when the BuildConfig is created with an invalid output ImageStreamTag?

On the other hand, a lot of errors in OpenShift are reported asynchronously (like when an image couldn't be pulled), so why not do it the same way here? That is, return from the oc start-build call as soon as the binary is uploaded, and report the error in the logs when the imagestream doesn't show up.

@bparees
Contributor

bparees commented Jul 19, 2016

@rhuss ok the actual issue here is that we do not validate the ImageStreamTag name you provided when you created the buildconfig, so as you say, we should be throwing the error when you create the buildconfig because the buildconfig is not technically valid.

Doing it asynchronously doesn't make sense in this case, since that imagestreamtag name will never be valid/resolvable.
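
A minimal sketch of what such an early check could look like, in Python for brevity. The function name `validate_output_image_stream_tag` and the exact rules are assumptions for illustration (only the "name may not contain /" rule is taken from the error message above); this is not the actual OpenShift validation code.

```python
from typing import List


def validate_output_image_stream_tag(ref: str) -> List[str]:
    """Return a list of validation errors for an ImageStreamTag reference.

    Hypothetical helper: expects "<name>:<tag>" and rejects names that
    contain "/", mirroring the error reported by the build controller.
    """
    errors: List[str] = []
    # Split on the last ":" so the tag is everything after it.
    name, sep, tag = ref.rpartition(":")
    if not sep or not name or not tag:
        errors.append('"%s" must be of the form <name>:<tag>' % ref)
        return errors
    if "/" in name:
        errors.append('invalid resource name "%s": name may not contain "/"' % name)
    return errors
```

With a check like this at BuildConfig creation time, `fabric8/spring-boot-web:1.0.1` would be rejected before any build is started, while `spring-boot-web:1.0.1` would pass.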

@rhuss
Contributor Author

rhuss commented Jul 20, 2016

Agreed, early validation would be fine for me.

By async I didn't only mean this specific case; it was more of a general note. Waiting on an imagestream synchronously is, in general, a bit contradictory to how e.g. OpenShift / Kubernetes waits for an image pull to finish before a pod starts. But I'm fine with that, I just wanted to point out the difference in UX here.
