Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multi-stage dockerbuilds via imagebuilder #18741

Merged
merged 5 commits into from
Mar 9, 2018

Conversation

smarterclayton
Copy link
Contributor

@smarterclayton smarterclayton commented Feb 25, 2018

Needs an e2e test and a way to perform replacement on the stage images.

Will allow standard multi-stage Dockerfile builds against any Docker 1.12+ backend.

API PR openshift/api#34

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 25, 2018
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 25, 2018
@openshift-merge-robot openshift-merge-robot added the vendor-update Touching vendor dir or related files label Feb 25, 2018
Copy link
Contributor

@bparees bparees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior of manageDockerFile w/ respect to this? I believe manageDockerFile replaces the "last" FROM statement in the dockerfile, with the resolved imagestreamtag reference. That's because at the time, multistage builds weren't a thing and if you had multiple FROM statements in your dockerfile, only the last one mattered. But if we actually expect multiple FROMs in a dockerfile now, the behavior of manageDockerFile needs to change (though it's not clear which FROM should be replaced in that case? Perhaps neither?)

}
}
}
return images.List(), len(stages) > 0, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be greater than 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Feb 25, 2018

what's the behavior of manageDockerFile w/ respect to this? I believe manageDockerFile replaces the "last" FROM statement in the dockerfile, with the resolved imagestreamtag reference. That's because at the time, multistage builds weren't a thing and if you had multiple FROM statements in your dockerfile, only the last one mattered. But if we actually expect multiple FROMs in a dockerfile now, the behavior of manageDockerFile needs to change (though it's not clear which FROM should be replaced in that case? Perhaps neither?)

With multi-stage builds, image source suddenly gets a lot more powerful / direct, because you can bypass needing to copy the image content around.

I'm thinking that we want to add an as []string to each image source that is used instead of paths, and treat the build from field as the final stage default, but let image source override it. The question is whether a stage inside a build should be allowed to override the from outside of it or not. Advantage to overriding stages is that you can use openshift to break up a monolithic dockerfile into chunks without changing the file. Advantage to not overriding stages is avoiding the complexity of sorting out whether skipping a dependency (stage 2 depends on stage 1 which includes things in stage 0, but if you substitute stage 1 you don't get the correct stage 0 dependencies).

Current thinking:

images:
- from: istag foo
  paths: null
  as:
  - "foo:latest"  # string values to replace 
dockerStrategy:
  from: istag bar
  dockerFile: |
    FROM foo:latest # this is implicitly stage 0, get replaces by istag foo
    COPY --from=random:v1 # pulls directly, no replacement
    ...
    FROM baz # would be result istag bar, unless an image source existed targeting "baz"
    COPY --from=0  # would copy from stage 0 no matter what because stages take priority
    COPY --from=foo:latest # replaced by istag foo
    ...
    FROM random:v2 AS foo:latest # redefines foo:latest for subsequent images?
    ...
    FROM random:v3 # always set to build spec from (should any other random:v3 be set as well?)

Thoughts?

No matter what we'd rewrite the dockerfile prior to sending onwards

@smarterclayton smarterclayton force-pushed the multistage branch 3 times, most recently from afdd1b9 to 95db0fd Compare February 25, 2018 18:00
@bparees
Copy link
Contributor

bparees commented Feb 25, 2018

if i'm understanding your sample, the "replacement precedence" would be:

  1. if the image value is a stage ref, use the image from the referenced stage
  2. if the image value in the dockerfile matches an imagesource.as, replace with the imagesource
  3. replace the image value w/ the "build.from" value (effectively acts as a default)
  4. if there's no from, leave the value alone

i don't love (2). The rest makes sense and i'm not sure what i'd recommend instead of (2)... I suppose it just forces people to be explicit about their imagesource references which isn't a terrible thing, but might break some people unexpectedly when they create a docker buildconfig w/ an explicit buildconfig.from and all their image references get overwritten instead of just the first one.

Regarding the FROM x AS y behavior, i agree w/ the idea that future references to "y" should now get "x" instead of whatever "y" used to be defined as.

@smarterclayton
Copy link
Contributor Author

replace the image value w/ the "build.from" value (effectively acts as a default)

We could just say that the build "from" is an implicit image source for the value of the last stage. That's effectively what we do today.

@bparees
Copy link
Contributor

bparees commented Feb 26, 2018

We could just say that the build "from" is an implicit image source for the value of the last stage. That's effectively what we do today.

i like that better than treating it as a default. But would it also sub in for other stages that used the same value? And how does an "AS" in the dockerfile come into play?

var set sets.String
for i, image := range input.Images {
for j, name := range image.As {
if len(name) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth a comment explaining that you don't record an error here because validateImageSource already did.

// paths is a list of source and destination paths to copy from the image. This content will be copied
// into the build context prior to starting the build. If no paths are set, the build context will
// not be altered.
// +optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you validated that the rest of the build logic properly handles this being empty?

Copy link
Contributor Author

@smarterclayton smarterclayton Feb 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went over all the uses of image paths - there weren't many. We didn't actually ever require them to be set that I could find outside of validation.

@smarterclayton
Copy link
Contributor Author

But would it also sub in for other stages that used the same value? And how does an "AS" in the dockerfile come into play?

I don't think it should sub in. However, we could say we do that replacement first, and if you did something like from: {kind: "DockerImage", name: "foo:bar"} and had an image source for foo:bar it would replace it.

AS I think always takes priority no matter what, since an AS always has an underlying name that can be replaced safely.

@bparees
Copy link
Contributor

bparees commented Feb 26, 2018

AS I think always takes priority no matter what, since an AS always has an underlying name that can be replaced safely.

i'm thinking about ordering. Do we do all the dockerfile AS replacements first, or all the imagesource.as+build.from replacements first?

I think the simplest thing to reason about would be do that do all the substitutions into the dockerfile, then use the dockerfile normally. But that potentially contradicts your statement of FROM random:v2 AS foo:latest # redefines foo:latest for subsequent images?

and i think that in "FROM x AS y" we should perhaps not substitute the value of y, ever. That value should be reserved for users trying to alias things, we shouldn't mess with it. ('x', of course, is fair game for substitution)

@smarterclayton
Copy link
Contributor Author

But would it also sub in for other stages that used the same value? And how does an "AS" in the dockerfile come into play?

I don't think it should sub in. However, we could say we do that replacement first, and if you did something like from: {kind: "DockerImage", name: "foo:bar"} and had an image source for foo:bar it would replace it.

@smarterclayton
Copy link
Contributor Author

So for FROM a AS b, no one may ever override b once it's been invoked (they can set it beforehand). Anyone should be able to replace a.

@smarterclayton
Copy link
Contributor Author

Take a look at the test cases I added and see if they make logical sense to you (in common_test.go for addBuildParameters).

@smarterclayton
Copy link
Contributor Author

Comments addressed

@smarterclayton smarterclayton changed the title WIP - Support multi-stage dockerbuilds via imagebuilder Support multi-stage dockerbuilds via imagebuilder Feb 27, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2018
@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2018
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 27, 2018
@smarterclayton
Copy link
Contributor Author

/test all

@smarterclayton
Copy link
Contributor Author

/retest

The previous code is too complex and tries to be clever with existing
values. test-cmd is local, just stomp the set of values.
Unless the user sets the imageOptimizationPolicy to None, use
imagebuilder to perform multi-stage builds.
@smarterclayton
Copy link
Contributor Author

/retest

5 similar comments
@smarterclayton
Copy link
Contributor Author

/retest

@smarterclayton
Copy link
Contributor Author

/retest

@smarterclayton
Copy link
Contributor Author

/retest

@smarterclayton
Copy link
Contributor Author

/retest

@smarterclayton
Copy link
Contributor Author

/retest

@bparees
Copy link
Contributor

bparees commented Mar 9, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 9, 2018
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [bparees,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue (batch tested with PRs 18887, 18741).

@openshift-merge-robot openshift-merge-robot merged commit 377dc4d into openshift:master Mar 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. needs-api-review size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. vendor-update Touching vendor dir or related files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants