[WIP] Enable remote layer federation #10120
Conversation
…top-level package. Drop this during the next distribution rebase. The revert also reverts "UPSTREAM: docker/distribution: <carry>: export storage.CreateOptions" (commit 8c8bc43). Signed-off-by: Michal Minář <[email protected]>
…te method during cross-repo mount
Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
Modified blobDescriptorService.Stat, which used to bypass the layer link check, to verify that the requested blob either exists as a layer link locally in the repository or is referenced in the corresponding image stream. The latter is determined by fetching the image stream and all its images from etcd until the blob is found. An internal digest cache stores blob <-> repository pairs to reduce the number of etcd queries. Signed-off-by: Michal Minář <[email protected]>
For security reasons, evict stale (blob, repository) pairs from the cache, so that when an image is untagged from an image stream, the registry will deny access to its blobs. Also fixed a bug in digestcache where the first association for a particular digest was lost.
Turned the blob repository TTL into a config option and allowed overrides via the env var REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_BLOBREPOSITORYCACHETTL. Signed-off-by: Michal Minar <[email protected]>
Use a much smaller image for pulls. Also deal with multiple image candidates for deletion. Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
During a cross-repo mount push where the mounted blob belongs to a remote image, tag the image into the target image stream. This utilizes pullthrough to federate the image's blobs without storing them in the registry. The image is tagged under the `_pullthrough_dep_${blobdgst}` tag. Signed-off-by: Michal Minář <[email protected]>
/cc @smarterclayton, @liggitt I'm not sure about the tagging part. Since blobs are pushed in parallel, there will be conflicts when updating the target image stream. If there are 10 blobs mounted from another source repository during a single image push, the code could generate, in the worst case, 10 different tags in the target image stream referring to the same image. The tags have a common prefix
[test]
I don't like tagging these into the user space (the user tags). It doesn't protect against the source image being untagged in the origin repo. I do like there being some reference. Why not add this as a field on image status (the status tag pointing to image ID and some other data about the pull)? Before we complete an implementation, let's answer the following questions:
The answers to those questions might steer us in different directions:
If a simple manual pull is tested, it succeeds if the Docker secret is in the same namespace the user tries to pull from. The same applies to builds: the secret must exist in the same namespace as the build config. There is probably a bug, though. When I tested with just the default namespace, I only had to add the secret to
It's a copy of a reference 😄. In this PR the IS status is extended for a

By a reference, do you mean a ref to the repo in the internal registry (e.g.
In the case of remote images, the reference might be valid for a very short period of time. With the copy approach, the access would be forever; in the latter case, until the owner of the source repository changes their mind. I'm not sure which is better.
If the cross mount from the remote repo fails, the layers will be pushed as usual.
Shall we then pursue a different path to implement remote layer federation?
Accepting is already the case if the pullthrough fails.
Wouldn't it be easier to store the information just in etcd? Have one special image stream with base images tagged whose layers shouldn't be stored?
To be clear, I'm not asking about what this PR does - I'm asking what
Follow up questions:
I'm not in favor of this for a lot of reasons - mostly because it
If that is the case, the act of tagging has to result in copies of
I don't like having special images for lots of reasons (creating them

If all those things are true (no push block, copy binaries locally on

I'd like to revisit the image stream -> layer digest denormalization
I agree that content management gets more difficult with different layers stored in different locations based on some image stream or global configuration. It makes export/import or migration of the image very painful. It would be manageable if the locations were stored in the image manifest itself, as is already done with MS images (see below).

If I understand the concept of remote layer federation correctly, it should prevent ISVs from redistributing 3rd party content. If any configuration, like modifying image streams or listing special blobs somewhere, is needed for this purpose, it's very error prone, and an ISV may easily end up serving the content due to a misconfiguration. The image itself should carry the locations where its blobs reside. Unfortunately, there's not much we can do about manifest v2 schema 1. If we want to federate it, there has to be some configuration involved.

We don't need to prevent blob pushes, we need to prevent serving them, ideally by redirecting to the upstream's CDN. But if we don't want to serve them, there's no point in storing them. If the Docker client doesn't even attempt to upload them, we get faster push times and more free space on the registry's storage. If we don't want to patch the Docker client, the cross-repo mount feature is our only option. Unfortunately, it's not very reliable.
That's what is already true for Microsoft images. See https://github.com/docker/distribution/pull/1725/files#diff-cf41dce100228ea2e316a7f821bebaf6R73 and the corresponding Docker client implementation moby/moby#22866. Unfortunately, support for other operating systems seems to be pursued with lesser effort: distribution/distribution#1825. Also, only manifest v2 schema 2 is supported.
I'm in favor of copy on tag. The copy really means: create, in the destination repository, all the layer links that already exist in the source repository for all the local blobs. Remote blobs, IMHO, should stay remote. As I said earlier, I'd rather store the pull location on the image object. Based on some global or per-imagestream configuration, the registry would know which blobs of an uploaded image are federated and would set that information on the image accordingly before creating the imagestreammapping.
If we're going to copy all the blobs during push or tag, does the actor have to be the cluster? Can it be the registry? The cluster doesn't really have a clue whether a blob exists locally or not. Copying a local blob is very cheap; that's not the case for remote ones, though.
I do like the idea of having a special image stream

For import/export/migration of images, it would be desirable to keep the pull locations in the image object, at least as an annotation. The locations would be set by the registry during a push.
Hopefully, manifest v2 schema 2 federation will be extended to other OSes soon. All in all, it will be a much cleaner approach. Nevertheless, our CDN isn't ready for that since it only recently switched to schema 1. To satisfy our PMs, we'd like to have something in for 3.3 already.
We need to store at least:

```go
type ImageStream struct {
	...
	Status struct {
		...
		// keys are blob digests
		Blobs map[string]struct {
			Size int64
			// empty if the blob is not federated
			OriginRepository string
		}
	}
}
```

What worries me is that many blobs will be duplicated across many image streams. Image streams are already huge. Moreover, there are cases where we need to look up a blob at a global level without knowledge of the image stream (such as obtaining layer sizes for manifests that don't store them). I'd be in favor of having this index as a top-level resource, where the blob details structure would also contain references to all the containing image streams. The index would have to be kept up to date by all image(stream)-related RESTs. All in all, IMHO this is an inevitable step forward. Relying partially on etcd and on layer links in the registry to determine access rights, and keeping them synchronized, is a real mess. Having the information just in etcd will make things much simpler.
So functionally, this is working:

```
$ oc describe istag is:latest | grep 'Docker Image'
Docker Image:   miminarnb.vm:5003/miminar/hello-world@sha256:e1b37e6d7eeaadf06a68fb11e66c80c9613e8d743ef5881036ebf8b94ac25a2e
$ oc describe secret/miminarnbregistry
Name:           miminarnbregistry
Namespace:      src
Labels:         <none>
Annotations:    <none>
Type:           kubernetes.io/dockercfg
Data
====
.dockercfg:     {"http://miminarnb.vm:5003/v2/":{"username":"minami","password":"12345","email":"[email protected]","auth":"bWluYW1pOjEyMzQ1"}}
$ # give pull access to builder in alice's *target* namespace
$ oc policy add-role-to-user registry-viewer system:serviceaccount:target:builder
$ # or give pull access to all
$ oc policy add-role-to-group registry-viewer system:authenticated
```

Alice can then import the image like this:

```
oc import-image --from=172.30.241.183:5000/src/is:latest --confirm target
```

There's a little glitch, though. The importer tries to stat all the blobs to get layer sizes, so the registry goes straight to the remote repository. And thanks to #8613, all the layers are downloaded, which slows down the import quite a bit. I'd like to finish the tests for a fix in #9796 today so it doesn't bother us anymore. Anyway, this is another place where a global index of blobs would be really handy.
@miminar @smarterclayton is this requirement for 3.3? We are getting short on time here.
To clarify - the image is a good mechanism for associating blobs with alternate locations, but the actual serving is based on the blob. The CDN and federation case (typically public) is that you should retrieve specific blobs from certain locations because of policy or performance. In those cases images might be the way to identify the layers (all the layers of these images in this image stream) but the outcome is not image based (it's layer based). So to summarize:
The structure for materializing blob layers into an image stream has to handle referential counting, so it probably can't collapse common layers from images. I.e. it has to be:
The good news is that the structure could be populated via a controller or migration if necessary, so if we hit transient failures during tagging (can't access the image metadata), we could backfill the data.
Evaluated for origin test up to 1670b8c

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/7346/)

Origin Action Required: Pull request cannot be automatically merged, please rebase your branch from latest HEAD and push again
Closing this. The implemented solution is not thorough. The pullthrough middleware with cross-repo mount cannot be used alone to address the problem. Moreover, this is of low priority now.
During a cross-repo mount push where the mounted blob belongs to a remote image, tag the image into the target image stream. This will utilize pullthrough to federate the image's blobs without storing them in the registry.

The image is tagged under the `_pullthrough_dep_${blobdgst}` tag.

A follow-up for #9819 addressing the remote layer federation part.

Please review just the last commit. All the preceding ones belong to #9819.