[WIP] Enable remote layer federation #10120
Conversation
…top-level package. Drop this during the next distribution rebase. The revert also reverts "UPSTREAM: docker/distribution: <carry>: export storage.CreateOptions" (commit 8c8bc43). Signed-off-by: Michal Minář <[email protected]>
…te method during cross-repo mount
Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
Modified blobDescriptorService.Stat, which used to bypass the layer link check, to verify that the requested blob either exists as a layer link locally in the repository or is referenced in the corresponding image stream. The latter is determined by fetching the image stream and all its images from etcd until the blob is found. An internal digest cache stores blob <-> repository pairs to reduce the number of etcd queries. Signed-off-by: Michal Minář <[email protected]>
For security reasons, evict stale (blob, repository) pairs from the cache, so that when an image is untagged from an image stream, the registry will deny access to its blobs. Also fixed a bug in digestcache where the first association for a particular digest was lost.
Turned the blob repository TTL into a config option and allowed overrides via the env var REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_BLOBREPOSITORYCACHETTL. Signed-off-by: Michal Minar <[email protected]>
Use a much smaller image for pulls. Also deal with multiple image candidates for deletion. Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
Signed-off-by: Michal Minář <[email protected]>
During a cross-repo mount push where the mounted blob belongs to a remote image, tag the image into the target image stream. This utilizes pullthrough to federate the image's blobs without storing them in the registry. The image is tagged under the `_pullthrough_dep_${blobdgst}` tag. Signed-off-by: Michal Minář <[email protected]>
/cc @smarterclayton, @liggitt I'm not sure about the tagging part. Since blobs are pushed in parallel, there will be conflicts when updating the target image stream. If there are 10 blobs mounted from another source repository during a single image push, the code could generate, in the worst case, 10 different tags in the target image stream referring to the same image. The tags have a common prefix
[test]
I don't like tagging these into the user space (the user tags). It doesn't protect against the source image being untagged in the origin repo. I do like there being some reference. Why not add this as a field on image status (the status tag pointing to image ID and some other data about the pull)? Before we complete an implementation, let's answer the following questions:
The answers to those questions might steer us in different directions:
If a simple manual pull is tested, it succeeds if the Docker secret is in the same namespace the user tries to pull from. The same applies to builds: the secret must exist in the same namespace as the build config. There is probably a bug, though. When I tested with just the default namespace, I only had to add the secret to
It's a copy of a reference 😄. In this PR the IS status is extended for a

By a reference, do you mean a ref to the repo in the internal registry (e.g.
In the case of remote images, the reference might be valid for a very short period of time. With the copy approach, the access would be forever; in the latter case, until the owner of the source repository changes their mind. I'm not sure which is better.
If the cross mount from the remote repo fails, the layers will be pushed as usual.
Shall we then pursue a different path to implement remote layer federation?
Accepting is already the case if the pullthrough fails.
Wouldn't it be easier to store the information just in etcd? Have one special image stream with base images tagged whose layers shouldn't be stored?
To be clear, I'm not asking about what this PR does - I'm asking what
Follow up questions:
I'm not in favor of this for a lot of reasons - mostly because it
If that is the case, the act of tagging has to result in copies of
I don't like having special images for lots of reasons (creating them

If all those things are true (no push block, copy binaries locally on

I'd like to revisit the image stream -> layer digest denormalization
I agree that content management gets more difficult with different layers stored in different locations based on some image stream or global configuration. It makes export/import or migration of the image very painful. It would be manageable if the locations were stored in the image manifest itself, as is already done with MS images (see below).

If I understand the concept of remote layer federation correctly, it should prevent ISVs from redistributing 3rd party content. If any configuration, like modifying image streams or listing special blobs somewhere, is needed for this purpose, it's very error prone, and an ISV may easily end up serving the content due to a misconfiguration. The image itself should carry the locations where its blobs reside. Unfortunately, there's not much we can do about manifest v2 schema 1. If we want to federate it, there has to be some configuration involved.

We don't need to prevent blob pushes, we need to prevent serving them, ideally by redirecting to the upstream's CDN. But if we don't want to serve them, there's no point in storing them. If the Docker client doesn't even attempt to upload them, we get faster push times and more free space on the registry's storage. If we don't want to patch the Docker client, the cross-repo mount feature is our only option. Unfortunately, it's not very reliable.
That's what is already true for Microsoft images. See https://github.com/docker/distribution/pull/1725/files#diff-cf41dce100228ea2e316a7f821bebaf6R73 and the corresponding Docker client implementation moby/moby#22866. Unfortunately, support for other operating systems seems to be pursued with lesser effort: distribution/distribution#1825. Also, only manifest v2 schema 2 is supported.
I'm in favor of copy on tag. The copy really means: create, in the destination repository, all the layer links that already exist in the source repository for all the local blobs. Remote blobs, IMHO, should stay remote. As I said earlier, I'd rather store the pull location on the image object. Based on some global or per-imagestream configuration, the registry would know which blobs of an uploaded image are federated and would set that information on the image accordingly before creating the imagestreammapping.
If we're going to copy all the blobs during push or tag, does the actor have to be the cluster? Can it be the registry? The cluster doesn't really have a clue whether a blob exists locally or not. Copying a local blob is very cheap; that's not the case for remote ones, though.
I do like the idea of having a special image stream

For import/export/migration of images, it would be desirable to keep the pull locations in the image object, at least as an annotation. The locations would be set by the registry during a push.
Hopefully, manifest v2 schema 2 federation will be extended to other OSes soon. All in all, it will be a much cleaner approach. Nevertheless, our CDN isn't ready for that since it only recently switched to schema 1. To satisfy our PMs, we'd like to have something in for 3.3 already.
We need to store at least:

```go
type ImageStream struct {
	...
	Status struct {
		...
		// keys are blob digests
		Blobs map[string]struct {
			Size int64
			// empty if the blob is not federated
			OriginRepository string
		}
	}
}
```

What worries me is that many blobs will be duplicated across many image streams. Image streams are already huge. Moreover, there are cases where we need to look up a blob at a global level without knowledge of the image stream (such as obtaining layer sizes for manifests that don't store them). I'd be in favor of having this index as a top-level resource, where the blob details structure would also contain references to all the containing image streams. The index would have to be kept up to date by all image(stream)-related RESTs. All in all, IMHO this is an inevitable step forward. Relying partially on etcd and on layer links in the registry to determine access rights, and keeping them synchronized, is a real mess. Having the information just in etcd will make things much simpler.
So functionally, this is working:

```
$ oc describe istag is:latest | grep 'Docker Image'
Docker Image:   miminarnb.vm:5003/miminar/hello-world@sha256:e1b37e6d7eeaadf06a68fb11e66c80c9613e8d743ef5881036ebf8b94ac25a2e
$ oc describe secret/miminarnbregistry
Name:           miminarnbregistry
Namespace:      src
Labels:         <none>
Annotations:    <none>
Type:           kubernetes.io/dockercfg
Data
====
.dockercfg:     {"http://miminarnb.vm:5003/v2/":{"username":"minami","password":"12345","email":"[email protected]","auth":"bWluYW1pOjEyMzQ1"}}
$ # give pull access to builder in alice's *target* namespace
$ oc policy add-role-to-user registry-viewer system:serviceaccount:target:builder
$ # or give pull access to all
$ oc policy add-role-to-group registry-viewer system:authenticated
```

Alice can then import the image like this:

```
oc import-image --from=172.30.241.183:5000/src/is:latest --confirm target
```

There's a little glitch, though. The importer tries to stat all the blobs to get layer sizes, so the registry goes straight to the remote repository. And thanks to #8613, all the layers are downloaded, which slows down the import quite a bit. I'd like to finish the tests for a fix in #9796 today so it doesn't bother us anymore. Anyway, this is another place where a global index of blobs would be really handy.
@miminar @smarterclayton is this requirement for 3.3? We are getting short on time here.
To clarify - the image is a good mechanism for associating blobs with alternate locations, but the actual serving is based on the blob. The CDN and federation case (typically public) is that you should retrieve specific blobs from certain locations because of policy or performance. In those cases images might be the way to identify the layers (all the layers of these images in this image stream) but the outcome is not image based (it's layer based). So to summarize:
The structure for materializing blob layers into an image stream has to handle referential counting, so it probably can't collapse common layers from images. I.e. it has to be:
The good news is that the structure could be populated via a controller or migration if necessary, so if we hit transient failures during tagging (can't access the image metadata), we could backfill the data.
Evaluated for origin test up to 1670b8c

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/7346/)

Origin Action Required: Pull request cannot be automatically merged, please rebase your branch from latest HEAD and push again
Closing this. The implemented solution is not thorough. The pullthrough middleware with cross-repo mount cannot be used alone to address the problem. Moreover, this is of low priority now.
During a cross-repo mount push where the mounted blob belongs to a remote image, tag the image into the target image stream. This will utilize pullthrough to federate the image's blobs without storing them in the registry.

The image is tagged under the `_pullthrough_dep_${blobdgst}` tag.

A follow-up for #9819 addressing the remote layer federation part.

Please review just the last commit. All the preceding ones belong to #9819.