Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image policy is resolving images on replica sets by default #15868

Merged

Conversation

smarterclayton
Copy link
Contributor

Even when local names are not requested, the image policy plugin is
deciding to rewrite image references in replica sets that point to the
integrated registry (with tags) to use digests. This causes the
deployment controller that created them to get wedged (because it
detects a change to the template) and become unable to update status on
that replica set.

https://bugzilla.redhat.com/show_bug.cgi?id=1481801

@openshift-merge-robot openshift-merge-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 18, 2017
@openshift-merge-robot openshift-merge-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-api-review labels Aug 18, 2017
@smarterclayton smarterclayton force-pushed the fix_imagepolicy_master branch 2 times, most recently from db9e95e to 33aef8c Compare August 19, 2017 00:02
@smarterclayton
Copy link
Contributor Author

smarterclayton commented Aug 19, 2017 via email

Copy link
Contributor

@tnozicka tnozicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other option would be to check if such object has a controllerRef set - https://github.com/kubernetes/community/blob/master/contributors/design-proposals/controller-ref.md

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Aug 19, 2017 via email

@tnozicka
Copy link
Contributor

/test extended_conformance_install_update

@tnozicka
Copy link
Contributor

extended_conformance_install_update is blocked on #15874

Copy link
Contributor

@tnozicka tnozicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments; lgtm otherwise for short term

{TargetResource: GroupResource{Resource: "replicationcontrollers"}, LocalNames: true},
{TargetResource: GroupResource{Group: "extensions", Resource: "replicasets"}, LocalNames: true},
// pods are attempted for backwards compatibility.
{TargetResource: GroupResource{Resource: "pods"}, LocalNames: true, Type: Attempt},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton I have tried it with master and it behaved like AttemptRewrite notAttempt there. But maybe the backwards compatibility is w.r.t. something else?

With that said Attempt seems like more reasonable default option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Master rewriting is a bug, and that's caused by local namespace resolution. If you look a few lines up the ResolveImage global (on the config) is defaulting to Attempt, and that's what we're controlling for here. What went into 3.6 was too aggressive.

},
},
},
// does not rewrite the config because build has Attempt by default, which overrides global policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not by default, but explicitly set Attempt. (Default is AttemptRewrite.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to change the build, that was wrong.

@tnozicka
Copy link
Contributor

tnozicka commented Aug 21, 2017

In future if this plugin would mutate only top level objects which can be found by not having ownerRef (which all controllers set) this kind of issues would go away I think.

Like it would resolve the image only in Deployment itself and left RS and Pod with controllerRef set alone.

Unless there is a use case for mutating controlled objects.

@smarterclayton
Copy link
Contributor Author

I think there is a use case for mutating controlled objects - I explicitly want to be able to default replication controllers at time of snapshot

@smarterclayton
Copy link
Contributor Author

/retest

@tnozicka
Copy link
Contributor

In that case we need to work on supporting that upstream as you said and back it up with an use case / example.

@tnozicka
Copy link
Contributor

tnozicka commented Aug 21, 2017

networking/GitHub failed
/test extended_conformance_install_update

@tnozicka
Copy link
Contributor

new day; let's give it another shot
/test extended_conformance_install_update

@tnozicka
Copy link
Contributor

@openshift/api-review needs API review :)

@smarterclayton
Copy link
Contributor Author

Updated to make builds Attempt instead of AttemptRewrite

@smarterclayton
Copy link
Contributor Author

@liggitt need you to do quick API review. 3.6 is broken, in that rewrite should not happen unless requested. Now, resolution rules override / more specific than the default resolution.

Copy link
Contributor

@tnozicka tnozicka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits in comments; lgtm otherwise

{TargetResource: GroupResource{Group: "build.openshift.io", Resource: "builds"}, LocalNames: true},
{TargetResource: GroupResource{Resource: "replicationcontrollers"}, LocalNames: true},
{TargetResource: GroupResource{Group: "extensions", Resource: "replicasets"}, LocalNames: true},
// pods are rewrite attempted for backwards compatibility.
Copy link
Contributor

@tnozicka tnozicka Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean to bikeshed on this but I don't get the meaning of rewrite attempted when the line bellow says Attempt, not AttemptRewrite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for resolution (do we look up this image value for possible application of rules).

},
},
},
// does not rewrite the config because build has DoNotAttempt by default, which overrides global policy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually build has Attempt by default; the rest holds true

@smarterclayton smarterclayton force-pushed the fix_imagepolicy_master branch from 209468e to 962153e Compare August 22, 2017 21:57
@smarterclayton
Copy link
Contributor Author

Updated with feedback from @liggitt

Rename Type to Policy. When defaulting ResolutionRules, default Policy to whatever the value of the top level ResolveImages value is, IF an execution rule is defined that references that resource (backwards compatible).

Makes Policy required so that users can't accidentally footgun themselves.

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Aug 22, 2017 via email

// executionRuleCoversResource returns true if gr is covered by rule.
func executionRuleCoversResource(rule ImageExecutionPolicyRule, gr GroupResource) bool {
for _, target := range rule.OnResources {
if target.Group == gr.Group && (target.Resource == gr.Resource || target.Resource == "*") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can group be *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, no compelling reason. apps could legitimately have a resource wildcard, but everything on the server seemed a stretch.

obj.ResolutionRules[i].Policy = DoNotAttempt
for _, rule := range obj.ExecutionRules {
if executionRuleCoversResource(rule, obj.ResolutionRules[i].TargetResource) {
obj.ResolutionRules[i].Policy = obj.ResolveImages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: break

@mfojtik
Copy link
Contributor

mfojtik commented Aug 24, 2017

@smarterclayton this will require ansible update to set the resolutionRules ?

@liggitt
Copy link
Contributor

liggitt commented Aug 24, 2017

API looks fine, just nits. make sure there's a release note about the new required field

@smarterclayton
Copy link
Contributor Author

@mfojtik no we rely on defaulting today

Will fix the nits and repost in a short time.

Even when local names are not requested, the image policy plugin is
deciding to rewrite image references in replica sets that point to the
integrated registry (with tags) to use digests. This causes the
deployment controller that created them to get wedged (because it
detects a change to the template) and become unable to update status on
that replica set.

https://bugzilla.redhat.com/show_bug.cgi?id=1481801
@smarterclayton smarterclayton force-pushed the fix_imagepolicy_master branch from 962153e to 32f53e3 Compare August 24, 2017 17:55
@smarterclayton
Copy link
Contributor Author

denitted

@mfojtik
Copy link
Contributor

mfojtik commented Aug 24, 2017

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2017
@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfojtik, smarterclayton

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@tnozicka
Copy link
Contributor

/retest

@tnozicka
Copy link
Contributor

/test extended_conformance_install_update

@tnozicka
Copy link
Contributor

flake #15977
/test extended_conformance_install_update

@openshift-merge-robot
Copy link
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Copy link
Contributor

Automatic merge from submit-queue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. needs-api-review priority/P0 queue/critical-fix size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants