Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #15213 #16219 #16340 #17080 #18221 on release-3.6 #18223

Conversation

enj
Copy link
Contributor

@enj enj commented Jan 23, 2018

Cherry pick of #15213 #16219 #16340 #17080 #18221 on release-3.6.

#15213: Not found errors must match object in migration
#16219: Use discovery based version gating for policy commands
#16340: Decrement retries count during migration
#17080: Correctly handle NotFound errors during migration
#18221: Fix issues with oc adm migrate authorization

simo5 and others added 2 commits January 22, 2018 22:33
The following commands were version gated:

- oc create policybinding
- oc adm overwrite-policy
- oc adm migrate authorization

This is because in 3.7 we will store only k8s.io RBAC objects and
not the Origin Policy Objects.  All of the gated commands assume the
presence of policy objects, and thus it does not make sense to let
users run these against newer clusters.

Signed-off-by: Simo Sorce <[email protected]>
Signed-off-by: Monis Khan <[email protected]>
This change tightens the code to not tolerate not found errors that
refer to a different object than what was being migrated.  For
example, the previous code would not error if the not found error
refereed to the namespace being missing instead of the actual
object.  Thus we can detect objects that have been orphaned inside
of a deleted namespace.

Signed-off-by: Monis Khan <[email protected]>
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 23, 2018
@enj
Copy link
Contributor Author

enj commented Jan 23, 2018

/assign @simo5 @mrogers950

Well, this was painful.

I know the shell based tests in d970765 need to be fixed since they should not fail with a 3.6 server. Otherwise this is ready for a review.

@simo5
Copy link
Contributor

simo5 commented Jan 23, 2018

Are there any major changes I should look into compared to the master version ?

@enj
Copy link
Contributor Author

enj commented Jan 23, 2018

@simo5 @mrogers950 the main difference is that these commands were not deprecated in 3.6, so they can still be run without issue. Just take a look at the diff as a whole and confirm that it does not violate expectations for 3.6. Overall the patches applied pretty cleanly once I figured out everything that was needed.

@simo5
Copy link
Contributor

simo5 commented Jan 23, 2018

I do not see any blatant violation, I had to think on the gating but it seem reasonable to backport.

Matt Rogers and others added 5 commits January 23, 2018 13:20
This change fixes a bug in the migration command that could lead to
it being permanently stuck in a retry loop.  This would only happen
if some other entity continuously changed a resource that was in the
process of being migrated.  The command would then attempt to retry
due to the conflict error.  Since the retry count was never
decremented, the migration would essentially be stuck in an infinite
loop.  To avoid having to mutate global state, retries was removed
as a field of migrateTracker.  It is instead directly passed to the
try function and thus is reset on every invocation of result.Visit.

Signed-off-by: Monis Khan <[email protected]>
This change makes it so that the migrate storage command will now
reprocess the body of a failed GET request to follow the standard
path of extracting a status error.  This normalizes the structure of
the errors the code must handle.

When comparing the resource.Info to an error, the info's REST
mapping is now used to extract group kind information since the
associated runtime.Object is not guaranteed to have valid type meta
data.  This is combined with relaxed case-insensitive matching to
make sure that we only fail the migrate storage command on NotFound
errors that we know do not match the info.

When a NotFound error occurs, the migrate command now correctly
reports that it did not change the resource.  Previously it reported
that it had successfully migrated the resource, which is not true
since it is impossible to migrate a resource that does not exist.

Bug 1506006

Signed-off-by: Monis Khan <[email protected]>
This change adds glog style timestamps to the reporting output of
the migration tracker.  This will aid in debugging migration errors
by making it easier to correlate client errors with master logs.

Signed-off-by: Monis Khan <[email protected]>
This change handles a number of bugs with the migrate authorization
command:

1. Rework MigrateAuthorizationOptions.checkParity to be a
MigrateActionFunc instead of a MigrateVisitFunc.  This allows us to
take advantage of migrate's default retry handling.
2. Add proper retry logic to checkParity, and have all check* funcs
return the appropriate TemporaryError based on the situation.  This
coupled with migrate's existing retry logic makes the command
resilient against common errors such as the deletion of resources.
3. Remove the binding of the standard migrate flags from migrate
authorization.  This command supports no parameters, and exposing
the standard migrate parameters allows the user to accidentally
break how the command runs.
4. Fix GroupVersion constants used for discovery based gating.  They
were incorrectly set to the internal version instead of v1.  This
would cause the policy based gating to always think that the server
did not support policy objects.
5. Force RBAC client to use v1beta1 since that is the only version
supported by a 3.6 server.  This allows you to use a 3.9 client
against a 3.6 server.
6. Remove rate limiting from the RBAC client to fix BZ 1513139.
Only a cluster admin can interact with RBAC resources on a 3.6
server, so this will quickly error out if run by a non-privileged
user.

Bug 1513139

Signed-off-by: Monis Khan <[email protected]>
@enj enj force-pushed the automated-cherry-pick-of-#15213-#16219-#16340-#17080-#18221-upstream-release-3.6 branch from 0931c72 to fbcc41e Compare January 23, 2018 18:23
@enj
Copy link
Contributor Author

enj commented Jan 23, 2018

/retest

@simo5
Copy link
Contributor

simo5 commented Jan 23, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2018
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enj, simo5

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@enj
Copy link
Contributor Author

enj commented Jan 24, 2018

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link

@enj: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/extended_conformance_install fbcc41e link /test extended_conformance_install
ci/openshift-jenkins/gcp fbcc41e link /test gcp

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@enj
Copy link
Contributor Author

enj commented Jan 25, 2018

Closing since 3.6 CI is broken.

@enj enj closed this Jan 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants