
allow template instance controller to create CRDs #19396

Merged — 1 commit merged into openshift:master on May 4, 2018

Conversation

bparees (Contributor) commented Apr 17, 2018:

Still need to resolve https://bugzilla.redhat.com/show_bug.cgi?id=1568607

merging this allows the creation of CRDs, but GC is still going to do weird things.

fixes #19368

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2018
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 17, 2018
```diff
@@ -47,7 +49,7 @@ objects:
   ports:
   - name: port
     port: 1234
-- apiVersion: v1
+- apiVersion: route.openshift.io/v1
```
bparees (author):
@liggitt I think this template covers all the variants you were concerned w/ now:

  1. openshift api object by group (route)
  2. openshift api object by legacy (service)
  3. k8s api object by standard api (configmap)
  4. k8s api object by group (CRD)
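A template covering the four variants above might look roughly like this (a minimal sketch; the object names and the CRD group/kind are hypothetical, and only the identifying fields are shown):

```yaml
apiVersion: template.openshift.io/v1
kind: Template
metadata:
  name: variant-coverage              # hypothetical name
objects:
- apiVersion: route.openshift.io/v1   # 1. openshift object by group
  kind: Route
  metadata:
    name: route
- apiVersion: v1                      # 2. legacy (ungroupified) API
  kind: Service
  metadata:
    name: service
- apiVersion: v1                      # 3. k8s object by standard API
  kind: ConfigMap
  metadata:
    name: configmap
- apiVersion: apiextensions.k8s.io/v1beta1   # 4. k8s object by group (CRD)
  kind: CustomResourceDefinition
  metadata:
    name: widgets.example.com         # hypothetical CRD
```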

Contributor:

> openshift api object by legacy (service)

service is a kube object

bparees (author):

yeah dunno what i was thinking. updated w/ a second route object that uses the legacy api.

```go
rbac.NewRule("create").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(),
rbac.NewRule("get").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(),
// delete needed to be able to set ownerrefs on objects
rbac.NewRule("delete").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(),
```
bparees (author):

@smarterclayton @liggitt @deads2k let the debate begin...

Contributor:

For blockOwnerDeletion? I thought it was update of */finalizers

bparees (author):

The error was something like you can't set an owner ref on a resource type you don't have the power to delete. Made sense to me since setting an owner ref indirectly allows you to delete the object later (assuming you can delete the thing the owner ref points to)
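The behavior described matches the Kubernetes OwnerReferencesPermissionEnforcement admission plugin, which requires delete permission on an object in order to write its metadata.ownerReferences (and update on the owner's finalizers subresource to set blockOwnerDeletion). A minimal sketch of the kind of ownerRef the controller sets on created objects (all field values here are hypothetical placeholders):

```yaml
metadata:
  ownerReferences:
  - apiVersion: template.openshift.io/v1
    kind: TemplateInstance
    name: example-instance                       # hypothetical
    uid: 00000000-0000-0000-0000-000000000000    # placeholder
    blockOwnerDeletion: true
```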

Contributor:

That makes sense on an update, I didn't realize that was required to set an ownerRef on create. Is that what you're seeing?

bparees (author):

I believe so but I didn't look at it super closely. (given that I didn't grant the controller update permissions, though, it seems like it would have to have been on create)

```go
rbac.NewRule("create").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(),
rbac.NewRule("get").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(),
// delete needed to be able to set ownerrefs on objects
rbac.NewRule("delete").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(),
```
Contributor:

How about #19404 instead?

bparees (author):

sure, that seems like it would work for this.

bparees (author):

(for delete specifically. still need create+get though)

```diff
@@ -163,6 +163,11 @@ func init() {
 	rbac.NewRule("create").Groups(kAuthzGroup).Resources("subjectaccessreviews").RuleOrDie(),
 	rbac.NewRule("update").Groups(templateGroup).Resources("templateinstances/status").RuleOrDie(),
 	rbac.NewRule("update").Groups(templateGroup).Resources("templateinstances/finalizers").RuleOrDie(),
+	// template instance controller needs to be able to create and get any resource the user is allowed to create.
+	rbac.NewRule("create").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(),
```
Contributor:

Wow, that's broad.

liggitt (Contributor) commented Apr 18, 2018:

yeah, this allows escalation flows like:

  • user authorized to create builds can now create custom builds (user passes create builds SAR check, template service broker passes create builds/custom check during actual create)
  • user authorized to create roles or create rolebindings can now create or bind a role containing any creation permission (user passes create roles SAR check, template service broker passes RBAC escalation check)

bparees (author):

like i said, let the debate begin :)

the alternative I can see is to switch to using impersonation instead of SAR checks, which also has the advantage of fixing some other potential issues @liggitt noted, where the user might have permission to create X, but isn't allowed to create X when field Y is Z (e.g. privileged pods). Today that's not an issue since the controller SA can't create that either, but if we ever have a case where the controller SA can pass a particular admission control that the user can't pass, the SAR check isn't sufficient anymore and impersonation would be a better solution.

Contributor:

It seems like this is a good target for cluster role aggregation. It wouldn't eliminate cases like jordan indicated, but at least you wouldn't accidentally have the power to create a node or an oauthtoken.
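Cluster role aggregation lets the controller's ClusterRole be assembled automatically from other labeled ClusterRoles, so its permissions grow only as a cluster-admin opts in. A minimal sketch of what that could look like (the role name and label key are hypothetical, not what this PR uses):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: template-instance-controller              # hypothetical
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      template.openshift.io/aggregate-to-controller: "true"   # hypothetical label
rules: []   # populated automatically from ClusterRoles carrying the label
```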

bparees (author):

i'm not familiar w/ cluster role aggregation?

bparees (author):

That would mean granting those new permissions to everyone with "admin". That doesn't seem like a desirable model. (Nor, in general, does requiring the cluster-admin to go modify permissions to make Templates do what they are supposed to be able to do, strike me as very user friendly).

Also what is behind the dislike of impersonation? It seems like a better solution than what we've done already, nevermind the growing complexity/limitations as we are now seeing as we try to extend it.

Contributor:

> That would mean granting those new permissions to everyone with "admin". That doesn't seem like a desirable model.

That is the only model that makes any sense from a normal user perspective. Consider if a broker creates a CRD and no user has access to it. The CRD is basically useless at that point. Without aggregation, there is also no way for the user to create a role / binding to the CRD. Thus they basically have to use the TSB as an escalation enabler.

> (Nor, in general, does requiring the cluster-admin to go modify permissions to make Templates do what they are supposed to be able to do, strike me as very user friendly).

Seems like the exact UX I would want as a cluster admin. Normal users do not get to "install" new APIs into the cluster and say they "own" those new APIs. The cluster admin gets to opt in to that. Afterwards, users are free to create instances of those CRDs just like any other resource they have access to.

> Also what is behind the dislike of impersonation? It seems like a better solution than what we've done already, nevermind the growing complexity/limitations as we are now seeing as we try to extend it.

Because it turns the TSB into a component that is an indirect cluster admin (forever). In general we do not have a good way to state "do X as user Y but do not let me escalate into permissions I do not already have - i.e. make sure RBAC / admission / etc. would let both of us do this thing".

bparees (author):

> That is the only model that makes any sense from a normal user perspective. Consider if a broker creates a CRD and no user has access to it. The CRD is basically useless at that point.

Not if I only want some subset of my users to have access to it.

> Seems like the exact UX I would want as a cluster admin. Normal users do not get to "install" new APIs into the cluster and say they "own" those new APIs. The cluster admin gets to opt in to that. Afterwards, users are free to create instances of those CRDs just like any other resource they have access to.

Again you're assuming I want all users in my cluster to be able to do this thing. If I don't want that, then I need to grant the TSB a different role (not admin) so I can grant access on a more fine-grained level. This is a mess. The TSB should not need special permissions to create things; it should be able to create anything the user is allowed to create. Hence why impersonation is the right solution.

> Because it turns the TSB into a component that is an indirect cluster admin (forever).

So? As long as the TSB doesn't take any action that isn't impersonating the user requesting the action, who cares?

Contributor:

> Not if I only want some subset of my users to have access to it.

If you want a more nuanced flow, then I am fine with you having to do it manually. I agree with @deads2k's assessment that the TSB is a user-focused tool and not a cluster-admin-focused tool.

> So? As long as the TSB doesn't take any action that isn't impersonating the user requesting the action, who cares?

Right, as long as the TSB has no bugs and there are no unforeseen paths that cause it to misbehave (or expose its token), it is perfectly fine. I want fewer components with root on the cluster, not more.

bparees (author):

> If you want a more nuanced flow, then I am fine with you having to do it manually.

I understand where you're coming from, but here's how that statement comes across to me: "i'm fine with @bparees spending his day explaining to customers how to make this work because we wouldn't let him/couldn't give him a way to do it that would have just worked out of the box."

> I agree with @deads2k's assessment that the TSB is a user-focused tool and not a cluster-admin-focused tool.

It's precisely because it's user focused that a cluster-admin should not have to go manage the TSB to enable a user to do things they can already do with their user account. The TSB is just another api for creating resources.

It's pretty easy for me to envision a cluster administrator creating a class of users who can create a particular CRD instance (e.g. people allowed to instantiate a particular operator), and those users wanting to be able to create those CRD instances via a template. With your proposal the administrator would have to understand that they need to go explicitly grant said permission to the TSB SA (rather than just granting it to the new class of user). And they're not going to realize that (even if we doc it) until it doesn't work. And then they're going to forget to do it when they stand up their second cluster. It's "just one more thing" on top of the way too many things we already have.

> Right, as long as the TSB has no bugs and there are no unforeseen paths that cause it to misbehave (or expose its token), it is perfectly fine. I want fewer components with root on the cluster, not more.

I know, and ideally you'd like the cluster to be on a hard drive that has been ripped out of the PC and melted down for scrap, which would make all the potential security holes go away. But we also have to provide a good user experience.

deads2k (Contributor) commented Apr 18, 2018:

> merging this allows the creation of CRDs, but GC is still going to do weird things.

Why has this become an objective? Allowing the creation of CRDs takes the templateservicebroker from a user focused tool to a cluster-admin focused tool.

bparees (author) commented Apr 18, 2018:

> Why has this become an objective? Allowing the creation of CRDs takes the templateservicebroker from a user focused tool to a cluster-admin focused tool.

The ASB team seems to have a need to do this (see #19368, which started all of this), but even if we don't want to create CRDs, we do need to be able to create arbitrary resource types that might be introduced in the future.

```diff
@@ -284,6 +297,10 @@ func (c *TemplateInstanceController) Run(workers int, stopCh <-chan struct{}) {
 		go wait.Until(c.runWorker, time.Second, stopCh)
 	}

+	go func() {
```
Contributor:

double goroutines is odd

bparees (author):

cut+paste error, fixed.

```diff
@@ -10646,7 +10648,7 @@ objects:
   ports:
   - name: port
     port: 1234
-- apiVersion: v1
+- apiVersion: route.openshift.io/v1
```
Contributor:

we still want a v1 openshift object that would need to be submitted to /oapi (service is kube)

Contributor:

and a groupified openshift object

```diff
@@ -59,6 +61,31 @@ objects:
     to:
       kind: Service
       name: service
+- apiVersion: apiextensions.k8s.io/v1beta1
+  kind: CustomResourceDefinition
```
Contributor:

until the cluster-scoped->namespaced and cross-namespace ownerRef issues are settled, the TSB shouldn't set up objects that result in those invalid relationships

bparees (author):

that's fair, but i'd kinda rather put the test in place now (the test passes) while i'm in here.

arguably we don't want to merge this PR at all until we sort out the GC question of what happens if templateinstance creates a CRD since we're just setting people up for undefined behavior.

Contributor:

I actually wanted to put code in place to prevent those relationships (the TSB will happily set them up now, and they don't work, and I don't anticipate ownerRef API changes to support them in 3.10 timeframe)

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 18, 2018
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 18, 2018
enj (Contributor) commented Apr 18, 2018:

@openshift/sig-security

bparees (author) commented Apr 18, 2018:

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2018
simo5 (Contributor) commented Apr 19, 2018:

Is there somewhere a write-up of the reasons why the TSB is getting into this business? We should really figure that out before any more work in this direction is done in the TSB or elsewhere.

bparees (author) commented Apr 19, 2018:

> Is there somewhere a write-up of the reasons why the TSB is getting into this business?

The TSB is already in this business. The TSB is an api for creating resources on your cluster. The fact that there are resources it cannot create is, imho, a bug.

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 19, 2018
bparees (author) commented Apr 27, 2018:

/retest

1 similar comment
bparees (author) commented Apr 27, 2018:

/retest

bparees (author) commented Apr 27, 2018:

@deads2k ok it passed w/ the extra wrapper deleted. final review please?

bparees (author) commented Apr 28, 2018:

/retest

1 similar comment
bparees (author) commented Apr 28, 2018:

/retest

```go
restmapper     meta.RESTMapper
config         *rest.Config
templateClient templateclient.Interface
restmapper     meta.RESTMapper
```
Contributor:

why do you have two restmappers.

bparees (author):

same as below, we're using a different one for readiness checking. possibly it should all be switched to the dynamic mapper + dynamic(json?) client.

Contributor:

> same as below, we're using a different one for readiness checking. possibly it should all be switched to the dynamic mapper + dynamic(json?) client.

You can keep the extra config, but only one restmapper. Compiled restmappers are being removed in 1.11.

bparees (author):

restmapper removed. dynamic mapping for everyone.

```go
templateClient    templateclient.Interface
restmapper        meta.RESTMapper
dynamicRestMapper *discovery.DeferredDiscoveryRESTMapper
config            *rest.Config
```
Contributor:

why do you have two configs?

bparees (author):

one is for json and one is not... we added the json one to handle unstructured content. the bulk mapper for checkReadiness is still using the non-json one:

```go
cli, err := bulk.ClientMapperFromConfig(c.config).ClientForMapping(mapping)
```

is it safe to switch it to the json based config?

bparees (author):

additional context on what we do w/ the client we retrieve:

```go
obj, err := cli.Get().Resource(mapping.Resource).NamespaceIfScoped(object.Ref.Namespace, mapping.Scope.Name() == meta.RESTScopeNameNamespace).Name(object.Ref.Name).Do().Get()
```

bparees (author) commented Apr 30, 2018:

extra mapper removed in new commit

bparees (author) commented Apr 30, 2018:

/retest

1 similar comment
bparees (author) commented May 1, 2018:

/retest

bparees (author) commented May 2, 2018:

/retest

1 similar comment
bparees (author) commented May 2, 2018:

/retest

```go
config            *rest.Config
templateClient    templateclient.Interface
dynamicRestMapper meta.RESTMapper
config            *rest.Config
```
Contributor:

Technically this can be done with a codec against the dynamic client and can remove the kubectl dependencies this places in our controllers. Doesn't have to be done here. A TODO would be good.

bparees (author):

added todo

deads2k (Contributor) commented May 3, 2018:

/lgtm

might want to squash the commit named for me.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 3, 2018
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label May 3, 2018
openshift-ci-robot:

New changes are detected. LGTM label has been removed.

bparees (author) commented May 3, 2018:

added todo and squashed, re-applying lgtm

@bparees bparees added the lgtm Indicates that a PR is ready to be merged. label May 3, 2018
bparees (author) commented May 3, 2018:

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 3, 2018
bparees (author) commented May 3, 2018:

/retest

openshift-ci-robot commented May 4, 2018:

@bparees: The following test failed, say /retest to rerun them all:

Failed test: ci/openshift-jenkins/unit (commit fc9893b) — rerun with /test unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit d859add into openshift:master May 4, 2018
@bparees bparees deleted the template_crds branch May 19, 2018 20:26
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. sig/security size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

Template Service Broker fails to provision CRDs
8 participants