-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow template instance controller to create CRDs #19396
Conversation
@@ -47,7 +49,7 @@ objects: | |||
ports: | |||
- name: port | |||
port: 1234 | |||
- apiVersion: v1 | |||
- apiVersion: route.openshift.io/v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt I think this template covers all the variants you were concerned w/ now:
- openshift api object by group (route)
- openshift api object by legacy (service)
- k8s api object by standard api (configmap)
- k8s api object by group (CRD)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
openshift api object by legacy (service)
service is a kube object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah dunno what i was thinking. updated w/ a second route object that uses the legacy api.
rbac.NewRule("create").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(), | ||
rbac.NewRule("get").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(), | ||
// delete needed to be able to set ownerrefs on objects | ||
rbac.NewRule("delete").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton @liggitt @deads2k let the debate begin...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For blockOwnerDeletion? I thought it was update of */finalizers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error was something like you can't set an owner ref on a resource type you don't have the power to delete. Made sense to me since setting an owner ref indirectly allows you to delete the object later (assuming you can delete the thing the owner ref points to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense on an update, I didn't realize that was required to set an ownerRef on create. Is that what you're seeing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so but I didn't look at it super closely. (given that I didn't grant the controller update permissions, though, it seems like it would have to have been on create)
rbac.NewRule("create").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(), | ||
rbac.NewRule("get").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(), | ||
// delete needed to be able to set ownerrefs on objects | ||
rbac.NewRule("delete").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about #19404 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that seems like it would work for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for delete specifically. still need create+get though)
@@ -163,6 +163,11 @@ func init() { | |||
rbac.NewRule("create").Groups(kAuthzGroup).Resources("subjectaccessreviews").RuleOrDie(), | |||
rbac.NewRule("update").Groups(templateGroup).Resources("templateinstances/status").RuleOrDie(), | |||
rbac.NewRule("update").Groups(templateGroup).Resources("templateinstances/finalizers").RuleOrDie(), | |||
// template instance controller needs to be able to create and get any resource the user is allowed to create. | |||
rbac.NewRule("create").Groups(rbac.APIGroupAll).Resources(rbac.ResourceAll).RuleOrDie(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that's broad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this allows escalation flows like:
- user authorized to
create builds
can now create custom builds (user passescreate builds
SAR check, template service broker passescreate builds/custom
check during actual create) - user authorized to
create roles
orcreate rolebindings
can now create or bind a role containing any creation permission (user passescreate roles
SAR check, template service broker passes RBAC escalation check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like i said, let the debate begin :)
the alternative I can see is to switch to using impersonation instead of SAR checks, which also has the advantage of fixing some other potential issues @liggitt noted, where the user might have permission to create X, but isn't allowed to create X when field Y is Z (e.g. privileged pods). Today that's not an issue since the controller SA can't create that either, but if we ever have a case where the controller SA can pass a particular admission control that the user can't pass, the SAR check isn't sufficient anymore and impersonation would be a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is a good target for cluster role aggregation. It wouldn't eliminate cases like jordan indicated, but at least you wouldn't accidentally have the power to create a node or an oauthtoken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not familiar w/ cluster role aggregation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would mean granting those new permissions to everyone with "admin". That doesn't seem like a desirable model. (Nor, in general, does requiring the cluster-admin to go modify permissions to make Templates do what they are supposed to be able to do, strike me as very user friendly).
Also what is behind the dislike of impersonation? It seems like a better solution than what we've done already, nevermind the growing complexity/limitations as we are now seeing as we try to extend it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would mean granting those new permissions to everyone with "admin". That doesn't seem like a desirable model.
That is the only model that makes any sense from a normal user perspective. Consider if a broker creates a CRD and no user has access to it. The CRD is basically useless at that point. Without aggregation, there is also no way for the user to create a role / binding to the CRD. Thus they basically have to use the TSB as an escalation enabler.
(Nor, in general, does requiring the cluster-admin to go modify permissions to make Templates do what they are supposed to be able to do, strike me as very user friendly).
Seems like the exact UX I would want as a cluster admin. Normal users do not get to "install" new APIs into the cluster and say they "own" those new APIs. The cluster admin gets to opt-in to that. Afterwards, users are free to create instances of those CRDs just like an other resource they have access to.
Also what is behind the dislike of impersonation? It seems like a better solution than what we've done already, nevermind the growing complexity/limitations as we are now seeing as we try to extend it.
Because it turns the TSB into a component that is an indirect cluster admin (forever). In general we do not have a good way to state "do X as user Y but do not let me escalate into permissions I do not already have - i.e. make sure RBAC / admission / etc would let both of us do this thing".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the only model that makes any sense from a normal user perspective. Consider if a broker creates a CRD and no user has access to it. The CRD is basically useless at that point.
Not if i only want some subset of my users to have access to it.
Seems like the exact UX I would want as a cluster admin. Normal users do not get to "install" new APIs into the cluster and say they "own" those new APIs. The cluster admin gets to opt-in to that. Afterwards, users are free to create instances of those CRDs just like an other resource they have access to.
Again you're assuming I want all users in my cluster to be able to do this thing. If I don't want that, then I need to grant the TSB a different role (not admin) so i can grant access on a more fine grained level. This is a mess. The TSB should not need special permissions to create things, it should be able to create anything the user is allowed to create. Hence why impersonation is the right solution.
Because it turns the TSB into a component that is an indirect cluster admin (forever). In
So? As long as the TSB doesn't take any action that isn't impersonating the user requesting the action, who cares?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not if i only want some subset of my users to have access to it.
If you want a more nuanced flow, then I am fine with you having to do it manually. I agree with @deads2k's assessment that the TSB is a a user focused tool
and not a cluster-admin focused tool
.
So? As long as the TSB doesn't take any action that isn't impersonating the user requesting the action, who cares?
Right, as long as the TSB has no bugs and there no unforeseen paths to cause it misbehave (or gain access to its token), it is perfectly fine. I want less components with root
on the cluster, not more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want a more nuanced flow, then I am fine with you having to do it manually.
I understand where you're coming from, but here's how that statement comes across to me: "i'm fine with @bparees spending his day explaining to customers how to make this work because we wouldn't let him/couldn't give him a way to do it that would have just worked out of the box."
I agree with @deads2k's assessment that the TSB is a a user focused tool and not a cluster-admin focused tool.
It's precisely because it's user focused that a cluster-admin should not have to go manage the TSB to enable a user to do things they can already do with their user account. The TSB is just another api for creating resources.
It's pretty easy for me to envision a cluster administrator creating a class of users who can create a particular CRD instance(e.g. people allowed to instantiate a particular operator), and those users wanting to be able to create those CRD instances via a template. With your proposal the administrator would have to understand they need to go explicitly grant said permission to the TSB SA (rather than just granting it to the new class of user). And they're not going to realize that (even if we doc it) until it doesn't work. And then they're going to forget to do it when they stand up their second cluster. It's "just one more thing" on top of the way too many things we already have.
Right, as long as the TSB has no bugs and there no unforeseen paths to cause it misbehave (or gain access to its token), it is perfectly fine. I want less components with root on the cluster, not more.
I know, and ideally you'd like the cluster to be on a hard drive that has been ripped out of the PC and melted down for scrap and that would make all the potential security holes go away. But we also have to provide a good user experience.
Why has this become an objective? Allowing the creation of CRDs takes the templateservicebroker from a user focused tool to a cluster-admin focused tool. |
the ASB team seems to have a need to do this (see #19368 which started all of this) but even if we don't want to create CRDs, we do need to be able to create arbitrary resource types that might be introduced in the future. |
@@ -284,6 +297,10 @@ func (c *TemplateInstanceController) Run(workers int, stopCh <-chan struct{}) { | |||
go wait.Until(c.runWorker, time.Second, stopCh) | |||
} | |||
|
|||
go func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double goroutines is odd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cut+paste error, fixed.
@@ -10646,7 +10648,7 @@ objects: | |||
ports: | |||
- name: port | |||
port: 1234 | |||
- apiVersion: v1 | |||
- apiVersion: route.openshift.io/v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we still want a v1 openshift object that would need to be submitted to /oapi (service is kube)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and a groupified openshift object
@@ -59,6 +61,31 @@ objects: | |||
to: | |||
kind: Service | |||
name: service | |||
- apiVersion: apiextensions.k8s.io/v1beta1 | |||
kind: CustomResourceDefinition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
until the cluster-scoped->namespaced and cross-namespace ownerRef issues are settled, the TSB shouldn't set up objects that result in those invalid relationships
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fair, but i'd kinda rather put the test in place now (the test passes) while i'm in here.
arguably we don't want to merge this PR at all until we sort out the GC question of what happens if templateinstance creates a CRD since we're just setting people up for undefined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually wanted to put code in place to prevent those relationships (the TSB will happily set them up now, and they don't work, and I don't anticipate ownerRef API changes to support them in 3.10 timeframe)
@openshift/sig-security |
/hold |
Is there somewhere a write up of the reasons why TSB is getting into this business ? |
The TSB is already in this business. The TSB is an api for creating resources on your cluster. The fact that there are resources it cannot create is, imho, a bug. |
/retest |
1 similar comment
/retest |
@deads2k ok it passed w/ the extra wrapper deleted. final review please? |
/retest |
1 similar comment
/retest |
restmapper meta.RESTMapper | ||
config *rest.Config | ||
templateClient templateclient.Interface | ||
restmapper meta.RESTMapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you have two restmappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as below, we're using a different one for readiness checking. possibly it should all be switched to the dynamic mapper + dynamic(json?) client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as below, we're using a different one for readiness checking. possibly it should all be switched to the dynamic mapper + dynamic(json?) client.
You can keep the extra config, but onlyi one restmapper. Compiled restmappers are being removed in 1.11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restmapper removed. dynamic mapping for everyone.
templateClient templateclient.Interface | ||
restmapper meta.RESTMapper | ||
dynamicRestMapper *discovery.DeferredDiscoveryRESTMapper | ||
config *rest.Config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you have two configs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one is for json and one is not... we added the json one to handle unstructured content. the bulk mapper for checkReadiness is still using the non-json one:
cli, err := bulk.ClientMapperFromConfig(c.config).ClientForMapping(mapping)
is it safe to switch it to the json based config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
additional context on what we do w/ the client we retrieve:
obj, err := cli.Get().Resource(mapping.Resource).NamespaceIfScoped(object.Ref.Namespace, mapping.Scope.Name() == meta.RESTScopeNameNamespace).Name(object.Ref.Name).Do().Get()
extra mapper removed in new commit |
/retest |
1 similar comment
/retest |
/retest |
1 similar comment
/retest |
config *rest.Config | ||
templateClient templateclient.Interface | ||
dynamicRestMapper meta.RESTMapper | ||
config *rest.Config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this can be done with a codec against the dynamic client and can remove the kubectl dependencies this places in our controllers. Doesn't have to be done here. A TODO would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added todo
/lgtm might want to squash the commit named for me. |
New changes are detected. LGTM label has been removed. |
added todo and squashed, re-applying lgtm |
/hold cancel |
/retest |
@bparees: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Still need to resolve https://bugzilla.redhat.com/show_bug.cgi?id=1568607
merging this allows the creation of CRDs, but GC is still going to do weird things.
fixes #19368