-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make dockercfg generation wait for the token secret using shared informer #11394
Make dockercfg generation wait for the token secret using shared informer #11394
Conversation
[test] |
[test] flake: #11381 |
@Kargakis ptal as well |
@@ -85,6 +90,8 @@ func NewDockercfgController(cl client.Interface, options DockercfgControllerOpti | |||
e.serviceAccountCache = NewEtcdMutationCache(serviceAccountCache) | |||
e.syncHandler = e.syncServiceAccount | |||
|
|||
e.addSecretInformer(options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this deserves its own method. Inline the informer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if others like it, we should probably do the same for the sa informer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can move it up, was trying to keep the new function small
func (e *DockercfgController) addSecretInformer(options DockercfgControllerOptions) { | ||
saTokenSelector := fields.OneTermEqualSelector(api.SecretTypeField, string(api.SecretTypeServiceAccountToken)) | ||
|
||
_, e.secretController = framework.NewInformer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be Indexed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
|
||
queue workqueue.RateLimitingInterface | ||
|
||
// syncHandler does the work. It's factored out for unit testing | ||
syncHandler func(serviceKey string) error | ||
} | ||
|
||
func (e *DockercfgController) secretTokenAvailable(_, newObj interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
godoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
Needs a unit test |
|
||
queue workqueue.RateLimitingInterface | ||
|
||
// syncHandler does the work. It's factored out for unit testing | ||
syncHandler func(serviceKey string) error | ||
} | ||
|
||
func (e *DockercfgController) secretTokenAvailable(_, newObj interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can happen on an Add
instead of on a Update
if your controller restarts after having been off. See kubernetes/kubernetes#32778
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddFunc added
if _, isPending := secret.Annotations[PendingDockercfgNameKey]; !isPending { | ||
return | ||
} | ||
if data, exists := secret.Data[api.ServiceAccountTokenKey]; exists && len(data) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this logic should run synchronously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I think I see what you mean... when I create the dockerPullSecret and now observing the "addFunc", we should detect that and call addSecretToServiceAccount.
@@ -312,50 +392,18 @@ func (e *DockercfgController) createTokenSecret(serviceAccount *api.ServiceAccou | |||
Annotations: map[string]string{ | |||
api.ServiceAccountNameKey: serviceAccount.Name, | |||
api.ServiceAccountUIDKey: string(serviceAccount.UID), | |||
PendingDockercfgNameKey: "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add a pending annotation we have to remove? why not just annotate this to indicate it was created by / owned by this controller, and we can leave the annotation in place? we can tell if a token is pending by the lack of a token in the data map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add a pending annotation we have to remove? why not just annotate this to indicate it was created by / owned by this controller, and we can leave the annotation in place? we can tell if a token is pending by the lack of a token in the data map
Ah controllerRef, how difficult you are. It's owned by this controller, not the controller that fills in the secret? It's owned by both? It's an ownerRef pointing to an object that doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt agree, I don't think we need any annotation as we can just check if the state is pending by looking into secret data map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I was wrong :-) we need that annotation to make sure the pull secret is not created everytime the "handleSecrets" function run, so we don't end up with:
@dev] .../openshift/origin # oc get secrets -n test
NAME TYPE DATA AGE
builder-dockercfg-0sk6b kubernetes.io/dockercfg 1 1m
builder-dockercfg-8uao3 kubernetes.io/dockercfg 1 1m
builder-dockercfg-c35kr kubernetes.io/dockercfg 1 1m
builder-dockercfg-n1vzw kubernetes.io/dockercfg 1 1m
builder-dockercfg-oiv6v kubernetes.io/dockercfg 1 1m
builder-dockercfg-qpyi1 kubernetes.io/dockercfg 1 1m
builder-dockercfg-tefb1 kubernetes.io/dockercfg 1 1m
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I found that this is because I also generate a lot of tokens (which I think is because changes to syncServiceAccount
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvmd. pebkac... I had wrong field selector in the shared informer...
b9e07fa
to
1bd2e78
Compare
mountableDockercfgSecrets, imageDockercfgPullSecrets := getGeneratedDockercfgSecretNames(serviceAccount) | ||
|
||
// look for an ImagePullSecret in the form | ||
if len(imageDockercfgPullSecrets) > 0 && len(mountableDockercfgSecrets) > 0 { | ||
if len(imageDockercfgPullSecrets) > 0 || len(mountableDockercfgSecrets) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this... needsDockercfgSecret
means we have to do work on the service account, even if that work is just copying an existing dockercfg from one list to the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt i think then I will need new method to ensure the dockercfg was created already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
1bd2e78
to
00e07a6
Compare
this is turning into a rewrite of the controller, which is not what we want
when we decide we need a token for a service account:
when we see an update to a token in the list we're waiting for, and it is populated, sync the service account again (will trigger case 2 above) when we see a token get deleted, if it is in the list we're waiting for, remove it from the list and sync the service account again (will trigger case 3 above) |
@liggitt i think this PR is doing what you're suggesting, I don't know if it is possible without rewriting the controller. |
return err | ||
} | ||
|
||
dockercfgSecret, err := e.createDockerPullSecret(serviceAccount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the only change to syncServiceAccount
to be to short-circuit if createDockerPullSecret
says it didn't create the secret:
dockercfgSecret, created, err := e.createDockerPullSecret(serviceAccount)
if err != nil {
return err
}
if !created {
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will the later population of the token trigger the service account resync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly, that's why you need the list of token secrets you're waiting on. the secret watcher will resync the relevant service account if one of those secrets gets populated or deleted
} | ||
|
||
// createDockerPullSecret creates a dockercfg secret based on the token secret | ||
func (e *DockercfgController) createDockerPullSecret(serviceAccount *api.ServiceAccount) (*api.Secret, error) { | ||
func (e *DockercfgController) createDockerPullSecret(tokenSecret *api.Secret, serviceAccount *api.ServiceAccount) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the only changes to createDockerPullSecret
would be to short-circuit if createTokenSecret
says it didn't create a populated secret:
tokenSecret, isPopulated, err := e.createTokenSecret(serviceAccount)
if err != nil {
return nil, false, err
}
if !isPopulated {
return nil, false, nil
}
and to only return true
for created if it makes it to the end of the function
all the logic changes should be able to be contained to the secret watcher (which should only be watching service account token secrets, and needs to handle secret deletion), and the |
@liggitt the token secret deletion is handled already in separate controller, do we want to get rid of that controller? |
no, this observer's job is not to go clean up the dockercfg, it is to react to population and deletion of pending-population tokens |
ffa95f6
to
906c616
Compare
} | ||
serviceAccountName, _ := secret.Annotations[api.ServiceAccountNameKey] | ||
serviceAccountKey := makeServiceAccountKey(secret.Namespace, serviceAccountName) | ||
obj, exists, err := e.serviceAccountCache.GetByKey(serviceAccountKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than looking up the service account in the cache, just add to the queue directly?
e.queue.Add(serviceAccountKey)
return | ||
} | ||
} | ||
if createdBy := secret.Annotations[api.CreatedByAnnotation]; createdBy != CreateDockercfgSecretsController { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if secret.Annotations[api.CreatedByAnnotation] != CreateDockercfgSecretsController
serviceAccount := obj.(*api.ServiceAccount) | ||
// The secret is now available in the cache so remove it from pendingTokens | ||
e.pendingTokens.Delete(serviceAccount) | ||
if len(secret.Data[api.ServiceAccountTokenKey]) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we only re-queue if we are observing the update where the old secret is missing the token and the new secret has it?
if len(secret.Data[api.ServiceAccountTokenKey]) > 0 { | ||
return | ||
} | ||
serviceAccountName, _ := secret.Annotations[api.ServiceAccountNameKey] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enqueue the key directly
} | ||
for _, obj := range secrets { | ||
secret := obj.(*api.Secret) | ||
if createdBy := secret.Annotations[api.CreatedByAnnotation]; createdBy == CreateDockercfgSecretsController { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if secret.Annotations[api.CreatedByAnnotation] == CreateDockercfgSecretsController
func (e *DockercfgController) createTokenSecret(serviceAccount *api.ServiceAccount) (*api.Secret, error) { | ||
tokenSecret := &api.Secret{ | ||
func (e *DockercfgController) createTokenSecret(serviceAccount *api.ServiceAccount) (*api.Secret, bool, error) { | ||
tokenSecret, isPending, err := e.getTokenSecretFromCache(serviceAccount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty sure looking up the token from cache here is a race:
- service account exists, token exists, dockercfg exists, serviceaccount has reference, caches are populated
- token gets deleted
- deleted_token_secrets.go deletes the dockercfg in response
- deleted_dockercfg_secrets.go removes the reference from the service account in response
- service account update triggers this controller to resync the service account. this controller's token cache could still show the populated token that was deleted in step 2, and it would re-create a dockercfg secret now containing a defunct token and add it to the serviceaccount
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liggitt should we always do a live query then to get the live token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
live lookups don't do what you'd expect... I think we can do a minimal change in the state machine direction without a complete rewrite
b237014
to
5a7903f
Compare
return err | ||
} | ||
if !created { | ||
glog.V(2).Infof("The dockercfg secret was not created for service account %s/%s, will retry", serviceAccount.Namespace, serviceAccount.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lower verbosity to 5... this is going to happen every time now
if err != nil { | ||
glog.V(2).Infof("Error during adding the dockercfg secret to %s/%s: %v", serviceAccount.Namespace, serviceAccount.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to log here, error gets logged at top level on error returns
_, err := e.client.Secrets(tokenSecret.Namespace).Create(tokenSecret) | ||
if err != nil { | ||
return nil, err | ||
glog.V(2).Infof("Creating token secret %q for service account %s/%s", tokenSecret.Name, serviceAccount.Namespace, serviceAccount.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V(4)
return nil, false, err | ||
} | ||
if !isPopulated { | ||
glog.V(4).Infof("Token secret for service account %s/%s is not populated yet", serviceAccount.Namespace, serviceAccount.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V(5)
5a7903f
to
87e625e
Compare
@liggitt log levels adjusted, manually tested, lets give it one more [test] run |
LGTM but I'd like a second-check from @deads2k |
// attached to all token secrets this controller create | ||
CreateDockercfgSecretsController = "openshift.io/create-dockercfg-secrets" | ||
|
||
// PendingTokenAnnotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
e.serviceAccountCache = NewEtcdMutationCache(serviceAccountCache) | ||
|
||
tokenSecretSelector := fields.OneTermEqualSelector(api.SecretTypeField, string(api.SecretTypeServiceAccountToken)) | ||
e.secretCache, e.secretController = framework.NewInformer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need this to be indexed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Secrets for a service account are always in the same namespace. So it seems that you can index by namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need, we're looking up by key only (implicit index)
87e625e
to
26e69cf
Compare
} | ||
dockercfgSecret.Data[api.DockerConfigKey] = dockercfgContent | ||
|
||
// Save the secret | ||
createdSecret, err := e.client.Secrets(tokenSecret.Namespace).Create(dockercfgSecret) | ||
if err != nil { | ||
// Clean up the generated token secret if we're not going to use it | ||
glog.V(2).Infof("deleting unused token secret %s/%s, error creating dockercfgSecret: %v", tokenSecret.Namespace, tokenSecret.Name, err) | ||
glog.V(2).Infof("Deleting unused token secret %s/%s, error creating dockercfgSecret: %v", tokenSecret.Namespace, tokenSecret.Name, err) | ||
if deleteErr := e.client.Secrets(tokenSecret.Namespace).Delete(tokenSecret.Name); (deleteErr != nil) && !kapierrors.IsNotFound(deleteErr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what conditions is this still necessary? Since we have the name of the token secret we created, we ought to be able to use it to back the docker pull secret instead of creating another one right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I will remove
One questions/comment. lgtm otherwise. |
26e69cf
to
6744424
Compare
Evaluated for origin test up to 6744424 |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10599/) (Image: devenv-rhel7_5239) |
Evaluated for origin merge up to 6744424 |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10599/) (Base Commit: 66847bb) |
Fixes: #11333
Should fix: https://bugzilla.redhat.com/show_bug.cgi?id=1374091