Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dockercfg generation wait for the token secret using shared informer #11394

Merged
merged 1 commit into from
Oct 25, 2016

Conversation

mfojtik
Copy link
Contributor

@mfojtik mfojtik commented Oct 17, 2016

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 17, 2016

[test]

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 17, 2016

@liggitt @deads2k PTAL

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 18, 2016

[test]

flake: #11381

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 18, 2016

@Kargakis ptal as well

@mfojtik mfojtik changed the title WIP: serviceaccount: add secret informer to create_dockercfg_secret serviceaccount: add secret informer to create_dockercfg_secret Oct 18, 2016
@mfojtik mfojtik changed the title serviceaccount: add secret informer to create_dockercfg_secret Make dockercfg generation wait for the token secret using shared informer Oct 18, 2016
@@ -85,6 +90,8 @@ func NewDockercfgController(cl client.Interface, options DockercfgControllerOpti
e.serviceAccountCache = NewEtcdMutationCache(serviceAccountCache)
e.syncHandler = e.syncServiceAccount

e.addSecretInformer(options)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this deserves its own method. Inline the informer here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if others like it, we should probably do the same for the sa informer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can move it up, was trying to keep the new function small

func (e *DockercfgController) addSecretInformer(options DockercfgControllerOptions) {
saTokenSelector := fields.OneTermEqualSelector(api.SecretTypeField, string(api.SecretTypeServiceAccountToken))

_, e.secretController = framework.NewInformer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be Indexed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?


queue workqueue.RateLimitingInterface

// syncHandler does the work. It's factored out for unit testing
syncHandler func(serviceKey string) error
}

func (e *DockercfgController) secretTokenAvailable(_, newObj interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

godoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@0xmichalis
Copy link
Contributor

Needs a unit test


queue workqueue.RateLimitingInterface

// syncHandler does the work. It's factored out for unit testing
syncHandler func(serviceKey string) error
}

func (e *DockercfgController) secretTokenAvailable(_, newObj interface{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen on an Add instead of on a Update if your controller restarts after having been off. See kubernetes/kubernetes#32778

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddFunc added

if _, isPending := secret.Annotations[PendingDockercfgNameKey]; !isPending {
return
}
if data, exists := secret.Data[api.ServiceAccountTokenKey]; exists && len(data) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this logic should run synchronously.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I think I see what you mean... when I create the dockerPullSecret and now observing the "addFunc", we should detect that and call addSecretToServiceAccount.

@@ -312,50 +392,18 @@ func (e *DockercfgController) createTokenSecret(serviceAccount *api.ServiceAccou
Annotations: map[string]string{
api.ServiceAccountNameKey: serviceAccount.Name,
api.ServiceAccountUIDKey: string(serviceAccount.UID),
PendingDockercfgNameKey: "true",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add a pending annotation we have to remove? why not just annotate this to indicate it was created by / owned by this controller, and we can leave the annotation in place? we can tell if a token is pending by the lack of a token in the data map

Copy link
Contributor

@deads2k deads2k Oct 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add a pending annotation we have to remove? why not just annotate this to indicate it was created by / owned by this controller, and we can leave the annotation in place? we can tell if a token is pending by the lack of a token in the data map

Ah controllerRef, how difficult you are. It's owned by this controller, not the controller that fills in the secret? It's owned by both? It's an ownerRef pointing to an object that doesn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt agree, I don't think we need any annotation as we can just check if the state is pending by looking into secret data map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k @liggitt I rewrote that add/update logic so the annotation is no longer needed and is is much cleaner now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I was wrong :-) we need that annotation to make sure the pull secret is not created everytime the "handleSecrets" function run, so we don't end up with:

@dev] .../openshift/origin # oc get secrets -n test
NAME                       TYPE                                  DATA      AGE
builder-dockercfg-0sk6b    kubernetes.io/dockercfg               1         1m
builder-dockercfg-8uao3    kubernetes.io/dockercfg               1         1m
builder-dockercfg-c35kr    kubernetes.io/dockercfg               1         1m
builder-dockercfg-n1vzw    kubernetes.io/dockercfg               1         1m
builder-dockercfg-oiv6v    kubernetes.io/dockercfg               1         1m
builder-dockercfg-qpyi1    kubernetes.io/dockercfg               1         1m
builder-dockercfg-tefb1    kubernetes.io/dockercfg               1         1m

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I found that this is because I also generate a lot of tokens (which I think is because changes to syncServiceAccount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvmd. pebkac... I had wrong field selector in the shared informer...

@mfojtik mfojtik force-pushed the dockercfg-controller-2 branch 4 times, most recently from b9e07fa to 1bd2e78 Compare October 19, 2016 14:14
@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 19, 2016

@deads2k @liggitt updated and ready for next review :-)

mountableDockercfgSecrets, imageDockercfgPullSecrets := getGeneratedDockercfgSecretNames(serviceAccount)

// look for an ImagePullSecret in the form
if len(imageDockercfgPullSecrets) > 0 && len(mountableDockercfgSecrets) > 0 {
if len(imageDockercfgPullSecrets) > 0 || len(mountableDockercfgSecrets) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this... needsDockercfgSecret means we have to do work on the service account, even if that work is just copying an existing dockercfg from one list to the other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt i think then I will need new method to ensure the dockercfg was created already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@mfojtik mfojtik force-pushed the dockercfg-controller-2 branch from 1bd2e78 to 00e07a6 Compare October 19, 2016 15:50
@liggitt
Copy link
Contributor

liggitt commented Oct 19, 2016

this is turning into a rewrite of the controller, which is not what we want

  1. track the tokens we've created that are waiting to be populated in a local variable
  2. watch service account token secrets

when we decide we need a token for a service account:

  1. if we are already waiting for a token and it is not yet populated, return early
  2. if we are already waiting for a token and it is populated, continue to build and associate the dockercfg
  3. if we are not already waiting for a token, create one and add it to the list we're waiting for

when we see an update to a token in the list we're waiting for, and it is populated, sync the service account again (will trigger case 2 above)

when we see a token get deleted, if it is in the list we're waiting for, remove it from the list and sync the service account again (will trigger case 3 above)

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 19, 2016

@liggitt i think this PR is doing what you're suggesting, I don't know if it is possible without rewriting the controller.

return err
}

dockercfgSecret, err := e.createDockerPullSecret(serviceAccount)
Copy link
Contributor

@liggitt liggitt Oct 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the only change to syncServiceAccount to be to short-circuit if createDockerPullSecret says it didn't create the secret:

dockercfgSecret, created, err := e.createDockerPullSecret(serviceAccount)
if err != nil {
  return err
}
if !created  {
  return nil
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the later population of the token trigger the service account resync?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly, that's why you need the list of token secrets you're waiting on. the secret watcher will resync the relevant service account if one of those secrets gets populated or deleted

}

// createDockerPullSecret creates a dockercfg secret based on the token secret
func (e *DockercfgController) createDockerPullSecret(serviceAccount *api.ServiceAccount) (*api.Secret, error) {
func (e *DockercfgController) createDockerPullSecret(tokenSecret *api.Secret, serviceAccount *api.ServiceAccount) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the only changes to createDockerPullSecret would be to short-circuit if createTokenSecret says it didn't create a populated secret:

tokenSecret, isPopulated, err := e.createTokenSecret(serviceAccount)
if err != nil {
  return nil, false, err
}
if !isPopulated {
  return nil, false, nil
}

and to only return true for created if it makes it to the end of the function

@liggitt
Copy link
Contributor

liggitt commented Oct 19, 2016

all the logic changes should be able to be contained to the secret watcher (which should only be watching service account token secrets, and needs to handle secret deletion), and the createTokenSecret function

@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 19, 2016

@liggitt the token secret deletion is handled already in separate controller, do we want to get rid of that controller?

@liggitt
Copy link
Contributor

liggitt commented Oct 19, 2016

@liggitt the token secret deletion is handled already in separate controller, do we want to get rid of that controller?

no, this observer's job is not to go clean up the dockercfg, it is to react to population and deletion of pending-population tokens

@mfojtik mfojtik force-pushed the dockercfg-controller-2 branch 2 times, most recently from ffa95f6 to 906c616 Compare October 20, 2016 12:26
}
serviceAccountName, _ := secret.Annotations[api.ServiceAccountNameKey]
serviceAccountKey := makeServiceAccountKey(secret.Namespace, serviceAccountName)
obj, exists, err := e.serviceAccountCache.GetByKey(serviceAccountKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than looking up the service account in the cache, just add to the queue directly?

e.queue.Add(serviceAccountKey)

return
}
}
if createdBy := secret.Annotations[api.CreatedByAnnotation]; createdBy != CreateDockercfgSecretsController {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if secret.Annotations[api.CreatedByAnnotation] != CreateDockercfgSecretsController

serviceAccount := obj.(*api.ServiceAccount)
// The secret is now available in the cache so remove it from pendingTokens
e.pendingTokens.Delete(serviceAccount)
if len(secret.Data[api.ServiceAccountTokenKey]) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we only re-queue if we are observing the update where the old secret is missing the token and the new secret has it?

if len(secret.Data[api.ServiceAccountTokenKey]) > 0 {
return
}
serviceAccountName, _ := secret.Annotations[api.ServiceAccountNameKey]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enqueue the key directly

}
for _, obj := range secrets {
secret := obj.(*api.Secret)
if createdBy := secret.Annotations[api.CreatedByAnnotation]; createdBy == CreateDockercfgSecretsController {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if secret.Annotations[api.CreatedByAnnotation] == CreateDockercfgSecretsController

func (e *DockercfgController) createTokenSecret(serviceAccount *api.ServiceAccount) (*api.Secret, error) {
tokenSecret := &api.Secret{
func (e *DockercfgController) createTokenSecret(serviceAccount *api.ServiceAccount) (*api.Secret, bool, error) {
tokenSecret, isPending, err := e.getTokenSecretFromCache(serviceAccount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure looking up the token from cache here is a race:

  1. service account exists, token exists, dockercfg exists, serviceaccount has reference, caches are populated
  2. token gets deleted
  3. deleted_token_secrets.go deletes the dockercfg in response
  4. deleted_dockercfg_secrets.go removes the reference from the service account in response
  5. service account update triggers this controller to resync the service account. this controller's token cache could still show the populated token that was deleted in step 2, and it would re-create a dockercfg secret now containing a defunct token and add it to the serviceaccount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt should we always do a live query then to get the live token?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

live lookups don't do what you'd expect... I think we can do a minimal change in the state machine direction without a complete rewrite

@mfojtik mfojtik force-pushed the dockercfg-controller-2 branch 2 times, most recently from b237014 to 5a7903f Compare October 24, 2016 13:38
return err
}
if !created {
glog.V(2).Infof("The dockercfg secret was not created for service account %s/%s, will retry", serviceAccount.Namespace, serviceAccount.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lower verbosity to 5... this is going to happen every time now

if err != nil {
glog.V(2).Infof("Error during adding the dockercfg secret to %s/%s: %v", serviceAccount.Namespace, serviceAccount.Name, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to log here, error gets logged at top level on error returns

_, err := e.client.Secrets(tokenSecret.Namespace).Create(tokenSecret)
if err != nil {
return nil, err
glog.V(2).Infof("Creating token secret %q for service account %s/%s", tokenSecret.Name, serviceAccount.Namespace, serviceAccount.Name)
Copy link
Contributor

@liggitt liggitt Oct 24, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V(4)

return nil, false, err
}
if !isPopulated {
glog.V(4).Infof("Token secret for service account %s/%s is not populated yet", serviceAccount.Namespace, serviceAccount.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

V(5)

@mfojtik mfojtik force-pushed the dockercfg-controller-2 branch from 5a7903f to 87e625e Compare October 24, 2016 14:10
@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 24, 2016

@liggitt log levels adjusted, manually tested, lets give it one more [test] run

@liggitt
Copy link
Contributor

liggitt commented Oct 24, 2016

LGTM but I'd like a second-check from @deads2k

// attached to all token secrets this controller create
CreateDockercfgSecretsController = "openshift.io/create-dockercfg-secrets"

// PendingTokenAnnotation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

e.serviceAccountCache = NewEtcdMutationCache(serviceAccountCache)

tokenSecretSelector := fields.OneTermEqualSelector(api.SecretTypeField, string(api.SecretTypeServiceAccountToken))
e.secretCache, e.secretController = framework.NewInformer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need this to be indexed now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not anymore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Secrets for a service account are always in the same namespace. So it seems that you can index by namespace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, we're looking up by key only (implicit index)

@mfojtik mfojtik force-pushed the dockercfg-controller-2 branch from 87e625e to 26e69cf Compare October 24, 2016 16:31
}
dockercfgSecret.Data[api.DockerConfigKey] = dockercfgContent

// Save the secret
createdSecret, err := e.client.Secrets(tokenSecret.Namespace).Create(dockercfgSecret)
if err != nil {
// Clean up the generated token secret if we're not going to use it
glog.V(2).Infof("deleting unused token secret %s/%s, error creating dockercfgSecret: %v", tokenSecret.Namespace, tokenSecret.Name, err)
glog.V(2).Infof("Deleting unused token secret %s/%s, error creating dockercfgSecret: %v", tokenSecret.Namespace, tokenSecret.Name, err)
if deleteErr := e.client.Secrets(tokenSecret.Namespace).Delete(tokenSecret.Name); (deleteErr != nil) && !kapierrors.IsNotFound(deleteErr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what conditions is this still necessary? Since we have the name of the token secret we created, we ought to be able to use it to back the docker pull secret instead of creating another one right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will remove

@deads2k
Copy link
Contributor

deads2k commented Oct 24, 2016

One questions/comment. lgtm otherwise.

@mfojtik mfojtik force-pushed the dockercfg-controller-2 branch from 26e69cf to 6744424 Compare October 25, 2016 07:59
@mfojtik
Copy link
Contributor Author

mfojtik commented Oct 25, 2016

@deads2k @liggitt removed the deletion of secret token [merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 6744424

@openshift-bot
Copy link
Contributor

openshift-bot commented Oct 25, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10599/) (Image: devenv-rhel7_5239)

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 6744424

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10599/) (Base Commit: 66847bb)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

at scale, dockercfg controller can miss token population window
5 participants