
deploy: tweak enqueueing in the trigger controller #11501

Merged: 1 commit, merged into openshift:master on Oct 25, 2016
Conversation

@0xmichalis (Contributor):
// but I think this is fine.
shouldInstantiate := true
if newDc.Status.LatestVersion > 0 {
latestRc, err := c.rcLister.ReplicationControllers(newDc.Namespace).Get(deployutil.LatestDeploymentNameForConfig(newDc))
Contributor:

If the rc lister significantly lags, what happens here?

Contributor Author:

As is, we are dropping the dc event. e343373 changed the dc controller to update observedGeneration when a new rc fails to be created. If we change that back and have just the dc condition updated (I think I should have done that in the first place), then we can safely ignore errors from the rc cache and proceed with enqueueing, since the dc has to be synced at this point. Not sure if there is any case where we shouldn't block instantiate because the dc controller fails to create a new rc.

@smarterclayton (Contributor):

[test]

@0xmichalis (Contributor Author):

Updated to ignore errors from the rc cache and defer to instantiate to do the rc check. I opted for letting the dc controller update observedGeneration even when it observes that it cannot create a new rc so that the trigger controller will never be blocked.

@openshift-bot (Contributor):

Evaluated for origin test up to 5398c1a

@openshift-bot (Contributor):

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10549/) (Base Commit: 9195050)

// rcLister provides a local cache for replication controllers.
rcLister cache.StoreToReplicationControllerLister
// rcListerSynced makes sure the rc store is synced before reconciling any replication controller.
rcListerSynced func() bool
Contributor:

do you need this for testing?

Contributor Author:

Nope, it's needed for the controller to know whether its caches are synced.
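For illustration, a minimal stand-in for that `rcListerSynced func() bool` field; the `controller` type here is hypothetical, assuming (as in the real code) that the informer's HasSynced method gets wired into it so the controller never acts on a half-populated cache:

```go
package main

import "fmt"

// controller is a hypothetical, stripped-down version of the trigger
// controller: it holds a func that reports whether the rc cache has
// completed its initial sync.
type controller struct {
	rcListerSynced func() bool
}

// ready reports whether the controller may safely consult its rc cache.
func (c *controller) ready() bool {
	return c.rcListerSynced()
}

func main() {
	synced := false
	c := &controller{rcListerSynced: func() bool { return synced }}

	fmt.Println(c.ready()) // cache still filling: false
	synced = true
	fmt.Println(c.ready()) // informer reports synced: true
}
```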

glog.V(2).Infof("Cannot decode dc from replication controller %s: %v", deployutil.LabelForDeployment(latestRc), err)
return
}
shouldInstantiate = !reflect.DeepEqual(newDc.Spec.Template, initial.Spec.Template)
Contributor:

how expensive is this?

Contributor Author:

As expensive as a deep equal check can be? We were running this in the trigger controller already prior to refactoring the controller to use the instantiate endpoint.

Contributor:

Deep equal is expensive but it's the correct thing to use for now (vs something net new).

if err != nil {
// If we get an error here it may be due to the rc cache lagging behind. In such a case
// just defer to the api server (instantiate REST) where we will retry this.
glog.V(2).Infof("Cannot get latest rc for dc %s:%d (%v) - will defer to instantiate", deployutil.LabelForDeploymentConfig(newDc), newDc.Status.LatestVersion, err)
Contributor:

V(4)

} else {
initial, err := deployutil.DecodeDeploymentConfig(latestRc, c.codec)
if err != nil {
glog.V(2).Infof("Cannot decode dc from replication controller %s: %v", deployutil.LabelForDeployment(latestRc), err)
Contributor:

do we want to reconcile here? if we failed to decode, what are the chances that we will succeed next time?

Contributor Author:

We won't reconcile; I just return without adding the dc to the queue.

@mfojtik (Contributor):

mfojtik commented Oct 25, 2016

LGTM [merge]

@openshift-bot (Contributor):

Evaluated for origin merge up to 5398c1a

@smarterclayton (Contributor):

I guess it depends on what observedGeneration is supposed to mean. Right
now it means "I've seen your request and I'm trying to bring it to
completion" but not "I've succeeded at reaching your request".

I don't think you can treat that the rc cache has anything to do with the
dc cache in this case though - the rc cache can wedge at any time and you
can be arbitrarily far behind the real state

From @Kargakis's review comment in pkg/deploy/controller/generictrigger/factory.go (#11501):

if len(newDc.Spec.Triggers) == 0 || newDc.Spec.Paused {
	return
}
// We don't want to compete with the main deployment config controller. Let's process this
// config once it's synced. Note that this does not eliminate conflicts between the two
// controllers because the main controller is constantly updating deployment configs as
// owning replication controllers and pods are updated.
if !deployutil.HasSynced(newDc, newDc.Generation) {
	return
}
// Compare deployment config templates before enqueueing. This reduces the amount of times
// we will try to instantiate a deployment config to the exact number of times we need at
// the expense of duplicating some of the work that the instantiate endpoint is already doing
// but I think this is fine.
shouldInstantiate := true
if newDc.Status.LatestVersion > 0 {
	latestRc, err := c.rcLister.ReplicationControllers(newDc.Namespace).Get(deployutil.LatestDeploymentNameForConfig(newDc))


@0xmichalis (Contributor Author):

I guess it depends on what observedGeneration is supposed to mean. Right
now it means "I've seen your request and I'm trying to bring it to
completion" but not "I've succeeded at reaching your request".

I don't think you can treat that the rc cache has anything to do with the
dc cache in this case though - the rc cache can wedge at any time and you
can be arbitrarily far behind the real state

The rc cache has nothing to do with the dc cache, that's right. It will also be much heavier, but my point in expecting a synced dc is that we have a better chance of not conflicting with the dc controller.

@openshift-bot (Contributor):

openshift-bot commented Oct 25, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/10549/) (Base Commit: 8b8e813) (Image: devenv-rhel7_5240)

@openshift-bot openshift-bot merged commit a76ec57 into openshift:master Oct 25, 2016
@0xmichalis 0xmichalis deleted the tweak-trigger-controller-enqueueing branch October 26, 2016 09:48