-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix build controller performance issues #12623
Conversation
@bparees ptal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming nits. Also we're going to want to run the full extended builds/image_ecosystem suite on this.
BuildCompletedAnnotation = "openshift.io/build.completed" | ||
// BuildAcceptedAnnotation is an annotation used to update a build that has been | ||
// created so it can be seen build controller queue before a resync. | ||
BuildAcceptedAnnotation = "openshift.io/build.accepted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new annotation format(sigh) is
BuildSourceSecretMatchURIAnnotationPrefix = "build.openshift.io/source-secret-match-uri-"
please align these new annotations with that.
// will prevent the build controller from processing it further. | ||
BuildCompletedAnnotation = "openshift.io/build.completed" | ||
// BuildAcceptedAnnotation is an annotation used to update a build that has been | ||
// created so it can be seen build controller queue before a resync. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BuildAcceptedAnnotation is an annotation used to update a build that can now be run based on the RunPolicy(e.g. Serial). Updating the build with this annotation forces the build to be processed by the build controller queue without waiting for a resync.
and technically this needs @openshift/api-review |
8d16772
to
e1df6bd
Compare
comments addressed [test] |
e1df6bd
to
28625e6
Compare
I'm a little surprised these are annotations. Why wouldn't we have conditions for these on BuildStatus? |
Um. Why? |
@@ -102,9 +102,13 @@ func (bc *BuildController) HandleBuild(build *buildapi.Build) error { | |||
} | |||
|
|||
if buildutil.IsBuildComplete(build) { | |||
if _, ok := build.Annotations[buildapi.BuildCompletedAnnotation]; ok { | |||
return nil | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you have complete... and really complete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this safe? Why aren't you just creating a cache of "really complete" builds in memory and consulting that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-) I couldn't think of a better name ... but basically it means that we shouldn't run the runPolicy.OnComplete command on it. Maybe ran-policy-on-complete
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton how is building a cache better ? It really doesn't matter that we run the policy.OnComplete again, but we'd rather avoid it if we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton how is building a cache better ? It really doesn't matter that we run the policy.OnComplete again, but we'd rather avoid it if we can.
Adding post-processing annotations instead of just using the straight status fields we have (you can index on them if you want) is effectively adding multiple (potentially conflicting) sources of truth. Adding them to annotations in particular gets weird quick, since you've created a "status" annotation you could set via a spec update. In addition, you don't even get label selection benefit as an annotation and you aren't using them for restricting selection from the server, so they aren't buying much in terms of performance over checking the actual status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also memory caches die. we need this persisted so we don't have to run through all the builds again on a restart of the controller.
The distinction is a complete build means the build is complete. this annotation means "and we don't care about it anymore" (we still care about complete builds, the first time we see them enter the complete state)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k so you're proposing to modify BuildStatus to add this state ("really completed") to it?
can't tell if that's directed at @deads2k or the PR itself. |
If it isn't actually computable from current status, yes. If it is computable from current status, do that and produce your own index (other controllers do this for various resources). Having it as annotation doesn't seem like a good result. |
Yeah, I'm -1 on modifying the object. Your controller should cache the negatives (ignore this because we know it's calculated) and you should bound that check if you're concerned with memory. |
Also, you cannot prevent racing controllers or severely delayed updates, so you will have to run this policy periodically, not just once. |
resync logic will always figure out if a build needs to be run eventually. this logic is related to an optimization path around "whenever a build completes, see if there is another build queued that we should immediately run". |
Speaking in person, this sounds like a build condition that is just missing from status today and the phase,condition tuple controls behavior (which is why we moved to conditions, away from phases). The starttime one is just supposed to be a poke, so a secondary path to kicking the controller logic locally ought to work. |
(and the problem is that we are currently doing that check for every build, every time we resync. we want to only do it for builds that really did just complete) |
28625e6
to
bd0fc2a
Compare
[test] |
i think it lgtm. |
for _, build := range nextBuilds { | ||
build.Status.StartTimestamp = &now | ||
build.Annotations[buildapi.BuildAcceptedAnnotation] = uuid.NewRandom().String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a TODO for this block - "replace with informer notification requeueing in the future"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Calls policy completed build processing as soon as it's marked Complete instead of doing it when the build is handled by the BuildController. Uses a build.openshift.io/accepted annotation to bump a build so the BuildController can see it in its queue before the next Resync.
bd0fc2a
to
f3d26a3
Compare
[testextended][extended:core(builds)] |
Lgtm as well |
Evaluated for origin test up to f3d26a3 |
Evaluated for origin testextended up to f3d26a3 |
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13216/) (Base Commit: b0f2c58) |
1 similar comment
continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13216/) (Base Commit: b0f2c58) |
continuous-integration/openshift-jenkins/testextended FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin_extended/1022/) (Base Commit: b0f2c58) (Extended Tests: core(builds)) |
1 extended test failed with a flake |
Evaluated for origin merge up to f3d26a3 |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/13267/) (Base Commit: 2ea8719) (Image: devenv-rhel7_5766) |
Calls policy completed build processing as soon as it's marked Complete
instead of doing it when the build is handled by the BuildController.
Uses a build.openshift.io/accepted annotation to bump a build so the
BuildController can see it in its queue before the next Resync.