-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apps: record cause of rollout and deployer pods timestamps back to rc #16347
apps: record cause of rollout and deployer pods timestamps back to rc #16347
Conversation
not sure if time.Now() is the best format or we want just report UNIX timestamps (actually I might be more happy with UNIX timestamps as you can do math with them without conversion) |
Generally we should be encoding times as RFC3339 whenever they go into apis
On Sep 14, 2017, at 8:39 AM, Michal Fojtik <[email protected]> wrote:
not sure if time.Now() is the best format or we want just report UNIX
timestamps (actually I might be more happy with UNIX timestamps as you can
do math with them without conversion)
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#16347 (comment)>,
or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABG_p3f2tIp7xgrO-4hh05GXx8b2SWgUks5siR54gaJpZM4PXg9Z>
.
|
pkg/apps/apis/apps/types.go
Outdated
DeployerPodCreatedAtAnnotation = "openshift.io/deployer-pod.createdAt" | ||
// DeployerPodFinishedAnnotation is an annotation on deployment that records | ||
// the time of when the deploymer pod finished. | ||
DeployerPodCompletedAtAnnotation = "openshift.io/deployer-pod.completedAt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
completed-at maybe. We're kind of inconsistent but no reason not to use dash
@smarterclayton if we encode RFC3339 to string, we need to decode, why we hate ourself? |
6f46efa
to
0268ee6
Compare
@smarterclayton fixed. ptal |
pkg/apps/apis/apps/types.go
Outdated
// records the time of when the deployer pod for this particular | ||
// deployment was created. | ||
DeployerPodCreatedAtAnnotation = "openshift.io/deployer-pod.created-at" | ||
// DeployerPodFinishedAnnotation is an annotation on deployment that records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/DeployerPodFinishedAnnotation/DeployerPodCompletedAtAnnotation
pkg/apps/apis/apps/types.go
Outdated
@@ -37,6 +37,13 @@ const ( | |||
// annotation value is the name of the deployer Pod which will act upon the ReplicationController | |||
// to implement the deployment behavior. | |||
DeploymentPodAnnotation = "openshift.io/deployer-pod.name" | |||
// DeployerPodCreatedAnnotation is an annotation on a deployment that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/DeployerPodCreatedAnnotation/DeployerPodCreatedAtAnnotation
pkg/apps/apis/apps/types.go
Outdated
@@ -37,6 +37,13 @@ const ( | |||
// annotation value is the name of the deployer Pod which will act upon the ReplicationController | |||
// to implement the deployment behavior. | |||
DeploymentPodAnnotation = "openshift.io/deployer-pod.name" | |||
// DeployerPodCreatedAnnotation is an annotation on a deployment that | |||
// records the time of when the deployer pod for this particular |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we mention the time format here or is RFC3339 implied in our API in general?
@@ -149,6 +150,7 @@ func (c *DeploymentController) handle(deployment *v1.ReplicationController, will | |||
return actionableError(fmt.Sprintf("couldn't create deployer pod for %q: %v", deployutil.LabelForDeploymentV1(deployment), err)) | |||
} | |||
updatedAnnotations[deployapi.DeploymentPodAnnotation] = deploymentPod.Name | |||
updatedAnnotations[deployapi.DeployerPodCreatedAtAnnotation] = time.Now().Format(time.RFC3339) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will get thrown away in case the update at the end fails on 409
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not lost, on next reconcile we set this back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not if the deployer pod is missing (e.g. deleted manually) I think - you set it back when the pod is present only on L182
@@ -294,10 +297,12 @@ func (c *DeploymentController) nextStatus(pod *v1.Pod, deployment *v1.Replicatio | |||
// Sync the internal replica annotation with the target so that we can | |||
// distinguish deployer updates from other scaling events. | |||
updatedAnnotations[deployapi.DeploymentReplicasAnnotation] = updatedAnnotations[deployapi.DesiredReplicasAnnotation] | |||
updatedAnnotations[deployapi.DeployerPodCompletedAtAnnotation] = time.Now().Format(time.RFC3339) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thrown away at 409, and since the pod will be deleted it won't get reconciled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be reconciled until we succeed updating the RC status, so eventually this will be set... there might be time skew if the reconciliation takes a long time, but the over deployment status won't be updated as well, so from the user PoC the deployment will seems like still progressing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is all edge driven and won't get reconciled. there will be states when one or both of those annotations are missing causing metrics weirdness and bugs to be filled
0268ee6
to
43a2ba3
Compare
so the created state will be partially reconciled but overwritten at the same time at L181. (But I don't think this will reconcile a state when someone deletes the pod manually.) The deletion won't be reconciled I think. But the bigger issue arising seems to be the time that due to reconciliation might be off because you are using time.Now(). In the reconciliation cycle that might be even added after rate-limiting that might be significantly off. One thing is that we might choose to not to write that time rather than be off and providing misleading data. The second is that we could use the real times in the informer we use to feed the deployer controller with pod events and use real times from pod filled in by kubelet. Yes, on a relist the informer might loose probably the delete event and we don't get the deleting timestamp but we could reconcile it in deployer controller and set value for it as it should be fairly rare and this way it won't create false positives in metrics. |
43a2ba3
to
e56fd0a
Compare
I guess we should have 3 values to fully reason about it:
|
/retest |
@tnozicka having three timestamps seems unreasonable to me as we basically just want to measure how long the rollout takes, which is easy with 2 timestamps ;-) I don't think the time between the pod was created and container started matters that much. |
@mfojtik in that case you need to use startedAt, not creation timestamp |
i switched this to use creationtimestamp of the pod as created-at annotation and finishedTimestamp of the container that runs the deployer as completed-at. So there is no time.Now() anymore. I really hate to have special routine in deployer controller that will be updating RC in parallel with the main routine that handle the rollout. That is basically asking for troubles. |
@tnozicka the deployment process will fail when the container cannot be created right? in that case that RC will basically end up with no timestamps. |
e56fd0a
to
ac494ed
Compare
Mfojtik you know that metav1.Time automatically handles RFC3339 for
you right? And that's how we serialize all json times?
|
@smarterclayton yeah, i'm using it already ;-) |
/retest |
@tnozicka we are green, but I think i'm going to add some tests for this tomorrow |
@smarterclayton @tnozicka is out this week, can you take a second look? |
pkg/apps/apis/apps/types.go
Outdated
// deployment was started. | ||
DeployerPodStartedAtAnnotation = "openshift.io/deployer-pod.started-at" | ||
// DeployerPodCompletedAtAnnotation is an annotation on deployment that records | ||
// the time in RFC3339 format of when the deploymer pod finished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/deploymer/deployer/
@@ -149,6 +149,10 @@ func (c *DeploymentController) handle(deployment *v1.ReplicationController, will | |||
return actionableError(fmt.Sprintf("couldn't create deployer pod for %q: %v", deployutil.LabelForDeploymentV1(deployment), err)) | |||
} | |||
updatedAnnotations[deployapi.DeploymentPodAnnotation] = deploymentPod.Name | |||
updatedAnnotations[deployapi.DeployerPodCreatedAtAnnotation] = deploymentPod.CreationTimestamp.String() | |||
if deploymentPod.Status.StartTime != nil { | |||
updatedAnnotations[deployapi.DeployerPodCreatedAtAnnotation] = deploymentPod.Status.StartTime.String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the annotation should be DeployerPodStartedAtAnnotation
not DeployerPodCreatedAtAnnotation
@@ -176,6 +180,10 @@ func (c *DeploymentController) handle(deployment *v1.ReplicationController, will | |||
} else { | |||
// Update to pending or to the appropriate status relative to the existing validated deployer pod. | |||
updatedAnnotations[deployapi.DeploymentPodAnnotation] = deployer.Name | |||
updatedAnnotations[deployapi.DeployerPodCreatedAtAnnotation] = deployer.CreationTimestamp.String() | |||
if deployer.Status.StartTime != nil { | |||
updatedAnnotations[deployapi.DeployerPodCreatedAtAnnotation] = deployer.Status.StartTime.String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments, lgtm after they are addressed
Also let's state that those annotations are filled at best (or at least reasonable) effort and might not be present (or be incomplete) at rare cases which should not be considered bugs. |
ac494ed
to
31c51ab
Compare
@tnozicka comments addressed, thanks! |
@@ -149,6 +149,10 @@ func (c *DeploymentController) handle(deployment *v1.ReplicationController, will | |||
return actionableError(fmt.Sprintf("couldn't create deployer pod for %q: %v", deployutil.LabelForDeploymentV1(deployment), err)) | |||
} | |||
updatedAnnotations[deployapi.DeploymentPodAnnotation] = deploymentPod.Name | |||
updatedAnnotations[deployapi.DeployerPodStartedAtAnnotation] = deploymentPod.CreationTimestamp.String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one should be DeployerPodCreatedAtAnnotation, it was ok before
@@ -176,6 +180,10 @@ func (c *DeploymentController) handle(deployment *v1.ReplicationController, will | |||
} else { | |||
// Update to pending or to the appropriate status relative to the existing validated deployer pod. | |||
updatedAnnotations[deployapi.DeploymentPodAnnotation] = deployer.Name | |||
updatedAnnotations[deployapi.DeployerPodStartedAtAnnotation] = deployer.CreationTimestamp.String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one should be DeployerPodCreatedAtAnnotation, it was ok before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brain fart, sorry. fixed.
31c51ab
to
fb1679e
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mfojtik, tnozicka The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest |
/kind bug not having this in 3.7 means we can't build up better metrics in 3.8 |
/test all [submit-queue is verifying that this PR is safe to merge] |
@mfojtik: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Automatic merge from submit-queue (batch tested with PRs 16347, 16652). |
This helps provide better metrics for deployment configs where we can report how much time the rollout took so in theory we can record times for last failed rollouts.
@Kargakis @tnozicka relatively simple change.