-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retrying when attempting to stream build logs #19695
Retrying when attempting to stream build logs #19695
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth consideration - a configurable timeout if --wait
or --follow
are set to true
pkg/oc/cli/cmd/startbuild.go
Outdated
var ( | ||
streamLogWaitDuration = 10 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used var
here so I could run unit tests.
pkg/oc/cli/cmd/startbuild.go
Outdated
for i := 0; i < streamLogAttempts; i++ { | ||
rd, logErr := o.BuildLogClient.Logs(build.Name, opts).Stream() | ||
if logErr != nil { | ||
err = ocerrors.NewError("error getting logs").WithCause(logErr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unclear why, but the ocerrors.IsTimeoutErr
check always returned false, so I removed it.
pkg/oc/cli/cmd/startbuild.go
Outdated
break | ||
err = o.streamBuildLogs(newBuild) | ||
if err != nil { | ||
fmt.Fprintf(o.ErrOut, "Failed to stream the build logs - to view the logs, run oc logs build/%s after the build completes.\n", newBuild.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add help text with work-around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't have to be after the build completes, as long as the build is in a running state we should be able to get logs from it (and optionally follow the log stream)
pkg/oc/cli/cmd/startbuild.go
Outdated
if logErr != nil { | ||
err = ocerrors.NewError("error getting logs").WithCause(logErr) | ||
// retry after waiting a fixed period of time | ||
time.Sleep(streamLogWaitDuration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think we want to wait here. the log api already waits 10s for the build to start running(right?), so i'd say just call it again immediately and let it do the waiting.
here's the behavior i think we want:
wdyt? |
reply:
|
only if some level of debug is enabled. |
@bparees updates from my investigation:
|
no i don't think so. let's just bump the existing 10s timeout to 30s. Between that and adding the retry handling i think we'll cover 99% of use cases. |
pkg/build/registry/buildlog/rest.go
Outdated
@@ -38,7 +38,7 @@ type REST struct { | |||
getSimpleLogsFn func(podNamespace, podName string, logOpts *kapi.PodLogOptions) (runtime.Object, error) | |||
} | |||
|
|||
const defaultTimeout time.Duration = 10 * time.Second | |||
const defaultTimeout time.Duration = 60 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's godoc what this timeout is (it is how long we wait for the build to be running before giving up on getting logs for a build)
pkg/oc/cli/cmd/startbuild.go
Outdated
break | ||
err = o.streamBuildLogs(newBuild) | ||
if err != nil { | ||
fmt.Fprintf(o.ErrOut, "Failed to stream the build logs - to view the logs, run oc logs build/%s\n", newBuild.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should print the error too
@openshift/cli-review ptal |
/retest |
* Add retry when attempting to stream build logs. * Increase server-side build wait timeout to 30s. Fixes bug 1575990
f944ca6
to
7a1bf39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
One question, but overall lgtm. I'll leave final approval for @bparees
break | ||
err = o.streamBuildLogs(newBuild) | ||
if err != nil { | ||
fmt.Fprintf(o.ErrOut, "Failed to stream the build logs - to view the logs, run oc logs build/%s\nError: %v\n", newBuild.Name, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next if
just struck me, why we check o.Follow
in the next if
block if it's meant only for waiting for build completion? Following should be distinct from waiting from completion, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah good point, i think we should remove the check for o.Follow below, such that if the user only specified "--follow" and we fail to get the logs (due to timeout) the command simply exits.
that said, it's a change in behavior from the current behavior which is that even if you only specify "--follow" the command will still wait for the build to complete, even if it fails to get the logs.
But i would argue the existing behavior is a bug. If you want the command to wait, you pass --wait. If you pass --follow, you don't get wait semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too agree that the existing behavior is a bug (if follow fails, the command hangs), and this will now have higher user impact because the follow
error message includes the work-around instruction. I'll remove the o.Follow
check.
/retest |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adambkaplan, bparees, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
@adambkaplan: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Fixes bug 1575990