Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transport: Send RST stream from the server when deadline expires #8071

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

arjan-bal
Copy link
Contributor

Fixes: #2886

This PR makes the server cancel an RPC when it sees the deadline expiring instead of waiting forn the client to cancel the stream.

RELEASE NOTES:

  • transport: Servers send an HTTP/2 RST_STREAM frame to cancel a stream when the deadline expires.

@arjan-bal arjan-bal added the Type: Behavior Change Behavior changes not categorized as bugs label Feb 10, 2025
@arjan-bal arjan-bal added this to the 1.71 Release milestone Feb 10, 2025
Copy link

codecov bot commented Feb 10, 2025

Codecov Report

Attention: Patch coverage is 97.05882% with 3 lines in your changes missing coverage. Please review.

Project coverage is 82.49%. Comparing base (cbb5c2f) to head (a45d43b).
Report is 14 commits behind head on master.

Files with missing lines Patch % Lines
internal/leakcheck/leakcheck.go 96.20% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8071      +/-   ##
==========================================
+ Coverage   82.36%   82.49%   +0.13%     
==========================================
  Files         387      388       +1     
  Lines       39065    39049      -16     
==========================================
+ Hits        32174    32214      +40     
+ Misses       5562     5538      -24     
+ Partials     1329     1297      -32     
Files with missing lines Coverage Δ
internal/grpctest/grpctest.go 78.43% <100.00%> (+1.83%) ⬆️
internal/internal.go 100.00% <ø> (ø)
internal/transport/client_stream.go 100.00% <ø> (ø)
internal/transport/http2_server.go 91.63% <100.00%> (+0.40%) ⬆️
internal/transport/server_stream.go 95.31% <ø> (-4.69%) ⬇️
internal/leakcheck/leakcheck.go 89.34% <96.20%> (+4.55%) ⬆️

... and 37 files with indirect coverage changes

@arjan-bal arjan-bal force-pushed the server-side-rst-stream branch from 805939c to 6dda22e Compare February 10, 2025 04:55
@arjan-bal arjan-bal force-pushed the server-side-rst-stream branch 2 times, most recently from 806c418 to fca2350 Compare February 11, 2025 11:21
}

// Teardown performs a leak check.
func (Tester) Teardown(t *testing.T) {
leakcheck.CheckTrackingBufferPool()
leakcheck.CheckTimers(5 * time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is additive with the 10s for checking goroutines. It might be nice to pass a context (or timer) to both of these instead, so they can share the 10s timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea, changed.


originalTimeAfterFunc func(time.Duration, func()) internal.Timer
mu sync.Mutex
bufferCount int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, removed.

type timerFactory struct {
logger Logger

originalTimeAfterFunc func(time.Duration, func()) internal.Timer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need for this? As opposed to just directly calling time.AfterFunc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about a future scenarios when subtests and outer tests both start leak checks. Having the previously set function would allow wrapping the outer tests function. The global globalTimerTracker would still prevent this from working.

Removed this field. This use case can be addressed if/when it arises.

// not all timers were cancelled or executed. It is invalid to invoke this
// function without previously having invoked SetTimerTracker.
func CheckTimers(timeout time.Duration) {
// Reset the internal function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens at the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, moved the comment to the end.

// of this cancellation. Alter the status code accordingly.
// of this cancellation. Alter the status code accordingly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed like a harmless typo fix, but I’ve reverted it as requested.

// SetTimerTracker replaces internal.TimerAfterFunc with a one that tracks
// timer creations. CheckTimers should then be invoked at the end of the test to
// validate that all timers created have either executed or are cancelled.
func SetTimerTracker(logger Logger) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for this functionality?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't notice that there were tests for this file. Added a test now.

Comment on lines 607 to 609
s.deadlineTimerCancel = func() {
timer.Stop()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s.deadlineTimerCancel = func() {
timer.Stop()
}
s.deadlineTimerCancel = timer.Stop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use the cancel field on ServerStream. cancel is always set to a non-nil value, so this issue is solved.

Comment on lines 1281 to 1283
if s.deadlineTimerCancel != nil {
s.deadlineTimerCancel()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make sure this is always a valid function so we can just call it without the nil check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use the existing cancel field.

Comment on lines +1344 to +1345
oldState := s.swapState(streamDone)
if oldState == streamDone {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a bug before? Or did we have something that prevented it? Or was the code tolerant of multiple calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going through the references, it seems like closeStream can be called concurrently. The race seems rare because the call to deleteStream below removes the stream from the activeStreams map of the server. Most code paths try to get the stream from activeStreams, so they can't call closeStream anymore.

When the function does execute multiple times, the only symptoms seem to be overcounting of channelz metrics here

if channelz.IsOn() {
if eosReceived {
t.channelz.SocketMetrics.StreamsSucceeded.Add(1)
} else {
t.channelz.SocketMetrics.StreamsFailed.Add(1)
}
}

and possibly sending multiple RST streams to the peer because of enqueuing a cleanupStream event in the controlBuf.

Comment on lines 39 to 40
cancel context.CancelFunc // invoked at the end of stream to cancel ctx.
deadlineTimerCancel func() // Invoked at the end of stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always call cancel? Can we piggyback the timer cancellation with it if so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we do, changed to use cancel.

@dfawley dfawley assigned arjan-bal and unassigned dfawley Feb 14, 2025
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Feb 18, 2025
@arjan-bal arjan-bal modified the milestones: 1.71 Release, 1.72 Release Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Behavior Change Behavior changes not categorized as bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Server should send RST_STREAM when deadline is exceeded
2 participants