xdsclient: update watcher API as per gRFC A88 #7977

Open: wants to merge 7 commits into base: master


@purnesh42H (Contributor) commented Jan 2, 2025

This is the first part of implementing gRFC A88 (grpc/proposal#466).

This introduces the new watcher API but does not change any of the existing behavior. This table summarizes the API changes and behavior for each case:

| Case | Old API | New API | Behavior |
|---|---|---|---|
| Resource timer fires | OnResourceDoesNotExist() | OnResourceChanged(NOT_FOUND) | Fail data plane RPCs |
| LDS or CDS resource deletion | OnResourceDoesNotExist() | OnResourceChanged(NOT_FOUND) | Drop resource and fail data plane RPCs |
| xDS channel reports TRANSIENT_FAILURE | OnError() | OnResourceChanged(status) if resource NOT already cached; OnAmbientError(status) otherwise | Continue using cached resource, if any; otherwise, fail data plane RPCs |
| ADS stream terminates without receiving a response | OnError() | OnResourceChanged(status) if resource NOT already cached; OnAmbientError(status) otherwise | Continue using cached resource, if any; otherwise, fail data plane RPCs |
| Invalid resource update (client NACK) | OnError() | OnResourceChanged(status) if resource NOT already cached; OnAmbientError(status) otherwise | Continue using cached resource, if any; otherwise, fail data plane RPCs |
| Valid resource update | OnUpdate(resource) | OnResourceChanged(resource) | Use the new resource |

RELEASE NOTES:

  • xds: TBD

@purnesh42H added the Type: Feature (New features or improvements in behavior) and Area: xDS (Includes everything xDS related, including LB policies used with xDS.) labels Jan 2, 2025
@purnesh42H added this to the 1.70 Release milestone Jan 2, 2025
@purnesh42H self-assigned this Jan 2, 2025
@purnesh42H force-pushed the a88-watcher-api branch 7 times, most recently from 8d198b7 to a9b45c0, January 3, 2025 16:28
@purnesh42H requested a review from markdroth January 3, 2025 16:58
@purnesh42H assigned dfawley and markdroth and unassigned purnesh42H Jan 3, 2025
@purnesh42H requested a review from dfawley January 3, 2025 16:59
@purnesh42H force-pushed the a88-watcher-api branch 2 times, most recently from 4009e3e to 57dbf23, January 6, 2025 12:03

codecov bot commented Jan 6, 2025

Codecov Report

Attention: Patch coverage is 81.36646% with 30 lines in your changes missing coverage. Please review.

Project coverage is 82.45%. Comparing base (724f450) to head (b9d2a92).
Report is 44 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| xds/internal/testutils/resource_watcher.go | 20.00% | 11 Missing and 1 partial ⚠️ |
| xds/internal/server/listener_wrapper.go | 42.10% | 9 Missing and 2 partials ⚠️ |
| .../balancer/clusterresolver/resource_resolver_eds.go | 78.57% | 2 Missing and 1 partial ⚠️ |
| xds/internal/resolver/xds_resolver.go | 66.66% | 2 Missing ⚠️ |
| xds/internal/xdsclient/clientimpl_watchers.go | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7977      +/-   ##
==========================================
+ Coverage   82.28%   82.45%   +0.17%     
==========================================
  Files         381      388       +7     
  Lines       38539    39048     +509     
==========================================
+ Hits        31712    32198     +486     
+ Misses       5535     5530       -5     
- Partials     1292     1320      +28     
| Files with missing lines | Coverage Δ |
|---|---|
| xds/internal/balancer/cdsbalancer/cdsbalancer.go | 81.69% <100.00%> (-0.07%) ⬇️ |
| ...s/internal/balancer/cdsbalancer/cluster_watcher.go | 100.00% <100.00%> (ø) |
| xds/internal/resolver/watch_service.go | 100.00% <100.00%> (+8.57%) ⬆️ |
| xds/internal/server/rds_handler.go | 90.69% <100.00%> (-0.61%) ⬇️ |
| xds/internal/xdsclient/authority.go | 77.21% <100.00%> (+0.34%) ⬆️ |
| ...nal/xdsclient/xdsresource/cluster_resource_type.go | 78.37% <100.00%> (+1.23%) ⬆️ |
| ...l/xdsclient/xdsresource/endpoints_resource_type.go | 76.47% <100.00%> (+1.47%) ⬆️ |
| ...al/xdsclient/xdsresource/listener_resource_type.go | 86.44% <100.00%> (+0.47%) ⬆️ |
| ...ds/internal/xdsclient/xdsresource/resource_type.go | 100.00% <ø> (ø) |
| ...dsclient/xdsresource/route_config_resource_type.go | 76.47% <100.00%> (+1.47%) ⬆️ |
... and 5 more

... and 58 files with indirect coverage changes

@purnesh42H force-pushed the a88-watcher-api branch 4 times, most recently from 43c9adb to 89f475a, January 6, 2025 12:26
@easwars assigned purnesh42H and unassigned easwars, markdroth and dfawley Jan 10, 2025
@easwars assigned purnesh42H and unassigned easwars Jan 30, 2025
@purnesh42H requested a review from easwars January 31, 2025 11:15
@purnesh42H assigned easwars and unassigned purnesh42H Jan 31, 2025
@purnesh42H (Contributor Author)

@easwars ptal

@purnesh42H requested a review from easwars February 5, 2025 08:50
@@ -194,7 +194,7 @@ func (a *authority) handleADSStreamFailure(serverConfig *bootstrap.ServerConfig,
 	for watcher := range state.watchers {
 		watcher := watcher
 		a.watcherCallbackSerializer.TrySchedule(func(context.Context) {
-			watcher.OnError(xdsresource.NewErrorf(xdsresource.ErrorTypeConnection, "xds: error received from xDS stream: %v", err), func() {})
+			watcher.OnAmbientError(xdsresource.NewErrorf(xdsresource.ErrorTypeConnection, "xds: error received from xDS stream: %v", err), func() {})
Contributor

Based on the PR description, for the following two cases:

  • xDS channel reports TRANSIENT_FAILURE
  • ADS stream terminates without receiving a response

the old behavior was to call OnError(), while the new behavior is to call OnResourceChanged(status) if the resource is NOT already cached, and OnAmbientError(status) otherwise.

But we are only calling the latter here. Am I missing something?

Contributor Author

@purnesh42H Feb 18, 2025

Since we have already checked for ErrTypeStreamFailedAfterRecv above, the only remaining case is an uncached resource, which means this should always be OnResourceChanged(). Or should we look up the cache and call the appropriate method?

@easwars

Contributor

ErrTypeStreamFailedAfterRecv only means that some message was received on the stream before it broke. It says nothing specific about the resource in question here.

Contributor Author

Oh, I see. I thought ErrTypeStreamFailedAfterRecv meant that the stream failed after we had already received this resource at least once.
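The point settled in this thread, that ErrTypeStreamFailedAfterRecv describes the stream rather than any single resource, can be sketched in Go. This is a minimal illustration assuming a hypothetical per-authority map from resource name to "has a cached value"; callbackForStreamFailure is a made-up helper, not a grpc-go function.

```go
package main

import (
	"fmt"
	"sort"
)

// callbackForStreamFailure picks the callback for one resource when the ADS
// stream fails. The decision depends only on whether this resource has a
// cached value; a stream-level flag such as "some response was received
// before the stream broke" cannot answer that question for every resource.
func callbackForStreamFailure(cached bool) string {
	if cached {
		return "OnAmbientError"
	}
	return "OnResourceChanged"
}

func main() {
	// Hypothetical authority state: the stream delivered listener-a before
	// breaking (so a response *was* received), but cluster-b never arrived.
	cache := map[string]bool{
		"listener-a": true,
		"cluster-b":  false,
	}

	names := make([]string, 0, len(cache))
	for name := range cache {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order

	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, callbackForStreamFailure(cache[name]))
	}
	// Prints:
	// cluster-b -> OnResourceChanged
	// listener-a -> OnAmbientError
}
```

In this scenario the stream did receive a response before failing, yet cluster-b still has to be reported through OnResourceChanged, which is why the cache has to be consulted per resource.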

@easwars (Contributor) commented Feb 6, 2025

This came up during a separate discussion with @dfawley.

  1. Callback methods named OnXxx are more of a C++ style. In Go, we would probably just name them Xxx.
  2. Whether we should have two callback methods, ResourceChanged(ResourceData) and ResourceError(error). This works around the lack of a StatusOr type in Go: without one, two methods are better than a single method that takes a new struct holding one of two possible values, with no guarantee that only one of them is set.

@easwars assigned purnesh42H and unassigned easwars Feb 6, 2025
@purnesh42H (Contributor Author) commented Feb 18, 2025


I will wait for #8042 before making further changes to this PR, since the decision on these points is being discussed there.

@arjan-bal modified the milestones: 1.71 Release, 1.72 Release Feb 19, 2025
5 participants