Make A/B deployment proportional to service weight #15309
Conversation
@knobunc @rajatchopra PTAL
Generally on the right track. Let's use getServiceUnits to make it a hair simpler.
api/swagger-spec/oapi-v1.json
Outdated
@@ -24586,10 +24586,6 @@
          "$ref": "v1.StageInfo"
        },
        "description": "stages contains details about each stage that occurs during the build including start time, duration (in milliseconds), and the steps that occured within each stage."
      },
      "logSnippet": {
This should not have gone away.
Fixed. (accidentally deleted)
Compare 16ac7bd to 6f9c3d8
@@ -812,6 +812,20 @@ func (r *templateRouter) RemoveRoute(route *routeapi.Route) {
	r.stateChanged = true
}

// numberOfEndpoints returns the number of endpoints
numberOfEndpoints returns the number of endpoints for the given service id
pkg/router/template/router.go
Outdated
	r.lock.Lock()
	defer r.lock.Unlock()

	var eps int32
eps := 0
(0 is of type int, and that is at least 32 bits)
Or if you must:
var eps int32 = 0
pkg/router/template/router.go
Outdated
	var eps int32
	eps = 0
	svc, ok := r.findMatchingServiceUnit(id)
	if ok && len(svc.EndpointTable) > int(eps) {
no need to cast if you use := 0
pkg/router/template/router.go
Outdated
	eps = 0
	svc, ok := r.findMatchingServiceUnit(id)
	if ok && len(svc.EndpointTable) > int(eps) {
		eps = int32(len(svc.EndpointTable))
and no need to cast here either...
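Taken together, the review suggestions amount to something like the sketch below. The types here (`Endpoint`, `serviceUnit`, the `serviceUnits` map) are simplified stand-ins for the router's real structures, not the PR's actual code:

```go
package main

import "fmt"

// Endpoint and serviceUnit are simplified stand-ins for the router's types.
type Endpoint struct{ ID string }

type serviceUnit struct{ EndpointTable []Endpoint }

type templateRouter struct{ serviceUnits map[string]serviceUnit }

func (r *templateRouter) findMatchingServiceUnit(id string) (serviceUnit, bool) {
	svc, ok := r.serviceUnits[id]
	return svc, ok
}

// numberOfEndpoints returns the number of endpoints for the given service id.
// Using eps := 0 gives eps type int (at least 32 bits), so no casts are needed.
func (r *templateRouter) numberOfEndpoints(id string) int {
	eps := 0
	if svc, ok := r.findMatchingServiceUnit(id); ok {
		eps = len(svc.EndpointTable)
	}
	return eps
}

func main() {
	r := &templateRouter{serviceUnits: map[string]serviceUnit{
		"svcA": {EndpointTable: []Endpoint{{"e1"}, {"e2"}}},
	}}
	fmt.Println(r.numberOfEndpoints("svcA")) // 2
	fmt.Println(r.numberOfEndpoints("none")) // 0
}
```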
@knobunc Is anything more needed? PTAL
pkg/router/template/router.go
Outdated
	// serviceUnits[key] is the weight for each endpoint in the service
	// the sum of the weights of the endpoints is the scaled service weight

	var numEp int32
numEp := r.numberOfEndpoints(key)
if numEp < 1 {
numEp = 1
}
@knobunc @rajatchopra
LGTM
@openshift/networking PTAL
[test][testextended][extended: networking]
pkg/router/template/router.go
Outdated
	}
	// scale the weights to near the maximum (256)
	// This improves precision when scaling for the endpoints
	var scaleWeight int32 = 256 / maxWeight
This normalization is a good idea. But it will not cover the case where maxWeight is > 128 and one of the services has a weight that is too low compared to its number of endpoints.
e.g. I want A/B testing to happen for a percentage of 1.5%, i.e. 1.5% of traffic should be served by serviceA and the rest by serviceB. So I put a weight of 3 on svc A and 200 on svc B.
All is good until I have 20 endpoints for svcA, at which point we will give the endpoints a weight of '0'.
Ideally, we should give 3 endpoints a weight of '1' and the rest become '0' (unfortunately).
I have another algorithm in mind (not perfect either). Need discussion.
// For each svc:
equitableWeight = trunc(svcWeight / numEp)
for i = 0; i < len(endpoints); i++ {
    endpoints[i].weight = equitableWeight
}
// hand out the remainder (svcWeight - equitableWeight*numEp), one unit each
for i = 0; i < (svcWeight - equitableWeight*numEp); i++ {
    endpoints[i].weight += 1
}
@rajatchopra Thanks for taking a look. The idea of normalizing was to bring the largest weight up to a 3-digit number. In this case 128 is OK; it just gives a little more headroom.
For your 1.5% example, the weight, if not 0, should have a min of 1. Thanks for spotting this. I'll fix it.
With a min of 1, the number of pods can be reduced if the overall percent is still too high. The 20 pods could be reduced to maybe 3.
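The scheme discussed in this thread (scale all weights so the largest approaches 256, split each service's scaled weight across its endpoints, and clamp a live backend's weight to a minimum of 1) can be sketched as follows. This is an assumed simplification for illustration, not the PR's actual router code:

```go
package main

import "fmt"

// endpointWeight computes the per-endpoint weight for one service:
// scale all service weights so the largest approaches 256 (for precision),
// divide the scaled weight evenly among the endpoints, and never let a
// live backend (svcWeight > 0) drop below weight 1.
func endpointWeight(svcWeight, maxWeight, numEp int32) int32 {
	if numEp < 1 {
		numEp = 1
	}
	scaleWeight := int32(256) / maxWeight
	w := (svcWeight * scaleWeight) / numEp
	if w < 1 && svcWeight > 0 {
		w = 1 // a live backend never gets weight 0
	}
	return w
}

func main() {
	// The 1.5% example: weights 3 and 200, so maxWeight is 200.
	fmt.Println(endpointWeight(3, 200, 20))  // 3*1/20 rounds to 0, clamped to 1
	fmt.Println(endpointWeight(200, 200, 4)) // 200*1/4 = 50
}
```

Note the trade-off the thread lands on: with the min-of-1 clamp, a service with many endpoints can receive more traffic than its proportion suggests, which is why the behavior needs to be documented.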
@rajatchopra I do not like setting the weight to 0 for a live backend. We need to document clearly that you can send more traffic than the proportion would allow if there are too many backends.
[Extended Tests: networking]
Failed with:
@openshift/networking PTAL
@@ -66,7 +66,7 @@ func ValidateRoute(route *routeapi.Route) field.ErrorList {

	backendPath := specPath.Child("alternateBackends")
	if len(route.Spec.AlternateBackends) > 3 {
-		result = append(result, field.Required(backendPath, "cannot specify more than 3 additional backends"))
+		result = append(result, field.Required(backendPath, "cannot specify more than 3 alternate backends"))
Tiniest nit: Extra space after "alternate"
[test]
@pecameron testing failed with:
Evaluated for origin test up to 073c4b8
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/3457/) (Base Commit: 49df2d6) (PR Branch Commit: 073c4b8)
A route can front up to 4 services that handle the requests. The load balancing strategy governs which endpoint gets each request. When roundrobin is chosen, the portion of the requests that each service handles is governed by the weight assigned to the service. Each endpoint in the service gets a fraction of the service's requests.
bug 1470350: https://bugzilla.redhat.com/show_bug.cgi?id=1470350
Code change is in origin PR 15309: openshift/origin#15309
/retest
/retest
/retest
Distribute requests among the route's services based on service weight. The portion of the total weight that each service has is distributed evenly among the service's endpoints.
bug 1470350: https://bugzilla.redhat.com/show_bug.cgi?id=1470350
@knobunc PTAL
@pecameron: The following test failed, say /retest to rerun them all:
Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
@rajatchopra PTAL. I need a /lgtm to move this forward. Thanks.
/lgtm
Let's do a follow-on PR about the algorithm though. I would have liked that, in the case where the number of endpoints overruns the maximum manageable with weight '1', we ignore the rest of the endpoints rather than just warn that the A/B weightage will not be honoured.
/approve addresses issue #11881
/approve no-issue
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: eparis, pecameron, rajatchopra Associated issue requirement bypassed by: eparis The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
/retest
Automatic merge from submit-queue (batch tested with PRs 15533, 15414, 15584, 15529, 15309)