Streaming Encoding for LIST Responses #5116

Open
4 of 14 tasks
serathius opened this issue Jan 31, 2025 · 16 comments
Labels
lead-opted-in Denotes that an issue has been opted in to a release sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. stage/beta Denotes an issue tracking an enhancement targeted for Beta status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team
Milestone

Comments

@serathius
Contributor

serathius commented Jan 31, 2025

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 31, 2025
@serathius
Contributor Author

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 31, 2025
@serathius serathius changed the title Streaming Response Encoding Streaming Encoding for LIST Responses Jan 31, 2025
@chenk008
Contributor

chenk008 commented Feb 7, 2025

I'm glad to see this proposal. We have implemented similar capabilities in our internal repository and are preparing to contribute this work upstream. We have also submitted a CFP for the upcoming KubeCon China conference.

In our implementation, we use sync.Pool to manage memory allocation efficiently and to cache the serialized result of each item. When the buffer reaches a certain size, we flush it, so that serialization can proceed in parallel with writing to the HTTP/2 connection.

Additionally, we have added support for gzip compression, which is enabled only when the first batch of cached data reaches 128 * 1024 bytes.

For JSON serialization, we have implemented a custom StreamMarshal method for unstructuredList.

For protobuf, we generate the code with a generator so that it stays compatible with reverse protobuf marshalling.

type StreamMarshaller interface {
	// StreamSize returns the size of the whole object and a slice with the
	// size of each item.
	StreamSize() (uint64, []int)

	StreamMarshal(w stream.Writer, itemSize []int) error
}

We have tested it extensively with large datasets and obtained comparative results. @yulongfang Can you share some benchmark results?

@yulongfang

Thanks, @chenk008, for the introduction. We run many large-scale clusters in Alibaba Cloud. When the controllers of these clusters restart, they issue full LIST requests to the apiserver, which affects the stability of the cluster. We have to run the apiserver on larger machines, which wastes resources.

To address this, we applied the approach described above and obtained the following results.

Stress test scenario for LIST responses in JSON format:

  • apiserver version: 1.30
  • apiserver specification: 32 cores, 128 GB
  • apiserver replicas: 1
  • existing resources: 10,000 custom resources (CRs) of 100 KB each
  • stress test scenario: increase load in QPS steps of 0.1 / 0.5

Stress test results for LIST responses in JSON format:

QPS 0.05

  • before optimization: CPU 35.7 cores, memory 89 GB
  • after stream JSON optimization: CPU 6.22 cores, memory 60 GB

QPS 0.1

  • before optimization: CPU 11 cores, memory 146 GB
  • after stream JSON optimization: CPU 7.45 cores, memory 97 GB

Stress test scenario for LIST responses in protobuf format:

  • apiserver version: 1.30
  • apiserver specification: 32 cores, 128 GB
  • apiserver replicas: 1
  • existing resources: 10,000 configmaps of 100 KB each
  • stress test scenario: increase load in QPS steps of 0.1 / 0.5

Stress test results for LIST of configmaps:

QPS 0.05

  • before optimization: CPU 16.8 cores, memory 54.3 GB
  • after stream JSON optimization: CPU 16.8 cores, memory 16.1 GB

QPS 0.1

  • before optimization: CPU 42 cores, memory 122 GB
  • after stream JSON optimization: CPU 42 cores, memory 18 GB

@BenTheElder
Member

BenTheElder commented Feb 12, 2025

FYI: Technical details are usually discussed in KEP PRs or elsewhere, with the KEP issue serving as a place to link back to the work.

@chenk008 @yulongfang you might consider reviewing #5119

@serathius
Contributor Author

Hey @chenk008 @yulongfang please see the previous discussion in kubernetes/kubernetes#129304 and kubernetes/kubernetes#129334.
We have also already done a performance analysis of our changes in kubernetes/kubernetes#129304 (comment).

We also added an automated benchmark of LIST requests. You can see the results at https://perf-dash.k8s.io/#/?jobname=benchmark%20list&metriccategoryname=E2E&metricname=Resources&Resource=memory&PodName=kube-apiserver-benchmark-list-master%2Fkube-apiserver

We currently run it in a JSON + configmap configuration with RV="", and hope to expand it to include Proto, Pods, CustomResources and other types of LIST requests. It would be awesome if you could contribute.

@serathius
Contributor Author

/milestone v1.33

@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Feb 13, 2025
@pacoxu
Member

pacoxu commented Feb 14, 2025

@jpbetz @dipesh-rawat this is targeted for v1.33 and the KEP has been merged.
Should the lead-opted-in and tracked labels be added so that it is tracked by the release team?

@dipesh-rawat
Member

@serathius @pacoxu Unfortunately, the enhancement freeze deadline has passed, and this KEP issue was not lead-opted-in, so it wasn’t added to the tracking board for the v1.33 release. Post-freeze, we've disabled the automated sync job for KEP issues to the tracking board.

To move forward, we’ll need a short exception request filed so the team can add the lead-opted-in label and manually include this in the tracking board.

If you still wish to progress this enhancement in v1.33, please file an exception request as soon as possible, within three days. If you have any questions, you can reach out in the #release-enhancements channel on Slack and we'll be happy to help. Thanks!

(cc v1.33 Release Lead @npolshakova)

@serathius
Contributor Author

serathius commented Feb 14, 2025

Oops, @jpbetz is OOO. @deads2k can you take a look?

@serathius
Contributor Author

@dipesh-rawat
Member

@serathius Since the release team has APPROVED the exception request here, this will be considered for addition to the milestone for the v1.33 release.

@dipesh-rawat
Member

Hello @serathius 👋, v1.33 Enhancements team here.

This enhancement is targeting stage beta for v1.33 (correct me, if otherwise)
/stage beta

Here's where this enhancement currently stands:

  • KEP readme using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable for latest-milestone: v1.33. KEPs targeting stable will need to be marked as implemented after code PRs are merged and the feature gates are removed.
  • KEP readme has up-to-date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements. (For more information on the PRR process, check here). If your production readiness review is not completed yet, please make sure to fill the production readiness questionnaire in your KEP by the PRR Freeze deadline on Thursday 6th February 2025 so that the PRR team has enough time to review your KEP.

With all the KEP requirements in place and merged into k/enhancements, this enhancement is all good for the upcoming enhancements freeze. 🚀

Could we please link the KEP README in the issue description?

The status of this enhancement is marked as Tracked for enhancements freeze. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

/label tracked/yes

@k8s-ci-robot k8s-ci-robot added stage/beta Denotes an issue tracking an enhancement targeted for Beta status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Feb 14, 2025
@dipesh-rawat dipesh-rawat moved this to Tracked for enhancements freeze in 1.33 Enhancements Tracking Feb 14, 2025
@dipesh-rawat
Member

I've manually added this KEP to the tracking board and marked it as tracked for enhancements freeze🚀

Could one of the sig leads add the lead-opted-in label? @deads2k, would you be able to help with this or point me to someone who can? Thanks!

@dipesh-rawat
Member

Could one of the sig leads add the lead-opted-in label?

@serathius Would you be able to assist with the above request? It would be great to get the label added as work is being done in this v1.33 release.

@serathius
Contributor Author

I'm not a SIG api-machinery lead, so I don't think I should use it. I can ask nicely on Slack.

@deads2k
Contributor

deads2k commented Feb 17, 2025

/label lead-opted-in
/milestone v1.33

Projects
Status: Tracked for enhancements freeze
Development

No branches or pull requests

8 participants