This list is based on public documentation; please open an issue if you would like to be added to or removed from it.
AWS:
- Amazon EKS supports running SuperPods with LeaderWorkerSet to serve large LLMs, see the blog here.
- A Terraform-based EKS Blueprints pattern can be found here. This pattern demonstrates an Amazon EKS cluster with an EFA-enabled node group that supports multi-node inference using vLLM and LeaderWorkerSet.
DaoCloud: LeaderWorkerSet is the default deployment method for running large models across multiple nodes on Kubernetes.
Google Cloud:
- GKE leverages LeaderWorkerSet to deploy and serve large multi-host generative AI open models, see the blog here.
- For serving DeepSeek-R1 671B or Llama 3.1 405B on GKE, see the guide here.
Nvidia: LeaderWorkerSet deployments are the recommended method for deploying multi-node models with NIM, see the documentation here.
Feel free to submit a PR if you use LeaderWorkerSet in your project and want to be added here.
llmaz: An easy-to-use and advanced inference platform that uses LeaderWorkerSet as the underlying workload to support both single-host and multi-host inference scenarios.
vLLM: A fast and easy-to-use library for LLM inference. It can be deployed with LWS on Kubernetes for distributed model serving, see the documentation here.
- KubeCon NA 2024: Distributed Multi-Node Model Inference Using the LeaderWorkerSet API by @ahg-g @liurupeng
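Several of the entries above describe serving one model across multiple pods with a LeaderWorkerSet. As a rough sketch of what such a deployment looks like, the manifest below shows the basic shape of the LeaderWorkerSet API (the name, image, and sizing values are placeholders for illustration, not taken from any of the guides linked above):

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: multi-node-inference   # placeholder name
spec:
  replicas: 2                  # number of leader+worker groups (model replicas)
  leaderWorkerTemplate:
    size: 4                    # pods per group: 1 leader + 3 workers
    leaderTemplate:
      spec:
        containers:
        - name: leader
          image: example.com/inference-server:latest  # placeholder image
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: example.com/inference-server:latest  # placeholder image
```

Each group of `size` pods is scheduled and scaled as a unit, which is what makes the pattern suitable for models too large to fit on a single node.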