
RestartPolicy doesn't make sense for static pods #130288

Open
tallclair opened this issue Feb 19, 2025 · 6 comments · May be fixed by #130334
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@tallclair
Member

/kind bug

Static pods should only ever have a restart policy of `Always`. Anything else doesn't make sense, since the kubelet doesn't persistently track pod status.

I don't think we can fail validation, for backwards compatibility, but maybe we can unconditionally overwrite the restart policy when static pods are parsed.
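A minimal Go sketch of the "overwrite instead of reject" idea, assuming there is a post-parse hook where a static pod manifest can be mutated before the kubelet acts on it. `Pod`, `PodSpec`, and `normalizeStaticPod` are hypothetical stand-ins, not the kubelet's actual types or functions; only the restart policy string values (`Always`, `OnFailure`, `Never`) match the Kubernetes API.

```go
package main

import "fmt"

// RestartPolicy is a local stand-in for the core/v1 RestartPolicy type.
type RestartPolicy string

const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// PodSpec is a hypothetical, trimmed-down spec holding only the field
// relevant to this issue.
type PodSpec struct {
	RestartPolicy RestartPolicy
}

// Pod is a hypothetical minimal pod representation.
type Pod struct {
	Name string
	Spec PodSpec
}

// normalizeStaticPod is a hypothetical post-parse hook. Rather than failing
// validation (which would break existing manifests), it forces the restart
// policy to Always and reports whether the manifest's value was changed.
func normalizeStaticPod(pod *Pod) (overwritten bool) {
	if pod.Spec.RestartPolicy == RestartPolicyAlways {
		return false
	}
	pod.Spec.RestartPolicy = RestartPolicyAlways
	return true
}

func main() {
	pods := []*Pod{
		{Name: "etcd", Spec: PodSpec{RestartPolicy: RestartPolicyAlways}},
		{Name: "one-shot", Spec: PodSpec{RestartPolicy: RestartPolicyNever}},
		{Name: "unset"}, // empty policy is also normalized
	}
	for _, p := range pods {
		if normalizeStaticPod(p) {
			fmt.Printf("%s: restartPolicy overwritten to %s\n", p.Name, p.Spec.RestartPolicy)
		}
	}
}
```

Returning whether the value was overwritten would let a caller log or emit an event rather than change the manifest silently, so existing manifests keep loading while the mismatch is still surfaced.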

/sig node

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Feb 19, 2025
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 19, 2025
@tallclair
Member Author

/priority important-longterm
/triage accepted
/cc @yujuhong @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 19, 2025
@tallclair
Member Author

Old discussion: #34003

@yujuhong
Contributor

Since we're talking about static pods, there's also an open issue to add more validation: #103587

@ajaysundark
Copy link

I can help take a look at this. Do we have consensus on setting `Always` as the default restartPolicy for static pods?

@GunaKKIBM

/assign

@sftim
Contributor

sftim commented Feb 21, 2025

I'm not sure about this.

Logically, if a container fails and that would trigger a non-static Pod to terminate, I think the kubelet should delete the whole Pod and immediately make a new one (including init container execution, reinitialization of sidecars, etc.). The static pod declares desired state and we should honor it; in this case, without modification.

(IMO) the kubelet doesn't need to do an API server write to the mirror Pod before tearing down the Pod sandbox and making a fresh one, but make a fresh sandbox it should. For example, making that new sandbox could even trigger creation of a new microVM.
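The alternative described above can be modeled as a small decision function. This is a simplified Go sketch under stated assumptions: the pod has a single container (so a failure that the restart policy does not restart makes a non-static pod terminal), and the names `Action`, `containerWouldRestart`, and `onContainerFailure` are hypothetical and do not correspond to kubelet code.

```go
package main

import "fmt"

// RestartPolicy is a local stand-in for the core/v1 RestartPolicy type.
type RestartPolicy string

const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// Action is what this hypothetical logic decides after a container fails.
type Action string

const (
	RestartContainer Action = "restart-container" // restart just the container in place
	RecreateSandbox  Action = "recreate-sandbox"  // tear down the sandbox, rebuild the whole pod
	MarkPodTerminal  Action = "mark-pod-terminal" // ordinary pod: stop here
)

// containerWouldRestart reports whether the policy alone restarts a
// container that exited with a non-zero code.
func containerWouldRestart(policy RestartPolicy) bool {
	switch policy {
	case RestartPolicyAlways, RestartPolicyOnFailure:
		return true
	default: // Never, or an unrecognized value
		return false
	}
}

// onContainerFailure assumes a single-container pod whose container just
// exited with a non-zero code. For a static pod, a failure that would make a
// non-static pod terminal instead triggers a fresh sandbox (re-running init
// containers and sidecars). No mirror Pod write is modeled before teardown.
func onContainerFailure(isStatic bool, policy RestartPolicy) Action {
	if containerWouldRestart(policy) {
		return RestartContainer
	}
	if isStatic {
		return RecreateSandbox
	}
	return MarkPodTerminal
}

func main() {
	fmt.Println(onContainerFailure(true, RestartPolicyNever))
	fmt.Println(onContainerFailure(false, RestartPolicyNever))
	fmt.Println(onContainerFailure(true, RestartPolicyOnFailure))
}
```

Under this model the static pod's declared restart policy is kept as written; what changes is that the terminal outcome for static pods becomes "rebuild the sandbox" rather than "stop".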

Handling failure in this way could also help a static Pod recover from a partial node failure, such as a CPU going offline when the container runtime is backed by a partitioning hypervisor. Unlikely, but there's nothing in our conformance testing to say "don't use a partitioning hypervisor" or indeed "don't keep running after a partial hardware failure".

Why wouldn't we do the Pod sandbox replacement as I've outlined?

Projects
Status: Triage
6 participants