Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Retry KVM_CREATE_VM on EINTR #5046

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

zulinx86
Copy link
Contributor

@zulinx86 zulinx86 commented Feb 21, 2025

Reason & Changes

It is known that KVM_CREATE_VM occasionally fails with EINTR on heavily loaded machines
with many VMs.

The behavior itself that KVM_CREATE_VM can return EINTR is intentional. This is because
the KVM_CREATE_VM path includes mm_take_all_locks() that is CPU intensive and all CPU
intensive syscalls should check for pending signals and return EINTR immediately to allow
userland to remain interactive.
https://lists.nongnu.org/archive/html/qemu-devel/2014-01/msg01740.html

However, it is empirically confirmed that, even though there is no pending signal,
KVM_CREATE_VM returns EINTR.
https://lore.kernel.org/qemu-devel/[email protected]/

To mitigate it, QEMU does an inifinite retry on EINTR that greatly improves reliabiliy:

Similarly, we do retries up to 5 times. Although Firecracker clients are also able to
retry, they have to start Firecracker from scratch. Doing retries in Firecracker makes
recovery faster and improves reliability.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • [ ] I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • [ ] I have mentioned all user-facing changes in CHANGELOG.md.
  • [ ] If a specific issue led to this PR, this PR closes the issue.
  • [ ] When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

The error variant is used when creating a new VM from KVM FD via
KVM_CREATE_VM call. On success, VM FD is returned. The previous error
message doesn't tell a lie but is a bit unintuitive.

Signed-off-by: Takahiro Itazuri <[email protected]>
@zulinx86 zulinx86 self-assigned this Feb 21, 2025
@zulinx86 zulinx86 added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Feb 21, 2025
@zulinx86
Copy link
Contributor Author

zulinx86 commented Feb 21, 2025

Question:

  • Do we want to add TODO comment and track it as an issue?
  • Should the log level be "warn"?
  • Do we want to have a metric to gather data of how often it fails with EINTR? (maybe no, because its removal is going to be a breaking change).

@zulinx86 zulinx86 force-pushed the EINTR_on_KVM_CREATE_VM branch from f292393 to dd04187 Compare February 21, 2025 13:57
Copy link

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 58.33333% with 5 lines in your changes missing coverage. Please review.

Project coverage is 83.19%. Comparing base (26a3a1e) to head (be26374).

Files with missing lines Patch % Lines
src/vmm/src/vstate/vm/mod.rs 50.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5046      +/-   ##
==========================================
- Coverage   83.20%   83.19%   -0.02%     
==========================================
  Files         247      247              
  Lines       26631    26641      +10     
==========================================
+ Hits        22158    22163       +5     
- Misses       4473     4478       +5     
Flag Coverage Δ
5.10-c5n.metal 83.67% <54.54%> (-0.02%) ⬇️
5.10-m5n.metal 83.66% <54.54%> (-0.02%) ⬇️
5.10-m6a.metal 82.86% <54.54%> (-0.02%) ⬇️
5.10-m6g.metal 79.66% <54.54%> (-0.02%) ⬇️
5.10-m6i.metal 83.65% <54.54%> (-0.02%) ⬇️
5.10-m7g.metal 79.66% <54.54%> (-0.02%) ⬇️
6.1-c5n.metal 83.67% <54.54%> (-0.01%) ⬇️
6.1-m5n.metal 83.65% <54.54%> (-0.03%) ⬇️
6.1-m6a.metal 82.86% <54.54%> (-0.02%) ⬇️
6.1-m6g.metal 79.65% <54.54%> (-0.02%) ⬇️
6.1-m6i.metal 83.65% <54.54%> (-0.02%) ⬇️
6.1-m7g.metal 79.66% <54.54%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zulinx86 zulinx86 force-pushed the EINTR_on_KVM_CREATE_VM branch from dd04187 to 52127c1 Compare February 21, 2025 14:07
@zulinx86 zulinx86 force-pushed the EINTR_on_KVM_CREATE_VM branch 2 times, most recently from 028f3f6 to fb23c4f Compare February 21, 2025 14:41
It is known that KVM_CREATE_VM fails with EINTR on heavily loaded
machines with many VMs. It might be a kernel bug but apparently has not
been fixed. To mitigate it, QEMU does an infinitely retry on EINTR.
Similar, do retries up to 5 times.

Signed-off-by: Takahiro Itazuri <[email protected]>
@zulinx86 zulinx86 force-pushed the EINTR_on_KVM_CREATE_VM branch from fb23c4f to be26374 Compare February 21, 2025 15:52
Ok(fd) => return Ok(fd),
Err(e) if e.errno() == libc::EINTR && attempt < MAX_ATTEMPTS => {
info!("Attemp #{attempt} of KVM_CREATE_VM returned EINTR");
// Exponential backoff (1us, 2us, 4us, and 8us => 15us in total)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be 16us in the end? 2^4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sleep duration is 2^(attempt - 1). The attempt starts from 1 to 5, but if attempt == 5 it goes to the third hand of the match expression. So 2^(1 - 1) = 2^0 = 1, 2^(2 - 1) = 2^1 = 2, 2^(3 - 1) = 2^2 = 4, and 2^(4 - 1) = 2^3 = 8.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it is the sum.
Btw, why do we even need the backoff?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants