runsc: don't enable MADV_HUGEPAGE with direct compaction #11484

Open. copybara-service[bot] wants to merge 1 commit into master.

@copybara-service copybara-service bot commented Feb 20, 2025

runsc: don't enable MADV_HUGEPAGE with direct compaction

By default, private anonymous memory mappings in Linux use transparent
hugepages when available (/sys/kernel/mm/transparent_hugepage/enabled=always);
when hugepages are not available (typically due to fragmentation), such
mappings fall back to small pages instead
(/sys/kernel/mm/transparent_hugepage/defrag=madvise). See Linux's
Documentation/admin-guide/mm/transhuge.rst.

In gVisor, application private anonymous memory generally cannot be backed by
host private anonymous memory because, on many platforms, applications and the
sentry (which also needs to map application memory) run in different host
processes. Instead, application private anonymous memory is backed by a host
memfd managed by pgalloc.MemoryFile. By default, Linux memfds never use
transparent hugepages
(/sys/kernel/mm/transparent_hugepage/shmem_enabled=never), so
runsc/hostsettings sets shmem_enabled=madvise, allowing pgalloc to request
transparent hugepages using madvise(MADV_HUGEPAGE).
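
For context, each of the transparent_hugepage sysfs files mentioned above lists every allowed value on one line and marks the active one with square brackets (for example `always [madvise] never`). The following is a minimal sketch of extracting the active value; `activeOption` is a hypothetical helper written for illustration and is not taken from runsc/hostsettings.

```go
package main

import (
	"fmt"
	"strings"
)

// activeOption returns the bracketed token from the contents of a sysfs
// multi-choice file such as /sys/kernel/mm/transparent_hugepage/defrag,
// e.g. "madvise" for "always defer defer+madvise [madvise] never".
// Hypothetical helper for illustration only.
func activeOption(contents string) (string, error) {
	for _, field := range strings.Fields(contents) {
		if len(field) > 2 && strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return field[1 : len(field)-1], nil
		}
	}
	return "", fmt.Errorf("no active option found in %q", contents)
}
```

A caller would typically pass the result of reading the sysfs file (for instance with `os.ReadFile`) converted to a string.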

However, since the default value of /sys/kernel/mm/transparent_hugepage/defrag
is madvise, MADV_HUGEPAGE has the unintended side effect of enabling direct
compaction: when hugepages are not available, the host kernel will attempt to
form free hugepages by defragmenting small pages. This can be very expensive,
which is why (as described above) Linux doesn't do it by default.

Thus:

  • When MADV_HUGEPAGE is required for application THP, only enable it if
    /sys/kernel/mm/transparent_hugepage/defrag specifies an operating mode that
    does not result in more direct compaction than a normal private anonymous
    mapping would.

  • Adjust /sys/kernel/mm/transparent_hugepage/defrag accordingly in
    runsc/hostsettings.
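
The first rule above can be sketched as a small predicate. This is an illustration based on one reading of the defrag modes documented in transhuge.rst, not the code in this change: under `always`, normal mappings already stall on direct compaction; under `defer` and `never`, no mapping does; under `madvise` and `defer+madvise`, only MADV_HUGEPAGE regions do. The function name `safeToAdviseHugepage` is hypothetical.

```go
package main

// safeToAdviseHugepage reports whether applying MADV_HUGEPAGE under the given
// /sys/kernel/mm/transparent_hugepage/defrag mode would NOT cause more direct
// compaction than a normal private anonymous mapping incurs.
// Hypothetical helper; the mode classification is an assumption drawn from
// Linux's Documentation/admin-guide/mm/transhuge.rst.
func safeToAdviseHugepage(defragMode string) bool {
	switch defragMode {
	case "always", "defer", "never":
		// MADV_HUGEPAGE regions are treated the same as other regions.
		return true
	case "madvise", "defer+madvise":
		// Only MADV_HUGEPAGE regions stall on direct compaction.
		return false
	default:
		// Unknown mode: be conservative and do not advise hugepages.
		return false
	}
}
```

Treating unrecognized modes as unsafe is a design choice in this sketch: it trades possible loss of hugepage backing for the guarantee that no unexpected compaction stalls are introduced.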

FUTURE_COPYBARA_INTEGRATE_REVIEW=#11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support b602afb

@copybara-service copybara-service bot added the "exported" label (Issue was exported automatically) Feb 20, 2025
@copybara-service copybara-service bot force-pushed the test/cl728838338 branch 3 times, most recently from 0c3b3d5 to 7f39b8c, February 21, 2025 17:30
PiperOrigin-RevId: 728838338