shim: add support for containerd v2 metrics #11473

Open
Champ-Goblem wants to merge 1 commit into base: master from Champ-Goblem:shim-add-cgroup-v2-metrics-support

Conversation

Champ-Goblem
Contributor

Add support for v2 containerd metrics in the shim. v2 metrics are only used when runsc is run with --system-cgroup=true. Containerd requires v2 metrics when the host runs cgroups v2. This issue was noticed when attempting to gather metrics on AL2023, which defaults to cgroups v2.

Fixes: #11472

@Champ-Goblem
Contributor Author

Testing this change on AWS AL2022 vs AL2023:

[image: screenshot attached to the original comment, not preserved]

Commands used for testing:

CPU: stress-ng -c 1
Memory: stress-ng --vm 1 --vm-bytes 100M

The difference between runC and gVisor might be a separate issue.

@ayushr2
Collaborator

ayushr2 commented Feb 13, 2025

Thanks for the PR!

@milantracy
Contributor

thanks for the patch! LGTM

@Champ-Goblem Champ-Goblem force-pushed the shim-add-cgroup-v2-metrics-support branch from ec422c0 to b3701c8 Compare February 17, 2025 09:36
milantracy pushed a commit to milantracy/gvisor that referenced this pull request Feb 19, 2025
DO NOT SUBMIT

Signed-off-by: Champ-Goblem <[email protected]>
@@ -31,6 +31,7 @@ go_library(
"@com_github_containerd_cgroups//:go_default_library",
"@com_github_containerd_cgroups//stats/v1:go_default_library",
"@com_github_containerd_cgroups//v2:go_default_library",
"@com_github_containerd_cgroups//v2/stats:go_default_library",
Contributor

Our internal pipeline is missing the rule to translate this package, which blocks the submission.

I will fix that in our pipeline.

Add support for v2 containerd metrics in the shim. v2 metrics are only used when runsc is run with --system-cgroup=true.
Containerd requires v2 metrics when the host runs cgroups v2.
This issue was noticed when attempting to gather metrics on AL2023, which defaults to cgroups v2.

Fixes: google#11472
Signed-off-by: Champ-Goblem <[email protected]>
@Champ-Goblem Champ-Goblem force-pushed the shim-add-cgroup-v2-metrics-support branch from b3701c8 to b602afb Compare February 19, 2025 11:39
copybara-service bot pushed a commit that referenced this pull request Feb 21, 2025
Add support for v2 containerd metrics in the shim. v2 metrics are only used when runsc is run with --system-cgroup=true. Containerd requires v2 metrics when the host runs cgroups v2. This issue was noticed when attempting to gather metrics on AL2023, which defaults to cgroups v2.

Fixes: #11472
FUTURE_COPYBARA_INTEGRATE_REVIEW=#11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support b602afb
PiperOrigin-RevId: 729301568
copybara-service bot pushed a commit that referenced this pull request Feb 21, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=#11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support b602afb
PiperOrigin-RevId: 725052687
copybara-service bot pushed a commit that referenced this pull request Feb 21, 2025
By default, private anonymous memory mappings in Linux use transparent
hugepages when available (/sys/kernel/mm/transparent_hugepage/enabled=always);
when hugepages are not available (typically due to fragmentation), such
mappings fall back to small pages instead
(/sys/kernel/mm/transparent_hugepage/defrag=madvise). See Linux's
Documentation/admin-guide/mm/transhuge.rst.

In gVisor, application private anonymous memory cannot generally be backed by
host private anonymous memory, since in many platforms, applications and the
sentry (which also needs to map application memory) run in different host
processes. Instead, application private anonymous memory is backed by a host
memfd, managed by pgalloc.MemoryFile. By default, Linux memfds never use
transparent hugepages
(/sys/kernel/mm/transparent_hugepage/shmem_enabled=never); thus
runsc/hostsettings sets shmem_enabled=madvise, allowing pgalloc to request
transparent hugepages using madvise(MADV_HUGEPAGE).

However, since the default value of /sys/kernel/mm/transparent_hugepage/defrag
is madvise, MADV_HUGEPAGE has the unintended side effect of enabling direct
compaction: when hugepages are not available, the host kernel will attempt to
form free hugepages by defragmenting small pages. This can be very expensive,
which is why (as described above) Linux doesn't do it by default.

Thus:

- When MADV_HUGEPAGE is required for application THP, only enable it if
  /sys/kernel/mm/transparent_hugepage/defrag specifies an operating mode that
  does not result in more direct compaction than a normal private anonymous
  mapping would.

- Adjust /sys/kernel/mm/transparent_hugepage/defrag accordingly in
  runsc/hostsettings.
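
The check in the first bullet can be sketched roughly as follows. This is an illustration only, not the gVisor implementation: `selectedMode` and `madviseAddsDirectCompaction` are hypothetical names, and the mode classification is an assumption based on one reading of Linux's transhuge.rst (in `madvise` and `defer+madvise`, only madvised regions enter direct compaction, so MADV_HUGEPAGE adds stalls that a normal mapping would not see). Unknown modes are treated conservatively.

```go
package main

import (
	"fmt"
	"strings"
)

// selectedMode returns the bracketed entry from a sysfs multi-choice file,
// for example "madvise" from "always defer defer+madvise [madvise] never".
func selectedMode(contents string) (string, bool) {
	for _, f := range strings.Fields(contents) {
		if len(f) > 2 && strings.HasPrefix(f, "[") && strings.HasSuffix(f, "]") {
			return f[1 : len(f)-1], true
		}
	}
	return "", false
}

// madviseAddsDirectCompaction reports whether MADV_HUGEPAGE is assumed to
// cause more direct compaction than a non-madvised mapping in this mode.
// Assumption: "always", "defer", and "never" treat madvised and normal
// mappings alike; every other mode (including unknown ones) is treated
// as adding direct compaction.
func madviseAddsDirectCompaction(mode string) bool {
	switch mode {
	case "always", "defer", "never":
		return false
	default:
		return true
	}
}

func main() {
	// Sample contents of /sys/kernel/mm/transparent_hugepage/defrag.
	sample := "always defer defer+madvise [madvise] never\n"
	mode, ok := selectedMode(sample)
	if !ok {
		fmt.Println("could not determine defrag mode")
		return
	}
	fmt.Printf("defrag=%s, MADV_HUGEPAGE adds direct compaction: %v\n",
		mode, madviseAddsDirectCompaction(mode))
}
```

A caller would only request hugepages with MADV_HUGEPAGE when the second function returns false for the current host mode.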

FUTURE_COPYBARA_INTEGRATE_REVIEW=#11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support b602afb
PiperOrigin-RevId: 728838338
copybara-service bot pushed a commit that referenced this pull request Feb 21, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=#11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support b602afb
PiperOrigin-RevId: 729612115
copybara-service bot pushed a commit that referenced this pull request Feb 22, 2025
FUTURE_COPYBARA_INTEGRATE_REVIEW=#11473 from Champ-Goblem:shim-add-cgroup-v2-metrics-support b602afb
PiperOrigin-RevId: 729612115
Successfully merging this pull request may close these issues.

Shim: Add support for cgroups v2 stats
3 participants