Detect unhandled reboots and require user intervention #22278

mheon · 2024-04-05T13:07:54Z

Podman needs to be able to detect when a system reboot occurs to do certain types of cleanup operation (for example, reset container states, clean up IPAM allocations, etc). our current method for this is a sentinel file on a tmpfs filesystem. The problem emerges that there is no directory that is guaranteed to be a tmpfs and is also guaranteed to be accessible to rootless users in the FHS. If the user has a systemd user session, we can depend on /run/user/$UID, but we can't reliably say that they do.

This code will detect the no-tmpfs-but-reboot-occurred case by writing the current system boot ID to our tmpfs sentinel file when it is created, and checking that file every time Podman starts to make sure that the current boot ID matches the cached one in the sentinel file. If they don't match, a reboot occurred and the sentinel file was not on a tmpfs and thus survived. In that case, throw an error telling the user to remove certain directories (the ones that are supposed to be tmpfs), so we can proceed as expected.

Does this PR introduce a user-facing change?

Podman now detects unhandled system reboots and advises the user on proper mitigations.

openshift-ci · 2024-04-05T13:08:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [mheon]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mheon · 2024-04-05T13:09:40Z

Marking as No New Tests because we can't really reboot our CI VMs and we don't expose the path of the sentinel file anywhere in our public API. We'd have to run podman info --log-level=debug, parse tmpdir out of the debug output (ew), and then hard-code what the filename is supposed to be in the tests.

Luap99 · 2024-04-05T13:13:07Z

Marking as No New Tests because we can't really reboot our CI VMs and we don't expose the path of the sentinel file anywhere in our public API. We'd have to run podman info --log-level=debug, parse tmpdir out of the debug output (ew), and then hard-code what the filename is supposed to be in the tests.

That would properly be a good case for the fedora test day test cases or so, easy enough to run mkdir "tmpdir" on persistent disk then run podman --root tmpdir/root --runroot tmpdir/runroot --tmpdir tmpdir/tmp info then have them reboot and run that again to confirm it fails with the new error.

baude · 2024-04-05T13:13:33Z

LGTM

Luap99 · 2024-04-05T13:16:00Z

libpod/runtime.go

@@ -624,6 +624,25 @@ func makeRuntime(ctx context.Context, runtime *Runtime) (retErr error) {
 		}
 	}

+	// Check current boot ID - will be written to the alive file.
+	systemBootID, err := os.ReadFile("/proc/sys/kernel/random/boot_id")


I think this neeeds to be extracted to linux specific file as this file will not exists on freebsd

I think we're OK, we don't actually report errors on this failing - we just don't proceed with comparing the boot ID. We could abstract it, but the only benefit would a few less syscalls on freebsd.

yeah it would not result in errors but I think it would be better, we need to remember that this code must work even on freebsd. Maybe someone later decides hey this should handle the error and fail if we cannot read it...

Podman needs to be able to detect when a system reboot occurs to do certain types of cleanup operation (for example, reset container states, clean up IPAM allocations, etc). our current method for this is a sentinel file on a tmpfs filesystem. The problem emerges that there is no directory that is guaranteed to be a tmpfs and is also guaranteed to be accessible to rootless users in the FHS. If the user has a systemd user session, we can depend on /run/user/$UID, but we can't reliably say that they do. This code will detect the no-tmpfs-but-reboot-occurred case by writing the current system boot ID to our tmpfs sentinel file when it is created, and checking that file every time Podman starts to make sure that the current boot ID matches the cached one in the sentinel file. If they don't match, a reboot occurred and the sentinel file was not on a tmpfs and thus survived. In that case, throw an error telling the user to remove certain directories (the ones that are supposed to be tmpfs), so we can proceed as expected. Signed-off-by: Matt Heon <[email protected]>

baude · 2024-04-05T15:36:59Z

/lgtm

openshift-ci bot added the release-note label Apr 5, 2024

mheon added No New Tests Allow PR to proceed without adding regression tests and removed release-note labels Apr 5, 2024

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2024

Luap99 reviewed Apr 5, 2024

View reviewed changes

mheon force-pushed the error_on_unhandled_reboot branch from 04fb932 to 3560ccd Compare April 5, 2024 14:07

openshift-ci bot assigned baude Apr 5, 2024

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 5, 2024

openshift-merge-bot bot merged commit 41a710b into containers:main Apr 5, 2024
94 checks passed

Luap99 mentioned this pull request May 13, 2024

Podman should detect and warn when used with su/sudo #14603

Closed

stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jul 5, 2024

stale-locking-app bot locked as resolved and limited conversation to collaborators Jul 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect unhandled reboots and require user intervention #22278

Detect unhandled reboots and require user intervention #22278

mheon commented Apr 5, 2024

openshift-ci bot commented Apr 5, 2024

mheon commented Apr 5, 2024

Luap99 commented Apr 5, 2024

baude commented Apr 5, 2024

Luap99 Apr 5, 2024

mheon Apr 5, 2024

Luap99 Apr 5, 2024

mheon Apr 5, 2024

mheon Apr 5, 2024

baude commented Apr 5, 2024

Detect unhandled reboots and require user intervention #22278

Detect unhandled reboots and require user intervention #22278

Conversation

mheon commented Apr 5, 2024

Does this PR introduce a user-facing change?

openshift-ci bot commented Apr 5, 2024

mheon commented Apr 5, 2024

Luap99 commented Apr 5, 2024

baude commented Apr 5, 2024

Luap99 Apr 5, 2024

Choose a reason for hiding this comment

mheon Apr 5, 2024

Choose a reason for hiding this comment

Luap99 Apr 5, 2024

Choose a reason for hiding this comment

mheon Apr 5, 2024

Choose a reason for hiding this comment

mheon Apr 5, 2024

Choose a reason for hiding this comment

baude commented Apr 5, 2024