-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect unhandled reboots and require user intervention #22278
Detect unhandled reboots and require user intervention #22278
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mheon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Marking as No New Tests because we can't really reboot our CI VMs and we don't expose the path of the sentinel file anywhere in our public API. We'd have to run |
That would properly be a good case for the fedora test day test cases or so, easy enough to run mkdir "tmpdir" on persistent disk then run |
LGTM |
libpod/runtime.go
Outdated
@@ -624,6 +624,25 @@ func makeRuntime(ctx context.Context, runtime *Runtime) (retErr error) { | |||
} | |||
} | |||
|
|||
// Check current boot ID - will be written to the alive file. | |||
systemBootID, err := os.ReadFile("/proc/sys/kernel/random/boot_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this neeeds to be extracted to linux specific file as this file will not exists on freebsd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're OK, we don't actually report errors on this failing - we just don't proceed with comparing the boot ID. We could abstract it, but the only benefit would a few less syscalls on freebsd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it would not result in errors but I think it would be better, we need to remember that this code must work even on freebsd. Maybe someone later decides hey this should handle the error and fail if we cannot read it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Podman needs to be able to detect when a system reboot occurs to do certain types of cleanup operation (for example, reset container states, clean up IPAM allocations, etc). our current method for this is a sentinel file on a tmpfs filesystem. The problem emerges that there is no directory that is guaranteed to be a tmpfs and is also guaranteed to be accessible to rootless users in the FHS. If the user has a systemd user session, we can depend on /run/user/$UID, but we can't reliably say that they do. This code will detect the no-tmpfs-but-reboot-occurred case by writing the current system boot ID to our tmpfs sentinel file when it is created, and checking that file every time Podman starts to make sure that the current boot ID matches the cached one in the sentinel file. If they don't match, a reboot occurred and the sentinel file was not on a tmpfs and thus survived. In that case, throw an error telling the user to remove certain directories (the ones that are supposed to be tmpfs), so we can proceed as expected. Signed-off-by: Matt Heon <[email protected]>
04fb932
to
3560ccd
Compare
/lgtm |
Podman needs to be able to detect when a system reboot occurs to do certain types of cleanup operation (for example, reset container states, clean up IPAM allocations, etc). our current method for this is a sentinel file on a tmpfs filesystem. The problem emerges that there is no directory that is guaranteed to be a tmpfs and is also guaranteed to be accessible to rootless users in the FHS. If the user has a systemd user session, we can depend on /run/user/$UID, but we can't reliably say that they do.
This code will detect the no-tmpfs-but-reboot-occurred case by writing the current system boot ID to our tmpfs sentinel file when it is created, and checking that file every time Podman starts to make sure that the current boot ID matches the cached one in the sentinel file. If they don't match, a reboot occurred and the sentinel file was not on a tmpfs and thus survived. In that case, throw an error telling the user to remove certain directories (the ones that are supposed to be tmpfs), so we can proceed as expected.
Does this PR introduce a user-facing change?