Guest process reads stale data from shared memory region instead of latest host updates #5037

Open
3 tasks
Nafug22 opened this issue Feb 10, 2025 · 3 comments

Comments

@Nafug22

Nafug22 commented Feb 10, 2025

Describe the bug

The guest process continues to read stale data instead of the latest updates written by the host in the shared memory region.

To Reproduce

  1. Configure Firecracker via the config file with the following drive entry:
     {
       "drive_id": "shm",
       "is_root_device": false,
       "cache_type": "Writeback",
       "is_read_only": false,
       "path_on_host": "/dev/shm/shared_mem",
       "io_engine": "Sync"
     }
  2. Establish communication via vsock.
  3. The guest sends a signal via vsock instructing the host to update values in the shared memory.
  4. Once the write operation is complete, the host signals the guest to read the updated data.

Expected behaviour

The guest process should read the updated data instead of a stale version.

Environment

  • Firecracker version: v1.10.1
  • Host and guest kernel versions: 6.8.0-52-generic for host; 4.14.174 for guest
  • Rootfs used:
  • Architecture: x86_64
  • Any other relevant software versions:

Additional context

I discovered that reopening the file descriptor resolves the issue. However, due to the associated overhead, I would prefer a solution that avoids reopening.

This issue may be related to the read() function caching the file upon opening, preventing it from detecting changes made by the host until the file is reopened.

I'm unsure if this behavior is intentional for isolation purposes. I attempted to use O_DIRECT as a custom flag and also tried lseek, but both resulted in the microVM shutting down.

Checks

  • Have you searched the Firecracker Issues database for similar problems?
  • Have you read the existing relevant Firecracker documentation?
  • Are you certain the bug being reported is a Firecracker issue?
@Nafug22 Nafug22 closed this as completed Feb 10, 2025
@Nafug22 Nafug22 changed the title from "[Bug] Title" to "Guest process reads stale data from shared memory region instead of latest host updates" Feb 12, 2025
@Nafug22 Nafug22 reopened this Feb 12, 2025
@ShadowCurse
Contributor

Hi @Nafug22,
O_DIRECT and lseek don't work because Firecracker applies seccomp filters to itself, which prevent unknown flags/syscalls from being used. If you want to test a version without seccomp filters, you can modify firecracker/build.rs to always use unimplemented.json, or just compile for the gnu target.

> I discovered that reopening the file descriptor resolves the issue.

Where do you reopen it? Did you modify Firecracker to do it or do you reopen it in the guest process reading from the device?

Since you are using /dev/shm, the host should not cache accesses: /dev/shm is already in RAM, even without Firecracker using the O_DIRECT flag. This also means the Writeback cache type in the block config is not very useful.
If there is caching somewhere, it is most likely in the guest kernel. Can you try O_DIRECT in the guest process that reads the data and see what happens?
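For reference, an O_DIRECT read in the guest could look roughly like the sketch below. It assumes the shared drive shows up in the guest as a block device (the path `/dev/vdb` in the usage note is hypothetical) and that 4096 bytes is a sufficient alignment. O_DIRECT generally requires the buffer, the offset and the length to be block-aligned, so the sketch reads into page-aligned anonymous mmap memory. The `read_region` helper and its `direct` switch are illustrative names, not part of any existing tool.

```python
import mmap
import os

ALIGN = 4096  # assumed alignment for buffer, offset and length under O_DIRECT


def read_region(path, length, offset=0, direct=True):
    """Read up to `length` bytes at `offset`, optionally bypassing the page cache.

    With direct=True the caller must pass an ALIGN-aligned offset.
    """
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    # Round the request up to a whole number of ALIGN-sized blocks.
    padded = max(ALIGN, -(-length // ALIGN) * ALIGN)
    fd = os.open(path, flags)
    try:
        # Anonymous mmap memory is page-aligned, which O_DIRECT needs.
        buf = mmap.mmap(-1, padded)
        try:
            n = os.preadv(fd, [buf], offset)
            # Slicing an mmap returns a bytes copy, safe to use after close().
            return buf[:min(n, length)]
        finally:
            buf.close()
    finally:
        os.close(fd)
```

A call such as `read_region("/dev/vdb", 64)` would then fetch the first 64 bytes of the device without going through the guest page cache, provided the device and kernel accept O_DIRECT.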

Overall, it seems you are trying to implement some kind of replacement for a pmem device. Unfortunately, Firecracker currently does not implement one, but if you want to experiment with pmem, cloud-hypervisor does.

@ShadowCurse ShadowCurse added the Status: Awaiting author Indicates that an issue or pull request requires author action label Feb 19, 2025
@Nafug22
Author

Nafug22 commented Feb 19, 2025

Thank you for your help!

After modifying firecracker/build.rs to use unimplemented.json, I can use O_DIRECT for the underlying file opened by the virtio block device without causing a shutdown during execution. Previously, before attempting to modify the block custom flags in Firecracker, I tried using O_DIRECT and O_SYNC in the guest process, but it had no effect. I also tried madvise and clflush in the guest process to manually invalidate the cache, also with no effect. Now I have enabled O_DIRECT and O_SYNC both in Firecracker and in the guest process, yet the issue persists: the data remains stale.

I reopen the file descriptor in the guest process without modifying Firecracker. This happens right before the guest process reads the data. The entire process is as follows:

  1. The guest sends a signal to the host through vsock.
  2. The host modifies the data in shared memory using memcpy (which is synchronous) and then sends a completion message.
  3. Upon receiving the completion message, the guest reopens the file descriptor and can then read the updated data as expected.

Without reopening the file descriptor in the third step, the guest continues to read stale data.
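The three steps above can be sketched as a small request/acknowledge exchange. This is only an illustration under several assumptions: the message names `UPDATE` and `DONE__` are made up, the socket is any connected stream socket (a vsock connection in the real setup), and the host writes with `os.pwrite` instead of memcpy into a mapping to keep the sketch short.

```python
import os
import socket

REQUEST = b"UPDATE"  # hypothetical message sent by the guest
DONE = b"DONE__"     # hypothetical completion message, same length as REQUEST


def _recv_exact(sock, n):
    """Receive exactly n bytes from a stream socket."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


def host_serve_once(sock, backing_path, payload):
    """Host side: wait for a request, update the backing file, then acknowledge."""
    if _recv_exact(sock, len(REQUEST)) != REQUEST:
        raise ValueError("unexpected message")
    fd = os.open(backing_path, os.O_WRONLY)
    try:
        os.pwrite(fd, payload, 0)  # the write finishes before DONE is sent
    finally:
        os.close(fd)
    sock.sendall(DONE)


def guest_fetch(sock, device_path, length):
    """Guest side: request an update, wait for completion, reopen and read."""
    sock.sendall(REQUEST)
    if _recv_exact(sock, len(DONE)) != DONE:
        raise ValueError("unexpected message")
    fd = os.open(device_path, os.O_RDONLY)  # fresh descriptor, as in step 3
    try:
        return os.pread(fd, length, 0)
    finally:
        os.close(fd)
```

In the setup described here, `backing_path` would be `/dev/shm/shared_mem` on the host, and `device_path` would be whatever block device the drive maps to in the guest.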

If this helps: in the other direction, the host cannot read data written by the guest unless I call msync() in the guest process after the writes. Only then is the data flushed to the device so that the host can read the updated content. However, even if I use msync() or similar flush functions on the host, the guest process still reads stale data.
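A minimal sketch of the guest-side write path described above, assuming the guest writes through a shared mapping of the device. Python's `mmap.flush()` wraps `msync()`, so it stands in for the explicit `msync()` call. The `write_and_flush` helper is a hypothetical name.

```python
import mmap
import os


def write_and_flush(path, offset, payload):
    """Write payload through a shared mapping, then flush it with msync()."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Seeking to the end gives the size for regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        if offset + len(payload) > size:
            raise ValueError("payload does not fit in the mapped region")
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[offset:offset + len(payload)] = payload
            m.flush()  # msync(): push dirty pages to the backing file or device
    finally:
        os.close(fd)
```

Without the flush, the written pages may stay in the guest page cache for a while, which would match the observation that the host does not see guest writes until `msync()` is called.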

@ShadowCurse
Contributor

OK, interesting. Are you calling lseek in the guest as well? You mentioned lseek before, but I assumed you had tried to add it to Firecracker. Reopening the file just resets its cursor. If the host updates the beginning of the shared memory after the guest's first read, the guest's cursor has already moved forward, so the next read in the guest continues from that position instead of starting from the beginning of the file.
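The cursor behaviour can be checked with a small sketch like the one below (the `demo_cursor` helper is a hypothetical name). It shows that a second `read()` on the same descriptor continues where the first one stopped, while `lseek()` back to 0 or a positional `pread()` returns the start of the file again without reopening it.

```python
import os


def demo_cursor(path, n):
    """Return (first, second, rewound, positional) reads of n bytes each."""
    fd = os.open(path, os.O_RDONLY)
    try:
        first = os.read(fd, n)            # bytes [0, n); offset moves to n
        second = os.read(fd, n)           # continues at offset n
        os.lseek(fd, 0, os.SEEK_SET)      # rewind the cursor explicitly
        rewound = os.read(fd, n)          # bytes [0, n) again
        positional = os.pread(fd, n, 0)   # explicit offset; cursor is not used
        return first, second, rewound, positional
    finally:
        os.close(fd)
```

For a file containing `AAAABBBB`, `demo_cursor(path, 4)` returns `(b"AAAA", b"BBBB", b"AAAA", b"AAAA")`. If the guest reader uses plain `read()` in a loop, this alone could make it miss updates written at the beginning of the region.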
