Re: Sandboxing strategy

10 Sep 2025

Demi Marie Obenour <demiobenour@gmail.com> writes:
...
I was thinking about how to sandbox the various per-VM daemons
and came up with the following strategy:
- Each VM gets its own PID and mount namespace and set of user IDs.
Didn't you say to me we couldn't do PID namespaces without support from
s6?
...
- Mount namespace includes /proc, /sys, /dev, and the host rootfs.
- Each service gets its own /tmp and /dev/shm if they are needed at all.
Just a question: if we put services into cgroups, does use of tmpfs get
charged to the appropriate cgroup?
...
- virtiofsd gets r/w access to the VM private storage.
- IPC namespaces are irrelevant because the kernel is
  built without System V IPC or POSIX message queues.
- Sending signals between services in the namespace is blocked
  by Landlock.  Landlock also blocks ptrace() and other nastiness,
  as well as communication via abstract AF_UNIX sockets.
- Since AF_UNIX abstract sockets between services are blocked by
  Landlock and Spectrum builds without IP or even Ethernet on the
  host there is no need for network namespacing.
It doesn't currently, just to be clear.  (I'm still putting off using a
custom kernel config on the host until we have better tooling for
keeping up with Nixpkgs.)
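
On the Landlock points in the quoted list: as far as I know, scoping of signals and abstract AF_UNIX sockets arrived with Landlock ABI version 6 (Linux 6.12), so the sandbox would want to probe the running kernel before relying on it. A minimal sketch of that probe, in C, follows. The helper names are hypothetical, and the fallback syscall number is an assumption for systems whose libc headers predate Landlock.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 444 is the landlock_create_ruleset syscall number on
 * architectures using the unified numbering (x86_64, aarch64, ...).
 * Only used when the libc headers are too old to define it. */
#ifndef SYS_landlock_create_ruleset
#define SYS_landlock_create_ruleset 444
#endif

/* Same value as in <linux/landlock.h>; defined here so the sketch
 * builds without that header. */
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Hypothetical helper: returns the Landlock ABI version supported by
 * the running kernel, or -1 if Landlock is unavailable. */
long landlock_abi_version(void)
{
    return syscall(SYS_landlock_create_ruleset, NULL, 0,
                   LANDLOCK_CREATE_RULESET_VERSION);
}

/* Hypothetical helper: signal and abstract-socket scoping need ABI 6
 * or later. */
int landlock_abi_has_scopes(long abi)
{
    return abi >= 6;
}
```

A sandbox manager could call landlock_abi_has_scopes(landlock_abi_version()) at startup and refuse to start the VM's services if it returns 0, rather than silently running them without the restriction.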
...
- The sandbox manager is PID 1 in the VM's PID namespace.
  When s6 tells it to shut down, it tries to gracefully shut
  down the VM.  After a timeout or once the VM has shut down,
  it exits, and Linux automatically kills all the processes
  and cleans up the mount namespace.
- The sandbox manager uses prctl(PR_SET_PDEATHSIG) to ensure it
  dies if the parent s6 process dies.  This requires s6 to provide
  its own PID to avoid races, but that is easy to implement.
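
The race mentioned in the last item can be sketched roughly as follows, in C. This is only an illustration under assumptions: the function name is hypothetical, and expected_parent stands for whatever PID s6 would pass down. One caveat: getppid() returns 0 when the parent lives in a different PID namespace, so a check like this would have to run before the manager becomes PID 1 in the new namespace, or use some other mechanism.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to send SIGKILL to this process
 * when its parent dies, then check that the parent is still the one we
 * expect.  Returns 0 on success, -1 if prctl() failed or the parent
 * had already exited (so the signal would never be delivered). */
int arm_parent_death_signal(pid_t expected_parent)
{
    if (prctl(PR_SET_PDEATHSIG, SIGKILL) == -1)
        return -1;

    /* If the parent died before prctl() took effect, this process has
     * been reparented and getppid() no longer matches. */
    if (getppid() != expected_parent)
        return -1;

    return 0;
}
```

The caller would exit immediately on a -1 return, since in that case nothing guarantees the process is cleaned up when s6 goes away.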
All of this behavior will be hard-coded into C and Rust source code,
so it will be vastly simpler than a generic program that must support
many use-cases.
This all sounds fine, BUT there are a couple of important things to bear
in mind:

 • This needs to be maintainable.  I don't know how much code this is
   going to be or how complex it's going to be, but the fact that it
   will be totally custom does make me a bit concerned.

 • These services are part of our TCB anyway.  Sandboxing only gets us
   defense in depth.  With that in mind, it's basically never going to
   be worth adding sandboxing if it adds any amount of attack surface.
   One example of that would be user namespaces.  They've been a
   consistent source of kernel security issues, and it might be better
   to turn them off entirely than to use them for sandboxing stuff
   that's trusted anyway.

Alyssa Ross