Demi Marie Obenour <demiobenour@gmail.com> writes:
On 9/10/25 11:11, Alyssa Ross wrote:
This all sounds fine, BUT there are a couple of important things to bear in mind:
• This needs to be maintainable. I don't know how much code this is going to be or how complex it's going to be, but that this will be totally custom does make me a bit concerned.
This should not be too difficult. It uses the same system calls as container managers, so if there is a problem it should be possible to get help fairly easily.
bubblewrap? :)
• These services are part of our TCB anyway. Sandboxing only gets us defense in depth. With that in mind, it's basically never going to be worth adding sandboxing if it adds any amount of attack surface. One example of that would be user namespaces. They've been a consistent source of kernel security issues, and it might be better to turn them off entirely than to use them for sandboxing stuff that's trusted anyway.
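For reference, on Linux the number of user namespaces that may be created is governed by the user.max_user_namespaces sysctl, and a value of 0 prevents new ones from being created. A minimal sketch of checking that setting follows; the function names are made up for illustration and are not part of any existing tool:

```python
# Hypothetical helper: report whether user namespace creation is disabled.
# Assumes a Linux host that exposes /proc/sys/user/max_user_namespaces.

SYSCTL_PATH = "/proc/sys/user/max_user_namespaces"


def parse_limit(raw: str) -> int:
    """Parse the sysctl file contents (a decimal integer plus newline)."""
    return int(raw.strip())


def user_namespaces_disabled(limit: int) -> bool:
    """A limit of 0 means no new user namespaces can be created."""
    return limit == 0


def read_limit(path: str = SYSCTL_PATH) -> int:
    """Read the current limit from procfs."""
    with open(path) as f:
        return parse_limit(f.read())


if __name__ == "__main__":
    limit = read_limit()
    state = "disabled" if user_namespaces_disabled(limit) else "enabled"
    print(f"user.max_user_namespaces = {limit} ({state})")
```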
Sandboxing virtiofsd is going to be really annoying and will definitely come at a performance cost. The most efficient way to use virtiofsd is to give it CAP_DAC_READ_SEARCH in the initial user namespace and delegate _all_ access control to it. This allows virtiofsd to use open_by_handle_at() for all filesystem access. Unfortunately, this also allows virtiofsd to open any file on the filesystem, ignoring all discretionary access control checks. I don't think Landlock would work either. SELinux or SMACK might work, but using them is significantly more complicated.
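To make the capability requirement concrete, here is a rough sketch of checking whether a process holds CAP_DAC_READ_SEARCH in its effective set by decoding the CapEff line of /proc/<pid>/status. The helper names are invented for illustration. Note that this only inspects the capability bitmask; it does not tell you whether the process is in the initial user namespace, which open_by_handle_at() also requires.

```python
# Hypothetical helpers: decode the effective capability mask of a process.
# CAP_DAC_READ_SEARCH is capability number 2 in <linux/capability.h>.

CAP_DAC_READ_SEARCH = 2


def has_capability(cap_mask_hex: str, cap: int) -> bool:
    """Return True if bit `cap` is set in a hex mask such as CapEff."""
    return bool(int(cap_mask_hex, 16) & (1 << cap))


def effective_caps(status_path: str = "/proc/self/status") -> str:
    """Return the CapEff hex string from a /proc/<pid>/status file."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff line not found in " + status_path)


if __name__ == "__main__":
    mask = effective_caps()
    held = has_capability(mask, CAP_DAC_READ_SEARCH)
    print(f"CapEff={mask} CAP_DAC_READ_SEARCH={'yes' if held else 'no'}")
```

Passing the path of another process's status file (for example that of a running virtiofsd) to effective_caps() checks that process instead.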
If one wants to sandbox virtiofsd, one either needs to use --cache=never or run into an effective resource leak (https://gitlab.com/virtio-fs/virtiofsd/-/issues/194). My hope is that in the future the problem will be solved by DAX and an in-kernel shrinker that is aware of the host resources it is using. Denial of service would be prevented by cgroups on the host, addressing the objection mentioned in the issue comments.
Do we not trust virtiofsd's built-in sandboxing?