Apparently the last time I wrote an update like this was 2021-08-22[1].
Recently I've been missing them a lot, because I don't think it's ever
been as easy to keep up with the progress being made as when I wrote
weekly updates. Hopefully this time I can avoid going into quite the
same level of detail, so they're more sustainable to keep writing.
[1]: https://spectrum-os.org/lists/archives/spectrum-discuss/20210822213036.6lx5…
So, here's what I did this week:
Automated testing
-----------------
For a long time I've had a branch on my computer that makes some
significant changes to how networking in Spectrum works to try to
resolve a longstanding race condition when starting VMs. I've found it
difficult to work on, because it's hard for me to be confident when
changing things around that I'm not going to subtly break something in a
different way. It's been clear to me for a long time that the obvious
answer to this is automated tests, but how they should work was a lot
less clear. I'd like to avoid having to make a special testable version
of the system, because I want to be testing the real thing. The aarch64
port I completed early this year has helped make the system just a bit
more flexible, and a couple of weeks ago it clicked for me that perhaps
it's time to make a serious attempt at this.
So far, I have a program that starts Spectrum in QEMU. This is an
unmodified Spectrum image, but the test uses systemd-stub's support for
appending to the kernel command line through SMBIOS Type 11 strings[2]
to configure an extra serial console, which it then communicates with
over a socket. Because this console was configured on the kernel
command line, it has kernel logs printed to it, which could make trying
to read command output from it messy. To work around that, the only
interaction the test has with this console is to write a command into it
to start a shell on a second serial console that isn't a destination for
kernel logs, and then further interaction happens via that second
console. (I tried lowering the kernel loglevel, but it seems some
messages come through even when it's set to zero.)
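As a rough illustration of the setup described above (not the actual
test code), the QEMU arguments could be assembled along these lines.
The io.systemd.stub.kernel-cmdline-extra key is the one documented for
systemd-stub[2]; the function name, the chardev ids, the socket paths
and the console device name are all assumptions made for the sketch.

```python
def qemu_console_args(log_socket, shell_socket, console):
    """Build QEMU arguments for two extra socket-backed serial consoles.

    Only the first console is named on the kernel command line (through
    an SMBIOS Type 11 string that systemd-stub appends), so only that
    one receives kernel logs. Hypothetical helper, not Spectrum code.
    """
    def escape(value):
        # QEMU's option parser splits on ",", so a literal comma inside
        # a value has to be written as ",,".
        return value.replace(",", ",,")

    cmdline_extra = f"io.systemd.stub.kernel-cmdline-extra=console={console}"
    return [
        "-smbios", "type=11,value=" + escape(cmdline_extra),
        # Console that is configured on the kernel command line.
        "-chardev", f"socket,id=log,path={escape(log_socket)},server=on,wait=off",
        "-serial", "chardev:log",
        # Second console, used only for the shell started by the test.
        "-chardev", f"socket,id=shell,path={escape(shell_socket)},server=on,wait=off",
        "-serial", "chardev:shell",
    ]
```

Which device name the first extra serial port ends up with (ttyS0,
ttyAMA0, ...) depends on the machine type and on any serial ports
configured before it, so the caller has to supply it.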
The test program then uses the Spectrum host shell to mount a filesystem
on an attached disk image containing a custom VM configuration, import
that VM, and run it. The VM just runs netcat to try to connect to a
server that the test program runs on the host. If the host receives that
connection, the test will pass, and once I have that working I can
expand the test to make sure that it keeps working after restarting the
different VMs involved.
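The pass/fail check described above boils down to "did anything connect
before a deadline". A minimal sketch of that, assuming a plain TCP
listener and a hypothetical helper name (the real test may be structured
quite differently):

```python
import socket

def wait_for_connection(listener, timeout):
    """Return True if a peer connects to `listener` within `timeout` seconds."""
    listener.settimeout(timeout)
    try:
        conn, _peer = listener.accept()
    except socket.timeout:
        # Nobody connected in time, so the test should fail.
        return False
    conn.close()
    return True
```

Re-running the same check after restarting a VM would be one way to
cover the "keeps working after restarts" case.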
What's still to do is getting that server bit right — originally I
planned to use QEMU's guestfwd[3] feature to forward connections to a
specific IP address and port to an AF_UNIX socket, but that has been broken
for a long time[4], so instead I'll just use an AF_INET socket on the
host. Since it has to be a non-loopback address, I also want the test
to run itself in a network namespace so I'm not interfering with
networking on the machine running it. Nothing that should be especially
difficult. An early experiment with this led me to discover that
Python's built-in socket library didn't expose the IP_FREEBIND socket
option, so I sent a quick PR to do that[5], which was merged.
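For readers unfamiliar with IP_FREEBIND: on Linux it lets a socket bind
to an address that is not (yet) configured on any local interface. The
following is only a sketch of how it can be used from Python; the helper
name is hypothetical, and the fallback value 15 is the constant from
Linux's <linux/in.h>, used when the running Python predates [5].

```python
import socket

# socket.IP_FREEBIND only exists in Python versions that include the
# change from [5]; otherwise fall back to the Linux constant.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)

def freebind_listener(address, port=0):
    """Listen on `address` even if no interface has it yet (Linux only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
        sock.bind((address, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

Something like freebind_listener("192.0.2.1") (an address from the
documentation range, chosen here purely as an example) would then
succeed before the address has been assigned inside the namespace.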
[2]: https://www.freedesktop.org/software/systemd/man/latest/systemd-stub.html#S…
[3]: https://www.qemu.org/docs/master/system/invocation.html#hxtool-5
[4]: https://gitlab.com/qemu-project/qemu/-/issues/1835
[5]: https://github.com/python/cpython/pull/132998
User-mode networking for development
------------------------------------
When developing the built-in Spectrum VMs, I run them directly on my
NixOS development system. For the router VM in particular, it's
important to test that it still connects to the network. Until now,
that's only been possible with QEMU, which has built-in slirp-based
user-mode networking — other VMMs would require privileged operations on
the development system to have working networking, and it's important to
me that the Spectrum development environment does not require special
privileges. While I was working on the test and investigating QEMU's
guestfwd feature, I discovered that there's a new option for user-mode
networking: passt[6][7]. passt runs outside the VMM process, and
implements a vhost-user interface, meaning that it should be possible to
use from any VMM.
I've changed the Spectrum development environment to use it when running
VMs in Cloud Hypervisor[8], and also to run the router VM in Cloud
Hypervisor by default[9], which means it's now easier to test that VM
during development in an environment much more similar to the one it
will ultimately run in as part of the Spectrum system. I also tried
using passt with crosvm, but it didn't work due to crosvm's slightly
strange interpretation of the vhost-user protocol. I sent a couple of
patches[10][11] to crosvm to fix the incompatibility, which haven't been
accepted yet but have prompted things to move in the right direction. I
also sent a patch to passt itself to make it easier to use with an
inetd/UCSPI-style super-server like s6-ipcserver, but a day later it
still hasn't appeared in their list archive so I think it's probably
waiting for somebody to find it in the list spam filter.
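To make the passt arrangement a bit more concrete, here is a sketch of
the two command lines involved when pairing passt with Cloud Hypervisor
over vhost-user. The function names, socket path and memory size are
assumptions, and the flags are given as I understand them from each
tool's documentation rather than taken from the Spectrum commits[8][9],
so check them against the versions you have installed.

```python
def passt_argv(socket_path):
    """Command line for passt serving vhost-user on a UNIX socket."""
    return ["passt", "--foreground", "--vhost-user", "--socket", socket_path]

def cloud_hypervisor_net_args(socket_path, memory="512M"):
    """Cloud Hypervisor arguments to attach to that vhost-user socket."""
    return [
        # vhost-user backends need access to guest memory, so it has to
        # be shared with the backend process.
        "--memory", f"size={memory},shared=on",
        "--net", f"vhost_user=true,socket={socket_path}",
    ]
```

Because passt is a separate unprivileged process, nothing here needs
special privileges on the development system.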
[6]: https://passt.top/
[7]: https://www.qemu.org/docs/master/system/devices/net.html#using-passt-as-the…
[8]: https://spectrum-os.org/git/spectrum/commit/?id=aac74f6165740a6b041a7205ec8…
[9]: https://spectrum-os.org/git/spectrum/commit/?id=ca10038c080f396a6cf2649b726…
[10]: https://chromium-review.googlesource.com/c/crosvm/crosvm/+/6491650
[11]: https://chromium-review.googlesource.com/c/crosvm/crosvm/+/6491651
Nixpkgs
-------
I contributed a few fixes for build regressions that would get in the
way of updating Spectrum's pinned Nixpkgs[12][13][14], and I also added
debug symbols to the passt package[15].
[12]: https://github.com/NixOS/nixpkgs/pull/401563
[13]: https://github.com/NixOS/nixpkgs/pull/402027
[14]: https://github.com/NixOS/nixpkgs/pull/402055
[15]: https://github.com/NixOS/nixpkgs/pull/401971
That's it for this week. I'm at just over a thousand words, which is
more than I was aiming for but better than I remember a lot of the past
weekly updates being. Hopefully I'll be back with another update next
week. As a reminder, you can support my work by donating to Spectrum's
development through GitHub Sponsors[16] or Liberapay[17]. This helps a
lot, especially with ongoing maintenance (usually fixing upstream bugs
etc.) that is difficult to fund via a grant.
[16]: https://github.com/sponsors/alyssais/
[17]: https://liberapay.com/qyliss