On 9/7/25 16:28, Yureka wrote:
On 9/7/25 21:25, Demi Marie Obenour wrote:
On 9/6/25 10:12, Yureka Lilian wrote:
The xdp-forwarder's purpose is implementing the functionality needed within the net-vm (a VM running the Linux drivers for any physical interfaces on the spectrum system).
In the future, the net-vm will load the included XDP programs on the passed-through physical interfaces as well as the downstream virtio interface going into the router (recognized by its special MAC address).
The net-vm needs to multiplex between the physical interfaces, as there might be several interfaces in the same IOMMU-group.
For this, the XDP program loaded on the physical interfaces (`prog_physical.o`) applies a VLAN tag corresponding to the interface id and redirects the packets to the router interface (identified by the `router_iface` bpf map). In the other direction the XDP program loaded on the router interface (`prog_router.o`) removes one layer of VLAN tagging and redirects the packets to the interface read from the VLAN tag.
The helper program `set_router_iface` is used to update the `router_iface` bpf map to point to the interface passed as argument to the program.
Signed-off-by: Yureka Lilian <yureka@cyberchaos.dev> --- pkgs/default.nix | 5 + tools/default.nix | 15 +- tools/meson.build | 5 + tools/meson_options.txt | 4 + tools/xdp-forwarder/include/parsing_helpers.h | 273 ++++++++++++++++++ tools/xdp-forwarder/include/rewrite_helpers.h | 145 ++++++++++ tools/xdp-forwarder/meson.build | 38 +++ tools/xdp-forwarder/prog_physical.c | 37 +++ tools/xdp-forwarder/prog_router.c | 43 +++ tools/xdp-forwarder/set_router_iface.c | 29 ++ 10 files changed, 591 insertions(+), 3 deletions(-) create mode 100644 tools/xdp-forwarder/include/parsing_helpers.h create mode 100644 tools/xdp-forwarder/include/rewrite_helpers.h create mode 100644 tools/xdp-forwarder/meson.build create mode 100644 tools/xdp-forwarder/prog_physical.c create mode 100644 tools/xdp-forwarder/prog_router.c create mode 100644 tools/xdp-forwarder/set_router_iface.c
[...]
diff --git a/tools/xdp-forwarder/include/parsing_helpers.h b/tools/xdp-forwarder/include/parsing_helpers.h new file mode 100644 index 0000000..3d240cd --- /dev/null +++ b/tools/xdp-forwarder/include/parsing_helpers.h @@ -0,0 +1,273 @@ +/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-clause) */ +/* Vendored from https://github.com/xdp-project/xdp-tutorial/blob/d3d3eed6ea9a63d1302bfa8b5a8... */
It looks like your mail client wrapped the lines. I demangled them in the C code.
[...] +/* + * struct vlan_hdr - vlan header + * @h_vlan_TCI: priority and VLAN ID + * @h_vlan_encapsulated_proto: packet type ID or len + */ +struct vlan_hdr { + __be16 h_vlan_TCI; + __be16 h_vlan_encapsulated_proto; +};
I think this (and all the subsequent structs) should be __attribute__((packed)) but am not sure. [...]
diff --git a/tools/xdp-forwarder/include/rewrite_helpers.h b/tools/xdp-forwarder/include/rewrite_helpers.h new file mode 100644 index 0000000..2deae9a --- /dev/null +++ b/tools/xdp-forwarder/include/rewrite_helpers.h @@ -0,0 +1,145 @@ +/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-clause) */ +/* Vendored from https://github.com/xdp-project/xdp-tutorial/blob/d3d3eed6ea9a63d1302bfa8b5a8... */ +/* + * This file contains functions that are used in the packetXX XDP programs to + * manipulate on packets data. The functions are marked as __always_inline, and + * fully defined in this header file to be included in the BPF program. + */ + +#ifndef __REWRITE_HELPERS_H +#define __REWRITE_HELPERS_H + +#include <linux/bpf.h> +#include <linux/ip.h> +#include <linux/ipv6.h> +#include <linux/if_ether.h> + +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_endian.h> + +/* Pops the outermost VLAN tag off the packet. Returns the popped VLAN ID on + * success or negative errno on failure. + */ +static __always_inline int vlan_tag_pop(struct xdp_md *ctx, struct ethhdr *eth) +{ + void *data_end = (void *)(long)ctx->data_end; + struct ethhdr eth_cpy; + struct vlan_hdr *vlh; + __be16 h_proto; + int vlid; + + if (!proto_is_vlan(eth->h_proto)) + return -1; + + /* Careful with the parenthesis here */ + vlh = (void *)(eth + 1); + + /* Still need to do bounds checking */ + if (vlh + 1 > data_end) + return -1; + + /* Save vlan ID for returning, h_proto for updating Ethernet header */ + vlid = bpf_ntohs(vlh->h_vlan_TCI); + h_proto = vlh->h_vlan_encapsulated_proto; + + /* Make a copy of the outer Ethernet header before we cut it off */ + __builtin_memcpy(ð_cpy, eth, sizeof(eth_cpy)); + + /* Actually adjust the head pointer */ + if (bpf_xdp_adjust_head(ctx, (int)sizeof(*vlh))) + return -1; + + /* Need to re-evaluate data *and* data_end and do new bounds checking + * after adjusting head + */ + eth = (void *)(long)ctx->data; + data_end = (void *)(long)ctx->data_end; + if (eth + 1 > data_end) + return -1; + + /* Copy back the old Ethernet header and update the proto type */ + __builtin_memcpy(eth, ð_cpy, sizeof(*eth)); + eth->h_proto = h_proto; + + return vlid; +} In Spectrum, eth is always the packet data, so this should be able to be simplified to: [...]
Not upstreamable
I agree.
[...]
In the Spectrum use-case (where the Ethernet header is always the one in the packet) this should be able to be simplified to (not tested): [...]
Not upstreamable
That makes sense.
This also avoids needing the 'eth' parameter, as it is now assigned in the function body before being read from. I'm not sure this would be you're not sure this would be what?
Should have added "upstreamable".
[...]
+#endif /* __REWRITE_HELPERS_H */ diff --git a/tools/xdp-forwarder/meson.build b/tools/xdp-forwarder/meson.build new file mode 100644 index 0000000..7e60c11 --- /dev/null +++ b/tools/xdp-forwarder/meson.build @@ -0,0 +1,38 @@ +# SPDX-License-Identifier: EUPL-1.2+ +# SPDX-FileCopyrightText: 2025 Yureka Lilian <yureka@cyberchaos.dev> + +libbpf = dependency('libbpf', version : '1.6.2') + +executable('set_router_iface', 'set_router_iface.c', + dependencies : libbpf, + install : true) + +clang = find_program('clang') + +bpf_o_cmd = [ + clang.full_path(), + '-fno-stack-protector', + '-O2', + '-target', 'bpf', + '-I', meson.current_source_dir() + '/include', + '-g', + '-c', + '@INPUT@', + '-o', + '@OUTPUT@', +] I recommend adding -fno-strict-overflow and -fno-strict-aliasing here. I don't know if those are implied by -target bpf, but better safe than sorry. Will do.
+prog_router_o = custom_target( + input : 'prog_router.c', + output : 'prog_router.o', + command : bpf_o_cmd, + install: true, + install_dir: 'lib/xdp') + +prog_physical_o = custom_target( + input : 'prog_physical.c', + output : 'prog_physical.o', + command : bpf_o_cmd, + install: true, + install_dir: 'lib/xdp') Meson allows providing a dependency file so that the target is remade if any of the headers are changed. Clang is able to produce such a file (-MD -MP -MF file).
I mostly got inspiration from the systemd meson files. I'm not very familiar with meson, but I will try to add this.
I think bpf_o_cmd = [ clang.full_path(), '-fno-stack-protector', '-fno-strict-aliasing', '-fno-strict-overflow', '-Wall', '-Wextra', '-O2', '-target', 'bpf', '-I', meson.current_source_dir() + '/include', '-g', '-c', '-o', '@OUTPUT@', '-MD', '-MP', '-MF', '@DEPFILE@', '--', '@INPUT@', ] prog_physical_o = custom_target( input: 'prog_physical.c', output: 'prog_physical.o', depfile: 'prog_physical.o.dep', command: bpf_o_cmd, install: true, install_dir: 'lib/xdp') should work. '--' is optional, of course; I am in the habit of adding it because it does matter in other contexts. This is also untested, but it is consistent with the Meson documentation (https://mesonbuild.com/Reference-manual_functions.html#custom_target). The above command includes the extra command-line options I mentioned (-Wall -Wextra -fno-strict-overflow -fno-strict-aliasing).
[...]
+SEC("xdp") +int physical(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; Would it make sense to use char * here? I know that Linux uses arithmetic on void *, but it makes the it makes the [... what]?
Whoops, forgot to finish the sentence. It makes the code easier to understand for those familiar with userspace programming.
+ struct hdr_cursor nh; + nh.pos = data; + + struct ethhdr *eth; + if (parse_ethhdr(&nh, data_end, ð) < 0) + return XDP_DROP; + + + if (ctx->ingress_ifindex < 1 || ctx->ingress_ifindex > 4096) + return XDP_DROP;
I think this is off-by-1 and should be: if (ctx->ingress_ifindex < 1 || ctx->ingress_ifindex > VLAN_VID_MASK) return XDP_DROP;
+ vlan_tag_push(ctx, eth, ctx->ingress_ifindex); Looks like a missing check for the return value. Thanks, will fix.
You're welcome.
[...]
+ int vlid = vlan_tag_pop(ctx, eth); + if (vlid < 0) { + return XDP_DROP; + } I don’t think Spectrum uses bits other than the VLAN ID, and VLAN 0 is reserved, so perhaps:
if (vlid < 1 || vlid > 4096) { return XDP_DROP; } The "< 0" check is strictly an error check. In success cases, vlan_tag_pop returns exactly the VLAN ID.
Makes sense. A check for the VLAN ID being valid could be done but would be separate (I think `(vlid & VLAN_VID_MASK) != 0`).
[...]
I simplified the code into the following (only compile-tested):
/* SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-clause) */ /* * This file contains parsing functions that can be used in eXDP programs. The * functions are marked as __always_inline, and fully defined in this header * file to be included in the BPF program. * * Each helper parses a packet header, including doing bounds checking, and * returns the type of its contents if successful, and -1 otherwise. * * For Ethernet and IP headers, the content type is the type of the payload * (h_proto for Ethernet, nexthdr for IPv6), for ICMP it is the ICMP type field. * All return values are in host byte order. */
#include <stddef.h> #include <linux/if_ether.h> #include <linux/if_packet.h> #include <linux/bpf.h> #include <bpf/bpf_endian.h> #include <bpf/bpf_helpers.h>
/* * struct vlan_hdr - vlan header * @h_vlan_TCI: priority and VLAN ID * @h_vlan_encapsulated_proto: packet type ID or len */ struct vlan_hdr { __be16 h_vlan_TCI; __be16 h_vlan_encapsulated_proto; } __attribute__((__packed__));
struct eth_vlan_hdr { struct ethhdr eth; struct vlan_hdr vlan; } __attribute__((__packed__));
static __always_inline int proto_is_vlan(__u16 h_proto) { return !!(h_proto == bpf_htons(ETH_P_8021Q) || h_proto == bpf_htons(ETH_P_8021AD)); }
/* Pushes a new VLAN tag after the Ethernet header. Returns 0 on success, * -1 on failure. */ static __always_inline int vlan_tag_push(struct xdp_md *ctx, int vlid) { struct eth_vlan_hdr *hdr; char *data_end; char *data;
/* Check for valid Ethernet frame */ if ((unsigned long)ctx->data + ETH_ZLEN > (unsigned long)ctx->data_end) return -1;
/* Add space in front of the packet */ if (bpf_xdp_adjust_head(ctx, -(int)sizeof(struct vlan_hdr))) return -1;
/* Must re-fetch header pointers */ data_end = (char *)(long)ctx->data_end; data = (char *)(long)ctx->data;
/* Verifier requires this (redundant) bounds check */ if (data + ETH_ZLEN + sizeof(struct vlan_hdr) > data_end) return -1;
/* Move source and destination MAC addresses, populate VLAN tag * with ID and proto, and set outer Ethernet header to VLAN type. */ __builtin_memmove(data, data + sizeof(struct vlan_hdr), offsetof(struct ethhdr, h_proto)); hdr = (struct eth_vlan_hdr *)data; /* TODO: should this be ETH_P_8021AD? */ hdr->eth.h_proto = bpf_htons(ETH_P_8021Q); hdr->vlan.h_vlan_TCI = bpf_htons(vlid); return 0; }
/* Pops the outermost VLAN tag off the packet. Returns the popped VLAN ID on * success or negative errno on failure. */ static __always_inline int vlan_tag_pop(struct xdp_md *ctx) {
char *data_end = (void *)(long)ctx->data_end; struct ethhdr *eth = (void *)(long)ctx->data; struct vlan_hdr *vlh; int vlid;
/* Check that the produced Ethernet frame will be long enough */ if ((unsigned long)eth + ETH_ZLEN + sizeof(*vlh) > (unsigned long)data_end) return -1;
/* Careful with the parenthesis here */ vlh = (void *)(eth + 1);
if (!proto_is_vlan(eth->h_proto)) return -1;
/* Save vlan ID for returning */ vlid = bpf_ntohs(vlh->h_vlan_TCI);
/* Move the source and destination MAC addresses. * This moves the Ethernet header by 4 bytes, so * be sure to cast to char * before doing pointer * arithmetic. */ __builtin_memmove((char *)eth + sizeof(*vlh), eth, offsetof(struct ethhdr, h_proto));
/* Actually adjust the head pointer */ if (bpf_xdp_adjust_head(ctx, (int)sizeof(*vlh))) return -1;
return vlid; }
// The maps are actually not used by this program, but just included // to keep the reference-counted pin alive before any physical interfaces // are added. struct { __uint(type, BPF_MAP_TYPE_DEVMAP); __type(key, int); __type(value, int); __uint(max_entries, 1); __uint(pinning, LIBBPF_PIN_BY_NAME); } physical_iface SEC(".maps");
struct { __uint(type, BPF_MAP_TYPE_DEVMAP); __type(key, int); __type(value, int); __uint(max_entries, 1); __uint(pinning, LIBBPF_PIN_BY_NAME); } router_iface SEC(".maps");
SEC("xdp") int physical(struct xdp_md *ctx) { if (ctx->ingress_ifindex < 1 || ctx->ingress_ifindex > 4096) return XDP_DROP;
if (vlan_tag_push(ctx, ctx->ingress_ifindex)) return XDP_DROP;
return bpf_redirect_map(&physical_iface, 0, 0); }
SEC("xdp") int router(struct xdp_md *ctx) { int vlid = vlan_tag_pop(ctx);
/* Negative VLAN IDs indicate errors, and 0 is reserved and invalid. * Values above 4096 indicate that the priority field is used, which * isn't suppported out of simplicity. */ if (vlid < 1 || vlid > 4096) return XDP_DROP;
return bpf_redirect(vlid, 0); }
I'm not sure what you intend with this. If there is no significant flaws in the upstream code, why should we write our own version?
Valid point. I do think it is safe to set VLAN_MAX_DEPTH to 0, as Spectrum doesn't parse VLAN tags. -- Sincerely, Demi Marie Obenour (she/her/hers)