[PATCH] release/checks/integration: handle partial sends
Sometimes we see a test failure that looks like this:

	unexpected connection data: qemu-system-aarch64: terminating on signal 15 from pid 184 ()

I think this is because, now that the nc command in the networking
integration test has a timeout, it's possible for it to time out after
having opened the connection, but before having written all its input
to it.  Therefore, ignore connections that send a prefix of the
expected data (including nothing), and just wait for the next
connection rather than failing if that happens.

Closes: https://spectrum-os.org/lists/archives/spectrum-devel/875xbl33vw.fsf@alyssa....
Fixes: c61b297 ("release/checks/integration: add nc timeout")
Signed-off-by: Alyssa Ross <hi@alyssa.is>
---
Please test!  It's difficult to know if I've solved the problem for
real, since it happens transiently.

 release/checks/integration/networking.c | 49 +++++++++++++++----------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/release/checks/integration/networking.c b/release/checks/integration/networking.c
index 97d7895..a445af1 100644
--- a/release/checks/integration/networking.c
+++ b/release/checks/integration/networking.c
@@ -57,32 +57,43 @@ static int setup_server(void)
 
 static void expect_connection(int listener)
 {
-	int conn_fd;
-	FILE *conn;
+	int conn, r;
 	char msg[7];
 	size_t len;
 
-	fputs("waiting for server connection\n", stderr);
-	if ((conn_fd = accept(listener, nullptr, nullptr)) == -1) {
-		perror("accept");
-		exit(EXIT_FAILURE);
-	}
-	fputs("accepted connection!\n", stderr);
-	if (!(conn = fdopen(conn_fd, "r"))) {
-		perror("fdopen(server connection)");
-		exit(EXIT_FAILURE);
-	}
+	for (;;) {
+		len = 0;
 
-	len = fread(msg, 1, sizeof msg, conn);
-	if (len != 6 || memcmp("hello\n", msg, 6)) {
-		if (ferror(conn))
-			perror("fread(server connection)");
-		else
+		fputs("waiting for server connection\n", stderr);
+		if ((conn = accept(listener, nullptr, nullptr)) == -1) {
+			perror("accept");
+			exit(EXIT_FAILURE);
+		}
+		fputs("accepted connection!\n", stderr);
+
+		for (;;) {
+			r = read(conn, msg + len, sizeof msg - len);
+			if (r == -1) {
+				perror("read");
+				exit(EXIT_FAILURE);
+			}
+			if (r == 0)
+				break;
+			len += r;
+		}
+
+		if (memcmp("hello\n", msg, len) || len > 6) {
 			fprintf(stderr, "unexpected connection data: %.*s",
 			        (int)len, msg);
-		exit(EXIT_FAILURE);
+			exit(EXIT_FAILURE);
+		}
+
+		// If connection was disconnect partway through, try again.
+		if (len < 6)
+			continue;
+
+		return;
 	}
-	fclose(conn);
 }
 
 static void drain_connections(int listener)

base-commit: 067c6a5d50971242f9cb8ac0ac76e20d88a9b5c1
-- 
2.51.0
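The prefix rule described in the commit message (accept the full message, retry on any strict prefix including nothing, fail on anything else) can be restated as a small standalone function. This is only a sketch for illustration, not code from the patch: the names `classify` and `enum verdict` are hypothetical, and the length check is placed before `memcmp` so the comparison never covers more than the six expected bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, introduced only for this illustration. */
enum verdict { COMPLETE, PARTIAL, UNEXPECTED };

/*
 * Classify the len bytes received on one connection against the
 * expected message "hello\n" (6 bytes, terminating NUL excluded).
 */
static enum verdict classify(const char *msg, size_t len)
{
	static const char expected[] = "hello\n";
	const size_t want = sizeof expected - 1;

	/* Too long, or differs somewhere within the bytes received. */
	if (len > want || memcmp(expected, msg, len) != 0)
		return UNEXPECTED;

	/* A strict prefix (including zero bytes) means the sender
	 * went away early, so the caller should wait for another
	 * connection instead of failing. */
	return len < want ? PARTIAL : COMPLETE;
}
```

With this shape, an empty read and "hel" both map to PARTIAL, "hello\n" maps to COMPLETE, and stray data maps to UNEXPECTED.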
Thanks for looking into this.

On 11/29/25 17:35, Alyssa Ross wrote:
> [...]
> +		if (memcmp("hello\n", msg, len) || len > 6) {
>  			fprintf(stderr, "unexpected connection data: %.*s",
>  			        (int)len, msg);
> -		exit(EXIT_FAILURE);
> +			exit(EXIT_FAILURE);
> +		}
> +
> +		// If connection was disconnect partway through, try again.
> +		if (len < 6)
> +			continue;
> +
> +		return;
>  	}
> -	fclose(conn);

Is it intentional that the connection is never closed?
Yureka <yuka@yuka.dev> writes:

> Thanks for looking into this.
>
> On 11/29/25 17:35, Alyssa Ross wrote:
>> [...]
>> +		return;
>>  	}
>> -	fclose(conn);
>
> Is it intentional that the connection is never closed?

It is not, thanks!
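For reference, one way the missing close could be handled is to release the descriptor as soon as the data has been read, before deciding whether to retry. The thread does not show the follow-up change, so this is a sketch under assumptions: `read_all` and `expect_connection_closing` are hypothetical names, and `NULL` is used in place of the patch's C23 `nullptr` so the example builds with older compilers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: read until EOF or until buf is full.
 * Returns the number of bytes read; exits on a read error. */
static size_t read_all(int fd, char *buf, size_t cap)
{
	size_t len = 0;

	while (len < cap) {
		ssize_t r = read(fd, buf + len, cap - len);
		if (r == -1) {
			perror("read");
			exit(EXIT_FAILURE);
		}
		if (r == 0)
			break;
		len += (size_t)r;
	}
	return len;
}

/* Hypothetical variant of expect_connection() that closes every
 * accepted connection, whether it is retried or accepted. */
static void expect_connection_closing(int listener)
{
	char msg[7];

	for (;;) {
		int conn = accept(listener, NULL, NULL);
		if (conn == -1) {
			perror("accept");
			exit(EXIT_FAILURE);
		}

		size_t len = read_all(conn, msg, sizeof msg);
		close(conn); /* done with this connection on every path */

		if (len > 6 || memcmp("hello\n", msg, len) != 0) {
			fprintf(stderr, "unexpected connection data: %.*s",
			        (int)len, msg);
			exit(EXIT_FAILURE);
		}
		if (len == 6)
			return;
		/* Strict prefix: sender went away early, wait again. */
	}
}
```

Closing right after the read keeps the retry path from leaking one descriptor per partial connection.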