On Sun Jun 14, 2020 at 11:43 AM, Alyssa Ross wrote:
If a DiskCommand was received on the crosvm socket before the virtio-block device was activated, the Token::VmRequest case in the main event loop would forward the request to the block device socket and then wait synchronously for a response. That response would never come, because the device hadn't been activated; and the device would never be activated, because the event loop was blocked and could never process the event that triggers activation. crosvm would therefore hang forever, waiting for a response that would never come.
This patch fixes this deadlock by keeping track of whether devices that send a response in this way have been activated yet. If they have already been activated, messages are sent and responses are received as normal. If they have not been activated, messages are instead put into a per-device queue. Once the device is activated, queued messages are processed all at once, and then the device is marked as ready, and the queue is dropped. Future messages are processed immediately as they come in, with no further queueing.
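The queuing scheme described above can be sketched roughly as follows. This is an illustrative model only, not the actual patch: `PendingRequest` and `DeviceState` are hypothetical names, and the real code works over sockets rather than in-memory values.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for a control message destined for a device.
struct PendingRequest {
    data: Vec<u8>,
}

// Per-device state: either still waiting for activation (with a queue
// of buffered requests), or ready to handle requests immediately.
enum DeviceState {
    Pending(VecDeque<PendingRequest>),
    Ready,
}

impl DeviceState {
    /// If the device is not yet activated, buffer the request and return
    /// None; otherwise hand the request back so the caller can send it
    /// to the device right away.
    fn handle(&mut self, req: PendingRequest) -> Option<PendingRequest> {
        match self {
            DeviceState::Pending(queue) => {
                queue.push_back(req);
                None
            }
            DeviceState::Ready => Some(req),
        }
    }

    /// Called when the device signals readiness: transition to Ready and
    /// return the queued requests for processing. The queue itself is
    /// dropped, so future requests bypass it entirely.
    fn activate(&mut self) -> Vec<PendingRequest> {
        match std::mem::replace(self, DeviceState::Ready) {
            DeviceState::Pending(queue) => queue.into_iter().collect(),
            DeviceState::Ready => Vec::new(),
        }
    }
}
```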
Sounds like quite the "chicken or the egg"-type problem.
A device indicates that it is ready by sending a message on its socket. The main crosvm event loop can then poll the socket, to be notified when the device is ready. This poll event will only trigger once -- once it has been received, it is removed from the poll context.
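The one-shot behaviour of that readiness event can be modelled as below. This is a simplified sketch: `PollCtx` and `Token` here are stand-ins for crosvm's actual poll-context machinery, not its real types.

```rust
use std::collections::HashMap;

type Token = u32;

// Minimal model of a poll context that watches device-ready sockets.
struct PollCtx {
    watched: HashMap<Token, String>,
}

impl PollCtx {
    fn new() -> Self {
        PollCtx { watched: HashMap::new() }
    }

    /// Register a device-ready socket under a token.
    fn add(&mut self, token: Token, desc: &str) {
        self.watched.insert(token, desc.to_string());
    }

    /// Deliver an event for `token`. The token is removed from the poll
    /// context on first delivery, so the ready event fires exactly once.
    fn on_ready(&mut self, token: Token) -> Option<String> {
        self.watched.remove(&token)
    }
}
```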
Currently, the only device type that responds to external control messages AND needs to be activated by an event is the block device. The balloon device does not respond to messages, and the xhci controller device is activated up front. The code is nevertheless structured so that it should be very easy to drop another kind of device into the queuing system, should that be required.
+1 for modularity.
---
 devices/src/virtio/block.rs |   5 +
 src/linux.rs                | 201 ++++++++++++++++++++++++++++++------
 vm_control/src/lib.rs       |  57 +++++++---
 3 files changed, 213 insertions(+), 50 deletions(-)
diff --git a/src/linux.rs b/src/linux.rs
index ec2067cf..ccd7d88c 100644
--- a/src/linux.rs
+++ b/src/linux.rs
----->8-----
@@ -1667,6 +1668,45 @@ fn file_to_i64<P: AsRef<Path>>(path: P) -> io::Result<i64> {
         .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty file"))
 }
+/// Returns a boolean indicating whether the VM should be exited.
+fn do_vm_request(
+    request: VmRequest,
+    device_socket: Option<&UnixSeqpacket>,
+    control_socket: &VmControlResponseSocket,
+    run_mode_arc: &Arc<VcpuRunMode>,
+    vcpu_handles: &mut Vec<JoinHandle<()>>,
+    io_bus: &mut Bus,
+) -> MsgResult<bool> {
+    let mut run_mode_opt = None;
+    let response = request.execute(&mut run_mode_opt, device_socket);
+    control_socket.send(&response)?;
+    if let Some(run_mode) = run_mode_opt {
+        info!("control socket changed run mode to {}", run_mode);
+        match run_mode {
+            VmRunMode::Exiting => Ok(true),
+            VmRunMode::Running => {
+                if let VmRunMode::Suspending = *run_mode_arc.mtx.lock() {
+                    io_bus.notify_resume();
+                }
+                run_mode_arc.set_and_notify(VmRunMode::Running);
+                for handle in vcpu_handles {
+                    let _ = handle.kill(SIGRTMIN() + 0);
I know this is essentially just moved (and probably isn't even something you wrote), but do you know why this is `+ 0`? Does this somehow coerce to the desired type or something? Maybe I'm overlooking something obvious here.
----->8-----
Either way, the above is just a nit. Good work on tracking this particular issue down! Reviewed-by: Cole Helbling <cole.e.helbling@outlook.com>