Description
When a VMM process (e.g. Firecracker, QEMU, HVT) terminates, it stays in a zombie (defunct) state until its parent process (the containerd shim) reaps it by calling waitpid.
During forced cleanup or stop paths (like urunc delete --force), urunc executes killProcess(pid) to terminate the process:
- killProcess sends SIGKILL to the VMM PID.
- The VMM process terminates and becomes a zombie.
- killProcess enters a loop polling unix.Kill(pid, 0) to check if the process is dead, waiting for it to return ESRCH.
- However, unix.Kill(pid, 0) returns nil (success) for zombie processes since they still exist in the process table.
- The parent shim is blocked synchronously waiting for urunc to exit, meaning it cannot process the SIGCHLD and reap the zombie VMM child.
This creates a deadlock: the zombie cannot be reaped until urunc exits, and urunc cannot exit because it is waiting for the zombie to disappear. killProcess eventually times out after 2 seconds and returns an error, causing the command to fail.
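The zombie behaviour described above can be checked outside urunc. The following is a minimal, Linux-only sketch, not urunc code: it starts a short-lived child (`true`, assumed to be on PATH), deliberately leaves it unreaped, and records what a signal-0 probe reports before and after the child is reaped. It uses the standard library's `syscall.Kill` in place of `unix.Kill` to stay dependency-free, and the helper names `procState` and `zombieProbe` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
	"time"
)

// procState returns the state character from /proc/<pid>/stat (Linux only).
func procState(pid int) (byte, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return 0, err
	}
	// The comm field is wrapped in parentheses and may itself contain ')',
	// so locate the last ')' and read the state two bytes after it.
	idx := strings.LastIndexByte(string(data), ')')
	if idx == -1 || idx+2 >= len(data) {
		return 0, fmt.Errorf("invalid stat format")
	}
	return data[idx+2], nil
}

// zombieProbe starts a child, waits until it is a zombie, and reports
// whether kill(pid, 0) succeeds while it is a zombie and whether the
// same probe fails once the child has been reaped.
func zombieProbe() (existsWhileZombie bool, goneAfterReap bool, err error) {
	cmd := exec.Command("true")
	if startErr := cmd.Start(); startErr != nil {
		return false, false, startErr
	}
	pid := cmd.Process.Pid

	// Poll until the child has exited but has not been reaped (state Z).
	deadline := time.Now().Add(2 * time.Second)
	for {
		state, stErr := procState(pid)
		if stErr != nil {
			return false, false, stErr
		}
		if state == 'Z' {
			break
		}
		if time.Now().After(deadline) {
			return false, false, fmt.Errorf("pid %d did not become a zombie", pid)
		}
		time.Sleep(10 * time.Millisecond)
	}

	// Signal 0 performs only the existence and permission checks.
	existsWhileZombie = syscall.Kill(pid, 0) == nil

	// Reap the child; only now does its process table entry go away.
	_ = cmd.Wait()
	goneAfterReap = syscall.Kill(pid, 0) != nil
	return existsWhileZombie, goneAfterReap, nil
}

func main() {
	exists, gone, err := zombieProbe()
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("kill(pid, 0) succeeded while zombie:", exists)
	fmt.Println("kill(pid, 0) failed after reaping:", gone)
}
```

If the first value is true, a loop that waits for ESRCH cannot finish until someone reaps the child, which is the situation the shim and urunc end up in.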
Steps to Reproduce
- Run a urunc container.
- Force delete the container:
    urunc delete --force <container-id>
- Observe that the command blocks for 2 seconds and fails with timeout waiting for pid to die.
Expected Behavior
If the VMM process has already terminated (even if it is a zombie), killProcess and isRunning() should recognize it as dead/stopped immediately, rather than timing out or preventing deletion.
Suggested Fix
Read /proc/<pid>/stat to check the process state. If the state is zombie (Z) or dead (X/x), treat the process as terminated immediately:
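import (
	"errors"
	"fmt"
	"os"
	"strings"

	"golang.org/x/sys/unix"
)

// isZombieOrDead reports whether pid no longer exists or has already
// terminated and is only waiting to be reaped.
func isZombieOrDead(pid int) (bool, error) {
	// Signal 0 only checks for existence; ESRCH means the process is gone.
	if err := unix.Kill(pid, 0); err != nil {
		if errors.Is(err, unix.ESRCH) {
			return true, nil
		}
		return false, err
	}
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return true, nil
		}
		return false, nil // Fallback if /proc is not mounted
	}
	// The comm field is wrapped in parentheses and may itself contain ')',
	// so the state character is located relative to the last ')'.
	idx := strings.LastIndexByte(string(data), ')')
	if idx == -1 || idx+2 >= len(data) {
		return false, fmt.Errorf("invalid stat format")
	}
	state := data[idx+2]
	return state == 'Z' || state == 'X' || state == 'x', nil
}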
Use this check inside killProcess and isRunning() to detect terminated processes immediately.
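As an illustration of how the check could fit into the wait loop, here is a self-contained sketch under stated assumptions. It is not urunc's actual killProcess: the names `terminated`, `killAndWait` and `demo`, the 10 ms poll interval, and the use of `sleep 30` as a stand-in for a VMM are all hypothetical, and `syscall.Kill` replaces `unix.Kill` so the example needs only the standard library. It is Linux-only.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
	"time"
)

// terminated reports whether pid is gone, a zombie (Z), or dead (X/x).
func terminated(pid int) (bool, error) {
	if err := syscall.Kill(pid, 0); err != nil {
		if errors.Is(err, syscall.ESRCH) {
			return true, nil
		}
		return false, err
	}
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return true, nil
		}
		return false, nil // /proc unavailable: rely on the signal-0 probe
	}
	idx := strings.LastIndexByte(string(data), ')')
	if idx == -1 || idx+2 >= len(data) {
		return false, fmt.Errorf("invalid stat format")
	}
	state := data[idx+2]
	return state == 'Z' || state == 'X' || state == 'x', nil
}

// killAndWait sends SIGKILL and polls until the process is terminated,
// counting an unreaped zombie as terminated.
func killAndWait(pid int, timeout time.Duration) error {
	err := syscall.Kill(pid, syscall.SIGKILL)
	if err != nil && !errors.Is(err, syscall.ESRCH) {
		return err
	}
	deadline := time.Now().Add(timeout)
	for {
		done, err := terminated(pid)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timeout waiting for pid %d to die", pid)
		}
		time.Sleep(10 * time.Millisecond)
	}
}

// demo starts a long-running child and kills it while the parent is not
// reaping, mimicking the blocked shim. The child is reaped only afterwards.
func demo() error {
	cmd := exec.Command("sleep", "30")
	if err := cmd.Start(); err != nil {
		return err
	}
	defer func() { _ = cmd.Wait() }()
	return killAndWait(cmd.Process.Pid, 2*time.Second)
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("kill failed:", err)
		return
	}
	fmt.Println("child reported as terminated before being reaped")
}
```

With a loop that waits only for ESRCH, the same demo would run into the timeout, because the child is not reaped until `demo` returns.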