Fix Process.waitall/detach under URing selector when no children remain#201
Merged
Merged
Conversation
The URing selector's process_wait hook (io_uring waitid path) raised via rb_syserr_fail when waitid reported an error such as ECHILD. But Process.waitall and Process.detach rely on rb_waitpid *returning* -1 with errno (so they can terminate their loops on ECHILD), not raising. Under a fiber scheduler, that goes through the hook, so raising broke them: Process.waitall would propagate Errno::ECHILD instead of returning its collected results. Route waitid errors through the same WNOHANG reap used on success, which reproduces the error as a Process::Status carrier (pid -1, errno) exactly like Process::Status.wait. Process.wait(-1) with no children still raises ECHILD, as before (and as without a scheduler). The threaded fallback (EPoll/KQueue/Select) was already correct since it goes through Process::Status.wait. Add integration tests (via TestScheduler) for Process.waitall across all selectors, and for the no-children-raises-ECHILD behaviour. Assisted-By: devx/98676a79-085f-4593-a4f4-7d09ea29b5ff
b98133e to
e29fd98
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes
Process.waitall/Process.detachunder theURingselector when there are no more children to reap.The bug
The
URingselector'sprocess_waithook (theio_uringwaitidpath) raised viarb_syserr_failwhenwaitidreported an error such asECHILD:But
Process.waitall(andProcess.detach) rely onrb_waitpidreturning-1witherrno == ECHILDso they can terminate their loops — they do not expect it to raise (seeproc_waitallin Ruby'sprocess.c). Under a fiber scheduler this path goes through the hook, so raising broke them:Process.waitallwould propagateErrno::ECHILDinstead of returning the statuses it had already collected.Process.wait(-1)with no children still raisesErrno::ECHILD— that's correct and unchanged (it matches non-scheduler behaviour).The threaded fallback (EPoll / KQueue / Select) was already correct, because it reaps via
Process::Status.wait, which returns the proper(-1, errno)carrier rather than raising.The fix
Route
waitiderrors through the sameWNOHANGreap already used on the success path.rb_process_status_wait(pid, flags | WNOHANG)reproduces the failure as aProcess::Statuscarrying(pid -1, errno), exactly likeProcess::Status.wait— sorb_waitpidreports it correctly andwaitall/detachbehave as expected. No new API required.Tests
Adds integration tests (via
TestScheduler, across all selectors):Process.waitallcollects all child statuses and terminates cleanly.Process.wait(-1)with no children raisesErrno::ECHILD(regression guard for the corrected behaviour).Verification
waitalltest errors withErrno::ECHILDonly on theURingwaitid path (EPoll/Select pass).Background
This came out of the discussion on bug #21704 about exposing
rb_process_status_new. Notably this fix needs no new Ruby API —rb_process_status_waitalready provides the full-fidelity reap (including the error carrier). A constructor likerb_process_status_newwould only be needed later to avoid the extra reap syscall on the success path.