Fix silent hangs and unkillable deadlocks when the audio output is wedged #65
Open
alex19EP wants to merge 9 commits into
When a flush is requested, the espeak thread called stop_speech() -> espeak_Cancel() with queue_guard held. espeak_Cancel() waits for espeak-ng's internal say thread to acknowledge the cancellation, and that thread can be blocked indefinitely inside a blocking ALSA call (snd_pcm_writei/snd_pcm_drain on a wedged device, as seen with EBUSY errors). In that case queue_guard was held forever, which in turn:

- blocked the signal thread on pthread_mutex_lock(), so SIGINT/SIGTERM appeared to be ignored and only SIGKILL could end the process;
- left the softsynth thread stuck in request_espeak_stop(), so /dev/softsynth was no longer drained, speakup's kernel buffer filled up, and console output (e.g. dmesg) stalled.

Release queue_guard around the espeak_Cancel() call. This is safe because the only queue producer, the softsynth thread, is blocked waiting for stop_acknowledged for as long as stop_requested is set, so the queue cannot be mutated concurrently.

Helps: linux-speakup#45
Helps: linux-speakup#62
Co-Authored-By: Claude <noreply@anthropic.com>
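A minimal sketch of the unlock-around-the-blocking-call pattern this commit describes, not the actual patch. `fake_cancel()` is a hypothetical stand-in for `espeak_Cancel()`, and `stop_speech_unlocked()` and `guard_free_during_cancel` are illustrative names that do not appear in espeakup.

```c
#include <pthread.h>

static pthread_mutex_t queue_guard = PTHREAD_MUTEX_INITIALIZER;
static int guard_free_during_cancel = 0; /* hypothetical: demo bookkeeping only */

/* Hypothetical stand-in for espeak_Cancel(), which may block for as long as
 * the audio device is wedged. Here it only records whether queue_guard was
 * free while it ran. */
static void fake_cancel(void)
{
    if (pthread_mutex_trylock(&queue_guard) == 0) {
        guard_free_during_cancel = 1;
        pthread_mutex_unlock(&queue_guard);
    }
}

/* Must be called with queue_guard held; returns with it held again.
 * The mutex is dropped only for the duration of the blocking call, so other
 * threads (e.g. the signal thread) can still take it. */
static void stop_speech_unlocked(void)
{
    pthread_mutex_unlock(&queue_guard);
    fake_cancel();
    pthread_mutex_lock(&queue_guard);
}
```

A caller would lock `queue_guard`, call `stop_speech_unlocked()`, and continue under the lock as before. The pattern is only safe under the condition stated in the commit message: nothing else mutates the queue while the lock is released.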
On SIGINT/SIGTERM, the signal thread only set should_run to 0 and relied on a wake-up chain to propagate the shutdown: the self-pipe wakes the softsynth thread out of select(), which on exit signals runner_awake to wake the espeak thread. That chain breaks whenever the softsynth thread is not sitting in select() but waiting on stop_acknowledged in request_espeak_stop(): nobody ever signals that condition variable on shutdown, so the thread never re-evaluates should_run and the process never exits.

Broadcast all three condition variables after clearing should_run, so that every parked thread re-checks its predicate, whichever wait it is blocked in.

Helps: linux-speakup#45
Helps: linux-speakup#62
Co-Authored-By: Claude <noreply@anthropic.com>
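The broadcast-on-shutdown idea can be sketched as below. This is an illustration under assumptions, not the code from src/signal.c: it assumes the three condition variables are `runner_awake`, `stop_acknowledged` and `wake_stop` (names taken from the commit messages), and `request_shutdown()`, `wait_for_ack()` and `shutdown_wakes_waiter()` are hypothetical helpers.

```c
#include <pthread.h>

static pthread_mutex_t queue_guard = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t runner_awake = PTHREAD_COND_INITIALIZER;
static pthread_cond_t stop_acknowledged = PTHREAD_COND_INITIALIZER;
static pthread_cond_t wake_stop = PTHREAD_COND_INITIALIZER;
static int should_run = 1;
static int stop_requested = 1; /* a flush is pending and never acknowledged */

/* Hypothetical shutdown helper: clear should_run, then wake every waiter so
 * each one re-checks its predicate, whichever condvar it is parked on. */
static void request_shutdown(void)
{
    pthread_mutex_lock(&queue_guard);
    should_run = 0;
    pthread_cond_broadcast(&runner_awake);
    pthread_cond_broadcast(&stop_acknowledged);
    pthread_cond_broadcast(&wake_stop);
    pthread_mutex_unlock(&queue_guard);
}

/* Models the softsynth thread waiting for a stop acknowledgement. */
static void *wait_for_ack(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&queue_guard);
    while (should_run && stop_requested)
        pthread_cond_wait(&stop_acknowledged, &queue_guard);
    pthread_mutex_unlock(&queue_guard);
    return NULL;
}

/* Returns 1 if the waiter exits after shutdown is requested. */
static int shutdown_wakes_waiter(void)
{
    pthread_t waiter;

    if (pthread_create(&waiter, NULL, wait_for_ack, NULL) != 0)
        return 0;
    request_shutdown();
    return pthread_join(waiter, NULL) == 0;
}
```

Because `should_run` is cleared and the broadcasts are issued under `queue_guard`, the waiter either sees `should_run == 0` before it sleeps or is woken by the broadcast; there is no window in which the wake-up is lost.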
request_espeak_stop() waited forever for the espeak thread to acknowledge the stop. If espeak-ng is wedged inside the audio output (e.g. an ALSA device blocked or stuck returning EBUSY, as reported in issue linux-speakup#62), the acknowledgement never comes, and the softsynth thread stops draining /dev/softsynth forever. Speakup's kernel buffer then fills up and console output stalls, which matches the "blocks dmesg output after a couple of pages" observation in issue linux-speakup#45. The process goes silent and only SIGKILL gets rid of it.

Wait at most 10 seconds for the acknowledgement (a normal cancellation takes milliseconds; the timeout can only trigger when espeak is truly stuck). On timeout, exit with a clear message so that the init system respawns espeakup in a clean state: our systemd unit already has Restart=always. A one-second restart beats an unkillable silent daemon, and was explicitly requested by the reporter of issue linux-speakup#62.

Helps: linux-speakup#45
Helps: linux-speakup#62
Co-Authored-By: Claude <noreply@anthropic.com>
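A bounded wait of this kind might look like the sketch below. It is an assumption-laden illustration rather than the code in src/softsynth.c: `wait_stop_ack()` is a hypothetical helper, and the condvar is assumed to use the default `CLOCK_REALTIME` clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t queue_guard = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stop_acknowledged = PTHREAD_COND_INITIALIZER;
static int should_run = 1;
static int stop_requested = 0;

/* Hypothetical helper. Must be called with queue_guard held.
 * Returns 0 once the stop is acknowledged (or shutdown begins), and
 * ETIMEDOUT if no acknowledgement arrived within timeout_sec seconds. */
static int wait_stop_ack(int timeout_sec)
{
    struct timespec deadline;
    int err = 0;

    /* pthread_cond_timedwait() takes an absolute deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    while (should_run && stop_requested && err != ETIMEDOUT)
        err = pthread_cond_timedwait(&stop_acknowledged, &queue_guard,
                                     &deadline);

    /* Only report a timeout if the stop is still pending. */
    return (err == ETIMEDOUT && should_run && stop_requested) ? ETIMEDOUT : 0;
}
```

In the PR's design, the caller reacts to the timeout by printing a message and calling `_exit(3)`, leaving the restart to the init system.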
When processing an entry failed, only EE_BUFFER_FULL throttled before the retry; any other persistent error (e.g. EE_INTERNAL_ERROR after espeak was terminated) made queue_process_entry retry the same entry in a tight loop with no sleep, burning a whole CPU while printing to a stderr that points to /dev/null in daemon mode.

Factor the one-second throttle out into espeak_wait_retry() and apply it to every failed entry. The wake_stop condition variable still interrupts the wait immediately when a flush comes in.

Co-Authored-By: Claude <noreply@anthropic.com>
When resuming from CMD_PAUSE, queue_process_entry ignored the result of reinitialize_espeak: if espeak_Initialize failed, paused_espeak remained set, yet the entry was processed anyway, calling espeak_Synth & co on a terminated engine. Combined with the busy-retry loop, this produced an endless stream of failing calls against a dead engine.

Make reinitialize_espeak report failure, and when espeak is unavailable, leave the entry queued and back off before trying to reinitialize again.

Co-Authored-By: Claude <noreply@anthropic.com>
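The control flow described here (check the reinitialization result, keep the entry queued on failure) can be sketched single-threaded as follows. Everything except the name `paused_espeak` is hypothetical: `reinitialize_engine()` and `fake_synth()` stand in for reinitialize_espeak and espeak_Synth, and `fake_init_ok` exists only so that a failure can be simulated.

```c
#include <stddef.h>

enum entry_status { ENTRY_DONE, ENTRY_RETRY_LATER };

static int paused_espeak = 1; /* engine was terminated by a pause */
static int fake_init_ok = 0;  /* hypothetical switch: simulate init result */
static int synth_calls = 0;   /* hypothetical: counts calls into the engine */

/* Stand-in for reinitialize_espeak(): 0 on success, -1 on failure. */
static int reinitialize_engine(void)
{
    if (!fake_init_ok)
        return -1;
    paused_espeak = 0;
    return 0;
}

/* Stand-in for espeak_Synth(): always succeeds in this sketch. */
static int fake_synth(const char *text)
{
    (void)text;
    synth_calls++;
    return 0;
}

/* Process one queue entry. ENTRY_RETRY_LATER tells the caller to leave the
 * entry at the head of the queue and back off before trying again. */
static enum entry_status process_entry(const char *text)
{
    if (paused_espeak && reinitialize_engine() != 0)
        return ENTRY_RETRY_LATER; /* never touch a terminated engine */
    return fake_synth(text) == 0 ? ENTRY_DONE : ENTRY_RETRY_LATER;
}
```

A caller would only dequeue the entry on `ENTRY_DONE`, and would apply the one-second throttle from the previous commit on `ENTRY_RETRY_LATER`.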
Issue linux-speakup#62 reports espeakup going permanently silent after libespeak-ng prints "error: Device or resource busy" (EBUSY from the ALSA device, via pcaudiolib). Once the audio output is wedged, espeak's internal command queue never drains, every espeak_Synth call fails with EE_BUFFER_FULL forever, and espeakup just kept retrying silently. EE_BUFFER_FULL is also perfectly normal while a long backlog is being played back, so persistent failure alone is not a reliable signal.

To tell a backlogged engine from a wedged one, note progress whenever the synth callback fires (it is invoked for every chunk espeak synthesizes, and synthesis is paced by audio playback): if entries keep failing for ~10 seconds with no callback activity at all, declare the engine wedged.

Recovery is layered:

- restart the engine in-process (espeak_Cancel + espeak_Terminate + reinitialize), which recovers transient device problems;
- if the engine has not been healthy for at least a minute between such restarts, after 3 restarts give up and exit, letting the init system (Restart=always in our systemd unit) respawn espeakup in a completely clean state;
- if the restart itself blocks on the wedged device, the stop-acknowledgement timeout in the softsynth thread eventually terminates the process as a last resort.

A quick espeak_Synth success right after a restart does not count as healthy on purpose: espeak's freshly emptied internal queue accepts entries even while the device is still wedged.

Helps: linux-speakup#45
Helps: linux-speakup#62
Co-Authored-By: Claude <noreply@anthropic.com>
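The detection and escalation policy can be sketched as a small state machine. This is a simplified model, not the logic in src/espeak.c: the constants, `struct wedge_state`, `enum wedge_action` and `on_entry_failed()` are hypothetical, one failed retry is assumed to correspond to roughly one second, and "healthy for a minute" is approximated as "at least 60 seconds since the last restart".

```c
#include <time.h>

#define STALL_RETRIES   10 /* assumed: ~10 s at one throttled retry per second */
#define MAX_RESTARTS    3
#define HEALTHY_SECONDS 60

enum wedge_action { KEEP_RETRYING, RESTART_ENGINE, GIVE_UP };

struct wedge_state {
    int synth_progressed; /* set to 1 by the synth callback */
    int stalled_retries;  /* consecutive failures without callback activity */
    int restart_attempts; /* restarts without a healthy period in between */
    time_t last_restart;
};

/* Call once per failed entry; returns what the espeak thread should do. */
static enum wedge_action on_entry_failed(struct wedge_state *s, time_t now)
{
    if (s->synth_progressed) {
        /* Callback fired since the last failure: merely backlogged. */
        s->synth_progressed = 0;
        s->stalled_retries = 0;
        return KEEP_RETRYING;
    }
    if (++s->stalled_retries < STALL_RETRIES)
        return KEEP_RETRYING;

    /* No progress for the whole window: treat the engine as wedged. */
    s->stalled_retries = 0;
    if (s->restart_attempts > 0 && now - s->last_restart >= HEALTHY_SECONDS)
        s->restart_attempts = 0; /* earlier restarts are forgiven */
    if (s->restart_attempts >= MAX_RESTARTS)
        return GIVE_UP; /* exit and let the init system respawn */
    s->restart_attempts++;
    s->last_restart = now;
    return RESTART_ENGINE;
}
```

In this model a successful `espeak_Synth` call never resets `restart_attempts`, which mirrors the point that a quick success right after a restart is not evidence of health.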
This was referenced Jun 11, 2026
Pull request overview
This PR hardens espeakup against deadlocks and silent hangs when the underlying audio output (e.g., ALSA) wedges, by restructuring lock usage around espeak calls, adding bounded waits, and adding “wedged engine” detection/recovery logic so the daemon can recover or exit for a clean respawn.
Changes:
- Add a bounded (10s) wait for stop acknowledgement on softsynth flush, aborting with `_exit(3)` if espeak is wedged.
- On SIGINT/SIGTERM, broadcast all relevant condition variables so blocked threads re-check shutdown state.
- Refactor espeak thread behavior to avoid holding `queue_guard` across potentially blocking espeak calls, add retry backoff, and add engine wedge detection with restart/abort policy.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/softsynth.c | Adds a timed stop-ack wait on flush and aborts if the espeak thread never acknowledges. |
| src/signal.c | Broadcasts condition variables on termination signals to ensure waiters wake for shutdown. |
| src/espeak.c | Avoids lock-held blocking espeak calls; adds retry throttling and wedge detection with restart/exit behavior. |
Comment on lines 24 to +28

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
Comment on lines 73 to 76

    {
        int i;
        synth_progressed = 1;
        for (i = 0; events[i].type != espeakEVENT_LIST_TERMINATED; i++) {
Comment on lines +391 to +395

    if (synth_progressed) {
        /* Espeak is making progress, it is merely backlogged. */
        synth_progressed = 0;
        stalled_retries = 0;
    } else if (++stalled_retries >= ESPEAK_STALL_RETRIES) {
Comment on lines +240 to +244

    clock_gettime(CLOCK_REALTIME, &timeout);
    timeout.tv_sec += stopAckTimeout;
    while (should_run && stop_requested && err != ETIMEDOUT)
        // wait for acknowledgement.
        pthread_cond_wait(&stop_acknowledged, &queue_guard);
        err = pthread_cond_timedwait(&stop_acknowledged, &queue_guard,
Comment on lines +66 to +70

     * used to tell a merely backlogged engine from a wedged one. */
    static volatile int synth_progressed = 0;
    static int stalled_retries = 0;
    static int restart_attempts = 0;
    static struct timespec last_restart;
Member
Author
@sthibaul Hello, if you have time, can you look at this, please?
This addresses the freezes reported in #45 and #62: espeakup going permanently silent, ignoring SIGINT/SIGTERM (only SIGKILL works), and even stalling console output.
Root cause analysis
The trigger lives below espeakup: an ALSA device that wedges or gets stuck returning `EBUSY` (the `error: Device or resource busy` messages in #62 come from libespeak-ng/pcaudiolib, printed by `dispatch_audio()`). espeakup's lock topology then amplified a stuck audio call into a full process deadlock:

1. espeak-ng's internal say thread blocks inside a blocking ALSA call (`snd_pcm_writei`/`snd_pcm_drain` on a wedged device) and never checks its stop flag again.
2. On a flush request (`0x18`), the softsynth thread calls `request_espeak_stop()` and waits for `stop_acknowledged` with no timeout.
3. The espeak thread calls `stop_speech()` → `espeak_Cancel()` while holding `queue_guard`; `espeak_Cancel()` waits forever for the say thread → the mutex is held forever.
4. The signal thread blocks on `pthread_mutex_lock(&queue_guard)` → signals appear ignored, only SIGKILL works (the "even systemd hangs restarting it" in #62, "Espeakup frequently goes silent claiming 'error: Device or resource busy'").
5. The softsynth thread no longer drains `/dev/softsynth` → speakup's kernel buffer fills → console output stalls. That matches the "blocks dmesg output after a couple of pages" observation in #45 ("espeakup 0.90+ random freezes") exactly.

What this PR does
Each commit is self-contained, in increasing order of ambition:

1. Do not call `espeak_Cancel()` while holding `queue_guard`: this breaks the deadlock at step 3, so SIGTERM works again even when espeak-ng is stuck. Safe because the only queue producer is blocked waiting for the acknowledgement while `stop_requested` is set.
2. Setting `should_run = 0` now broadcasts all three condvars so threads parked in any wait re-check their predicate.
3. Wait at most 10 seconds for the stop acknowledgement; on timeout, `_exit(3)` so the init system respawns espeakup (our systemd unit already has `Restart=always`). A short restart beats an unkillable silent daemon, and was explicitly requested in #62.
4. Throttle every failed entry: previously only `EE_BUFFER_FULL` throttled; any other persistent error busy-looped at 100% CPU.
5. Handle reinitialization failure: resuming from `CMD_PAUSE` ignored `espeak_Initialize()` failure and kept calling `espeak_Synth()` on a terminated engine.
6. Detect a wedged engine: `EE_BUFFER_FULL` is normal while a long backlog plays, so persistent failure alone is not a wedge signal. Progress is tracked via the synth callback (invoked for every synthesized chunk, paced by playback): ~10 s of failures with zero callback activity → in-process engine restart (`espeak_Cancel` + `espeak_Terminate` + reinitialize); 3 restarts without a healthy minute in between → exit for a clean respawn. Quick successes right after a restart deliberately do not count as healthy, since espeak's freshly emptied internal queue accepts entries even while the device is still wedged.

The recovery is layered on purpose: if the in-process restart itself blocks on the wedged device, the stop-acknowledgement timeout still terminates the process on the next flush. All bail-out paths use `_exit()`, since plain `exit()` could hang in library destructors while the device is wedged.

Known leftover (intentional): on SIGTERM with a wedged engine, `main()` can still block in `pthread_join()` of the espeak thread; systemd's stop timeout escalates to SIGKILL. The point of this PR is that espeakup no longer hangs silently and unkillably in normal operation.

Testing
For those affected by #45/#62: please run this branch in the foreground (`espeakup --debug`, or via the unit with `StandardError=journal`) and report:

- whether the `espeakup: espeak has been failing without making progress for 10 seconds, restarting it` message appears,
- whether `killall -9` is still needed,
- any `espeakup: espeak did not acknowledge a stop request within 10 seconds, aborting` messages, which indicate espeak-ng was stuck inside ALSA and are useful evidence for the driver-side investigation.

AI Usage Disclosure