
Fix silent hangs and unkillable deadlocks when the audio output is wedged #65

Open
alex19EP wants to merge 9 commits into linux-speakup:master from alex19EP:fix-audio-wedge-hangs

Conversation

@alex19EP

Member

This addresses the freezes reported in #45 and #62: espeakup going permanently silent, ignoring SIGINT/SIGTERM (only SIGKILL works), and even stalling console output.

Root cause analysis

The trigger lies below espeakup: an ALSA device that wedges or gets stuck returning EBUSY (the "error: Device or resource busy" messages in #62 come from libespeak-ng/pcaudiolib and are printed by dispatch_audio()). However, espeakup's lock topology amplified a single stuck audio call into a full process deadlock:

  1. espeak-ng's internal say thread blocks inside an ALSA call (snd_pcm_writei/snd_pcm_drain on a wedged device) and never checks its stop flag again.
  2. On the next flush (0x18), the softsynth thread calls request_espeak_stop() and waits for stop_acknowledged with no timeout.
  3. The espeak thread calls stop_speech() → espeak_Cancel() while holding queue_guard; espeak_Cancel() waits forever for the say thread → the mutex is held forever.
  4. SIGINT/SIGTERM: the signal thread blocks on pthread_mutex_lock(&queue_guard), so signals appear to be ignored and only SIGKILL works (this is the "even systemd hangs restarting it" report in #62).
  5. The softsynth thread never returns to reading /dev/softsynth, so speakup's kernel buffer fills and console output stalls. This exactly matches the "blocks dmesg output after a couple of pages" observation in #45.

What this PR does

Each commit is self-contained, in increasing order of ambition:

  • espeak: do not call espeak_Cancel() while holding queue_guard — breaks the deadlock at step 3; SIGTERM works again even when espeak-ng is stuck. Safe because the only queue producer is blocked waiting for the acknowledgement while stop_requested is set.
  • signal: wake all condition-variable waiters on shutdown. Clearing should_run now also broadcasts all three condvars, so threads parked in any wait re-check their predicate.
  • softsynth: bound the wait for espeak to acknowledge a stop request. Wait at most 10 s (a normal cancel takes milliseconds), then print a clear message and _exit(3) so that the init system respawns espeakup (our systemd unit already has Restart=always). A short restart beats an unkillable silent daemon, and was explicitly requested in #62.
  • espeak: back off before retrying after any espeak error — previously only EE_BUFFER_FULL throttled; any other persistent error busy-looped at 100% CPU.
  • espeak: do not call espeak functions after a failed reinitialization — resuming from CMD_PAUSE ignored espeak_Initialize() failure and kept calling espeak_Synth() on a terminated engine.
  • espeak: detect a wedged engine, restart it, and eventually give up. EE_BUFFER_FULL is normal while a long backlog plays, so persistent failure alone is not a wedge signal. Progress is tracked via the synth callback (invoked for every synthesized chunk, paced by playback): ~10 s of failures with zero callback activity → in-process engine restart (espeak_Cancel + espeak_Terminate + reinitialize); 3 restarts without a healthy minute in between → exit for a clean respawn. Quick successes right after a restart deliberately do not count as healthy, since espeak's freshly emptied internal queue accepts entries even while the device is still wedged.

The recovery is layered on purpose: if the in-process restart itself blocks on the wedged device, the stop-acknowledgement timeout still terminates the process on the next flush. All bail-out paths use _exit(), since plain exit() could hang in library destructors while the device is wedged.
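The difference between _exit() and exit() can be made concrete with a small sketch. bail_out(), wedged_destructor() and demo_bail_out_status() are hypothetical names, not espeakup's code; the atexit() handler merely stands in for a library destructor that would block on the wedged device.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: report a fatal condition and terminate at once.
 * _exit() skips atexit handlers, library destructors and stdio flushing,
 * so nothing on the way out can block on the wedged audio device. */
static void bail_out(const char *msg, int status)
{
	/* stderr is unbuffered by default, so the message is written right
	 * away even though _exit() never flushes stdio. */
	fprintf(stderr, "espeakup: %s\n", msg);
	_exit(status);
}

/* Stand-in for a library destructor that would hang on the wedged
 * device. If it ever ran, the process would exit with status 99. */
static void wedged_destructor(void)
{
	_exit(99);
}

/* Demo: run bail_out() in a child that registered the "destructor",
 * and return the child's exit status (-1 on error). */
static int demo_bail_out_status(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		atexit(wedged_destructor);
		bail_out("demo: espeak is wedged, aborting", 3);
	}
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

demo_bail_out_status() returns 3 because the handler never runs; with exit(3) in bail_out() the handler would run and the child would report 99 instead.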

Known leftover (intentional): on SIGTERM with a wedged engine, main() can still block in pthread_join() of the espeak thread; systemd's stop timeout escalates to SIGKILL. The point of this PR is that espeakup no longer hangs silently and unkillably in normal operation.

Testing

For those affected by #45/#62: please run this branch in the foreground (espeakup --debug, or via the unit with StandardError=journal) and report:

  • whether speech recovers after the "espeakup: espeak has been failing without making progress for 10 seconds, restarting it" message,
  • whether, in the worst case, espeakup now exits (and gets respawned by systemd) instead of hanging until killall -9,
  • any "espeakup: espeak did not acknowledge a stop request within 10 seconds, aborting" messages, which indicate that espeak-ng was stuck inside ALSA; these are useful evidence for the driver-side investigation.

AI Usage Disclosure

This change was developed with assistance from Claude (Anthropic). All code
was reviewed and tested by the author before submission.

alex19EP and others added 6 commits June 11, 2026 11:05
When a flush is requested, the espeak thread called stop_speech() ->
espeak_Cancel() with queue_guard held.  espeak_Cancel() waits for
espeak-ng's internal say thread to acknowledge the cancellation, and
that thread can be blocked indefinitely inside a blocking ALSA call
(snd_pcm_writei/snd_pcm_drain on a wedged device, as seen with EBUSY
errors).  In that case queue_guard was held forever, which in turn:

- blocked the signal thread on pthread_mutex_lock(), so SIGINT/SIGTERM
  appeared to be ignored and only SIGKILL could end the process;
- left the softsynth thread stuck in request_espeak_stop(), so
  /dev/softsynth was no longer drained, speakup's kernel buffer filled
  up, and console output (e.g. dmesg) stalled.

Release queue_guard around the espeak_Cancel() call.  This is safe
because the only queue producer, the softsynth thread, is blocked
waiting for stop_acknowledged for as long as stop_requested is set, so
the queue cannot be mutated concurrently.
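The unlock-around-a-blocking-call pattern can be sketched as follows. This is a minimal illustration, not the patch itself: stop_speech_without_lock() and blocking_cancel() are hypothetical names, and blocking_cancel() stands in for espeak_Cancel() by checking that another thread could take queue_guard while it runs.

```c
#include <pthread.h>

/* Hypothetical stand-ins for espeakup's state; names are assumptions. */
static pthread_mutex_t queue_guard = PTHREAD_MUTEX_INITIALIZER;
static int lock_was_free_during_cancel;

/* Stand-in for espeak_Cancel(), which can block indefinitely while the
 * say thread is stuck in ALSA. Here it only checks that another thread
 * (e.g. the signal thread) could take queue_guard in the meantime. */
static void blocking_cancel(void)
{
	if (pthread_mutex_trylock(&queue_guard) == 0) {
		lock_was_free_during_cancel = 1;
		pthread_mutex_unlock(&queue_guard);
	}
}

/* Called by the espeak thread with queue_guard held. Dropping the lock
 * is safe only because the sole queue producer stays blocked on
 * stop_acknowledged while stop_requested is set. */
static void stop_speech_without_lock(void)
{
	pthread_mutex_unlock(&queue_guard);
	blocking_cancel();
	pthread_mutex_lock(&queue_guard);	/* restore the caller's invariant */
}
```

The caller still holds queue_guard when the function returns, so the surrounding code does not change; only the window in which a stuck cancel could pin the mutex disappears.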

Helps: linux-speakup#45
Helps: linux-speakup#62

Co-Authored-By: Claude <noreply@anthropic.com>
On SIGINT/SIGTERM, the signal thread only set should_run to 0 and
relied on a wake-up chain to propagate the shutdown: the self-pipe
wakes the softsynth thread out of select(), which on exit signals
runner_awake to wake the espeak thread.

That chain breaks whenever the softsynth thread is not sitting in
select() but waiting on stop_acknowledged in request_espeak_stop():
nobody ever signals that condition variable on shutdown, so the thread
never re-evaluates should_run and the process never exits.

Broadcast all three condition variables after clearing should_run, so
that every parked thread re-checks its predicate, whichever wait it is
blocked in.
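A sketch of the shutdown path, assuming the three condition variables are runner_awake, stop_acknowledged and wake_stop (the names appear in these commit messages, but which three the patch broadcasts is an assumption). request_shutdown() and ack_waiter() are hypothetical.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the shared state. */
static pthread_mutex_t queue_guard = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t runner_awake = PTHREAD_COND_INITIALIZER;
static pthread_cond_t stop_acknowledged = PTHREAD_COND_INITIALIZER;
static pthread_cond_t wake_stop = PTHREAD_COND_INITIALIZER;
static int should_run = 1;
static int stop_requested = 1;	/* a flush is pending, never acknowledged */

/* Signal thread, on SIGINT/SIGTERM: clear should_run under the lock,
 * then wake every waiter so each one re-evaluates its predicate. */
static void request_shutdown(void)
{
	pthread_mutex_lock(&queue_guard);
	should_run = 0;
	pthread_cond_broadcast(&runner_awake);
	pthread_cond_broadcast(&stop_acknowledged);
	pthread_cond_broadcast(&wake_stop);
	pthread_mutex_unlock(&queue_guard);
}

/* Softsynth thread parked in a request_espeak_stop()-style wait. */
static void *ack_waiter(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&queue_guard);
	while (should_run && stop_requested)
		pthread_cond_wait(&stop_acknowledged, &queue_guard);
	pthread_mutex_unlock(&queue_guard);
	return NULL;
}
```

Without the stop_acknowledged broadcast, joining ack_waiter() would hang whenever it had parked before request_shutdown() ran.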

Helps: linux-speakup#45
Helps: linux-speakup#62

Co-Authored-By: Claude <noreply@anthropic.com>
request_espeak_stop() waited forever for the espeak thread to
acknowledge the stop.  If espeak-ng is wedged inside the audio output
(e.g. an ALSA device blocked or stuck returning EBUSY, as reported in
issue linux-speakup#62), the acknowledgement never comes, and the softsynth thread
stops draining /dev/softsynth forever.  Speakup's kernel buffer then
fills up and console output stalls, which matches the "blocks dmesg
output after a couple of pages" observation in issue linux-speakup#45.  The process
goes silent and only SIGKILL gets rid of it.

Wait at most 10 seconds for the acknowledgement (a normal cancellation
takes milliseconds; the timeout can only trigger when espeak is truly
stuck).  On timeout, exit with a clear message so that the init system
respawns espeakup in a clean state: our systemd unit already has
Restart=always.  A one-second restart beats an unkillable silent
daemon, and was explicitly requested by the reporter of issue linux-speakup#62.
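A bounded acknowledgement wait might look like the sketch below. It mirrors the pthread_cond_timedwait() loop visible in the review excerpt, but wait_for_stop_ack() and its return convention are hypothetical. A caller would print a message and _exit(3) when it returns ETIMEDOUT.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the softsynth/espeak shared state. */
static pthread_mutex_t queue_guard = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stop_acknowledged = PTHREAD_COND_INITIALIZER;
static int should_run = 1;
static int stop_requested;

/* Wait (with queue_guard held) until the espeak thread clears
 * stop_requested, shutdown begins, or timeout_sec elapses.
 * Returns 0 on acknowledgement or shutdown, ETIMEDOUT otherwise. */
static int wait_for_stop_ack(time_t timeout_sec)
{
	struct timespec deadline;
	int err = 0;

	/* A default condvar measures its absolute deadline on CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;
	while (should_run && stop_requested && err != ETIMEDOUT)
		err = pthread_cond_timedwait(&stop_acknowledged, &queue_guard,
					     &deadline);
	/* Re-check the predicate: an ack may have raced with the timeout. */
	return (should_run && stop_requested) ? ETIMEDOUT : 0;
}
```

Because the deadline is absolute, spurious wake-ups do not extend the total wait. A wall-clock jump still can, which pthread_condattr_setclock(CLOCK_MONOTONIC) on the condvar would avoid.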

Helps: linux-speakup#45
Helps: linux-speakup#62

Co-Authored-By: Claude <noreply@anthropic.com>
When processing an entry failed, only EE_BUFFER_FULL throttled before
the retry; any other persistent error (e.g. EE_INTERNAL_ERROR after
espeak was terminated) made queue_process_entry retry the same entry
in a tight loop with no sleep, burning a whole CPU while printing to a
stderr that points to /dev/null in daemon mode.

Factor the one-second throttle out into espeak_wait_retry() and apply
it to every failed entry.  The wake_stop condition variable still
interrupts the wait immediately when a flush comes in.
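An interruptible back-off in the spirit of espeak_wait_retry() could be sketched like this. wait_retry() and simulate_flush() are hypothetical, only the wake_stop name comes from the commit, and the delay is a parameter here (espeakup uses one second) so the flush case can be exercised with a long deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the shared state. */
static pthread_mutex_t queue_guard = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake_stop = PTHREAD_COND_INITIALIZER;
static int should_run = 1;
static int stop_requested;

/* Back off before retrying a failed entry, returning early when a flush
 * or shutdown arrives. Caller holds queue_guard.
 * Returns 1 if interrupted, 0 if the full delay elapsed. */
static int wait_retry(time_t delay_sec)
{
	struct timespec deadline;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += delay_sec;
	while (should_run && !stop_requested && err != ETIMEDOUT)
		err = pthread_cond_timedwait(&wake_stop, &queue_guard, &deadline);
	return !should_run || stop_requested;
}

/* Softsynth side of a flush: request a stop and wake the retry wait. */
static void *simulate_flush(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&queue_guard);
	stop_requested = 1;
	pthread_cond_signal(&wake_stop);
	pthread_mutex_unlock(&queue_guard);
	return NULL;
}
```

Applying the same wait to every failure turns a persistent error into one retry per second instead of a busy loop, without delaying flushes.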

Co-Authored-By: Claude <noreply@anthropic.com>
When resuming from CMD_PAUSE, queue_process_entry ignored the result
of reinitialize_espeak: if espeak_Initialize failed, paused_espeak
remained set, yet the entry was processed anyway, calling
espeak_Synth & co on a terminated engine.  Combined with the
busy-retry loop, this produced an endless stream of failing calls
against a dead engine.

Make reinitialize_espeak report failure, and when espeak is
unavailable, leave the entry queued and back off before trying to
reinitialize again.
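The control flow can be sketched with a fake engine. fake_initialize() stands in for espeak_Initialize(), assumed here to return a negative value on failure; reinitialize_engine(), process_entry() and the counters are hypothetical.

```c
/* Hypothetical sketch of resuming after CMD_PAUSE. */
enum entry_result { ENTRY_DONE, ENTRY_RETRY };

static int paused_espeak = 1;	/* engine was terminated by CMD_PAUSE */
static int init_should_fail;	/* demo knob: simulate a failing init */
static int synth_calls;		/* counts stand-in espeak_Synth() calls */

static int fake_initialize(void)
{
	return init_should_fail ? -1 : 22050;
}

/* Report failure instead of ignoring it; clear the paused flag only
 * once the engine is really back. */
static int reinitialize_engine(void)
{
	if (fake_initialize() < 0)
		return -1;
	paused_espeak = 0;
	return 0;
}

/* Process the head entry. On ENTRY_RETRY the caller leaves it queued
 * and backs off before trying to reinitialize again. */
static enum entry_result process_entry(void)
{
	if (paused_espeak && reinitialize_engine() != 0)
		return ENTRY_RETRY;	/* never call into a terminated engine */
	synth_calls++;			/* stands in for espeak_Synth() & co */
	return ENTRY_DONE;
}
```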

Co-Authored-By: Claude <noreply@anthropic.com>
Issue linux-speakup#62 reports espeakup going permanently silent after libespeak-ng
prints "error: Device or resource busy" (EBUSY from the ALSA device,
via pcaudiolib).  Once the audio output is wedged, espeak's internal
command queue never drains, every espeak_Synth call fails with
EE_BUFFER_FULL forever, and espeakup just kept retrying silently.

EE_BUFFER_FULL is also perfectly normal while a long backlog is being
played back, so persistent failure alone is not a reliable signal.  To
tell a backlogged engine from a wedged one, note progress whenever the
synth callback fires (it is invoked for every chunk espeak
synthesizes, and synthesis is paced by audio playback): if entries
keep failing for ~10 seconds with no callback activity at all, declare
the engine wedged.
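The detection rule can be sketched as a counter reset by callback progress. STALL_RETRIES = 10 assumes one retry per second; note_synth_progress() and engine_looks_wedged() are hypothetical names. Unlike the volatile int shown in the review excerpt, this sketch uses a C11 atomic so that the cross-thread read-and-clear happens in one step.

```c
#include <stdatomic.h>

/* Assumed: ~10 s of failures at one retry per second. */
#define STALL_RETRIES 10

/* Set from the synth callback (espeak-ng's thread) and consumed by the
 * espeak thread; atomic_exchange() reads and clears it atomically. */
static atomic_int synth_progressed;
static int stalled_retries;

/* Call from the synth callback for every synthesized chunk. */
static void note_synth_progress(void)
{
	atomic_store(&synth_progressed, 1);
}

/* Call after each failed entry. Returns 1 once STALL_RETRIES consecutive
 * failures have passed without any callback activity. */
static int engine_looks_wedged(void)
{
	if (atomic_exchange(&synth_progressed, 0)) {
		stalled_retries = 0;	/* merely backlogged, still playing */
		return 0;
	}
	if (++stalled_retries < STALL_RETRIES)
		return 0;
	stalled_retries = 0;		/* caller restarts the engine */
	return 1;
}
```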

Recovery is layered:
- restart the engine in-process (espeak_Cancel + espeak_Terminate +
  reinitialize), which recovers transient device problems;
- after 3 restarts without at least a minute of healthy operation in
  between, give up and exit, letting the init system (Restart=always
  in our systemd unit) respawn espeakup in a completely clean state;
- if the restart itself blocks on the wedged device, the
  stop-acknowledgement timeout in the softsynth thread eventually
  terminates the process as a last resort.

A quick espeak_Synth success right after a restart does not count as
healthy on purpose: espeak's freshly emptied internal queue accepts
entries even while the device is still wedged.
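One way to express the restart budget is sketched below. The constants follow the text (3 restarts, one healthy minute). The bookkeeping is an assumption: only callback progress sustained for HEALTHY_SECS refills the budget, so a quick espeak_Synth() success right after a restart is never counted. on_progress() and on_wedge() are hypothetical, with time passed in explicitly.

```c
#include <time.h>

#define MAX_RESTARTS 3
#define HEALTHY_SECS 60

enum wedge_action { ACTION_RESTART, ACTION_EXIT };

static int restart_attempts;
static int progressing;		/* callback progress seen since restart */
static time_t progress_since;	/* when that progress started */

/* Espeak thread noticed synth-callback progress at time `now`. */
static void on_progress(time_t now)
{
	if (!progressing) {
		progressing = 1;
		progress_since = now;
	} else if (now - progress_since >= HEALTHY_SECS) {
		restart_attempts = 0;	/* a healthy minute refills the budget */
	}
}

/* A wedge was detected: restart in-process, or exit for a clean respawn. */
static enum wedge_action on_wedge(void)
{
	progressing = 0;	/* the restarted engine must prove itself again */
	if (restart_attempts >= MAX_RESTARTS)
		return ACTION_EXIT;
	restart_attempts++;
	return ACTION_RESTART;
}
```

A real caller would call _exit() on ACTION_EXIT; the sketch only returns the decision so the policy can be exercised on its own.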

Helps: linux-speakup#45
Helps: linux-speakup#62

Co-Authored-By: Claude <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR hardens espeakup against deadlocks and silent hangs when the underlying audio output (e.g., ALSA) wedges, by restructuring lock usage around espeak calls, adding bounded waits, and adding “wedged engine” detection/recovery logic so the daemon can recover or exit for a clean respawn.

Changes:

  • Add a bounded (10s) wait for stop acknowledgement on softsynth flush, aborting with _exit(3) if espeak is wedged.
  • On SIGINT/SIGTERM, broadcast all relevant condition variables so blocked threads re-check shutdown state.
  • Refactor espeak thread behavior to avoid holding queue_guard across potentially blocking espeak calls, add retry backoff, and add engine wedge detection with restart/abort policy.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
src/softsynth.c Adds a timed stop-ack wait on flush and aborts if the espeak thread never acknowledges.
src/signal.c Broadcasts condition variables on termination signals to ensure waiters wake for shutdown.
src/espeak.c Avoids lock-held blocking espeak calls; adds retry throttling and wedge detection with restart/exit behavior.


Comment thread src/espeak.c
Comment on lines 24 to +28
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
Comment thread src/espeak.c
Comment on lines 73 to 76
{
int i;
synth_progressed = 1;
for (i = 0; events[i].type != espeakEVENT_LIST_TERMINATED; i++) {
Comment thread src/espeak.c Outdated
Comment on lines +391 to +395
if (synth_progressed) {
/* Espeak is making progress, it is merely backlogged. */
synth_progressed = 0;
stalled_retries = 0;
} else if (++stalled_retries >= ESPEAK_STALL_RETRIES) {
Comment thread src/softsynth.c Outdated
Comment on lines +240 to +244
clock_gettime(CLOCK_REALTIME, &timeout);
timeout.tv_sec += stopAckTimeout;
while (should_run && stop_requested && err != ETIMEDOUT)
// wait for acknowledgement.
pthread_cond_wait(&stop_acknowledged, &queue_guard);
err = pthread_cond_timedwait(&stop_acknowledged, &queue_guard,
Comment thread src/espeak.c Outdated
Comment on lines +66 to +70
* used to tell a merely backlogged engine from a wedged one. */
static volatile int synth_progressed = 0;
static int stalled_retries = 0;
static int restart_attempts = 0;
static struct timespec last_restart;
@alex19EP

Member Author

@sthibaul Hello, if you have time, could you look at this, please?


Labels

None yet

Projects

None yet


2 participants