
Improve voice activation, add va status skill and improve hud rendering #405

Open: SawPsyder wants to merge 4 commits into ShipBit:develop from SawPsyder:feature/improve-va-mic-status


Conversation

@SawPsyder SawPsyder commented Jul 3, 2026

Collaborator

Also references #402

Summary

Improve voice activation, add a voice activation (VA) status skill, and improve HUD rendering.

Changes

  • Fix voice activation state changes triggered during audio player playback (the toggle now saves the state to apply after playback instead of changing anything directly)
  • Reuse the background opacity for HUD outline rendering
  • Add a new skill that shows the voice activation status on the HUD (this is also a community request)

Testing

  • Tested locally
  • Enable voice activation, then toggle its state both during playback and while idle, and check the state shown in Wingman AI and in the new skill

Checklist

  • This PR is linked to a GitHub issue
  • I have tested my changes locally
  • I have rebased my branch onto develop

@SawPsyder SawPsyder requested a review from Shackless July 3, 2026 16:20

@Shackless Shackless left a comment

Contributor

Thanks for the PR! The core idea of the toggle fix is sound — I verified that all real end-of-playback paths reliably fire on_playback_finished, and streamed responses produce exactly one started/finished cycle, so the saved intent is applied correctly in the normal flow. The outline-alpha change also checks out (content-layer borders staying opaque is consistent with the design).

I left inline comments on the issues found during review, roughly in order of severity:

  1. Race between the new playback branch and on_playback_started that can silently swallow a mute press (wingman_core.py)
  2. The skill's permanent /ws connection breaks Core's offline-message queueing for the real client (mic_status/main.py)
  3. The intent fix only covers the hotkey path — the /voice-activation/mute endpoint still has the old bug (wingman_core.py)
  4. Icon paths containing ) never render, and the code comment about the parser is wrong (mic_status/main.py)
  5. After a HUD server restart the icon comes back as a default-styled window, and failed draws are cached as successful (mic_status/main.py)
  6. The connected guard in update_config drops config changes after any transient HUD hiccup (mic_status/main.py)
  7. Reading Core internals via sys.modules["__main__"] bypasses the skill facade — uniquely among bundled skills (mic_status/main.py)
  8. icon_size code default (96) drifted from default_config.yaml (72) (mic_status/main.py)

Minor nit not worth its own thread: (55, 62, 74) is now hardcoded in five places across hud_server; a DEFAULT_BORDER_COLOR in hud_server/constants.py would fit the existing pattern there.

Comment thread wingman_core.py
    and self.audio_player.is_playing
    and self.settings_service.settings.voice_activation.enabled
):
    self.was_listening_before_playback = not self.was_listening_before_playback

Contributor

Race with on_playback_started — a mute press can be silently swallowed.

audio_player sets is_playing = True before on_playback_started runs and captures was_listening_before_playback (audio_player.py:227-230 → wingman_core.py:1761, and on_playback_started even awaits printr.print_async first). The hotkey toggle runs on the keyboard-listener thread, so it can enter this branch in that gap: it then flips the stale flag left over from the previous playback and broadcasts it, and moments later on_playback_started overwrites the flag with the current is_listening. The user's toggle is lost and the broadcast mute state briefly contradicts reality.

The window is narrow but real. Capturing the intent in the same place is_playing is set (or having on_playback_started not overwrite a flag that was just user-modified) would close it.
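To make the first option concrete, here is a minimal sketch of capturing the intent in the same critical section that sets is_playing. PlaybackIntent and its method names are hypothetical and not taken from wingman_core.py; the sketch also assumes the recognizer is paused for the duration of playback.

```python
import threading


class PlaybackIntent:
    """Hypothetical holder for the 'restore listening after playback' flag."""

    def __init__(self, is_listening: bool) -> None:
        self._lock = threading.Lock()
        self.is_listening = is_listening
        self.is_playing = False
        self.was_listening_before_playback = is_listening

    def begin_playback(self) -> None:
        # Capture the intent and set is_playing under one lock, so a toggle
        # can never see is_playing=True together with a stale flag.
        with self._lock:
            self.was_listening_before_playback = self.is_listening
            self.is_playing = True
            self.is_listening = False  # assumption: recognizer pauses during playback

    def toggle_from_hotkey(self) -> None:
        # Called from the keyboard-listener thread.
        with self._lock:
            if self.is_playing:
                # Only record what should happen once playback ends.
                self.was_listening_before_playback = (
                    not self.was_listening_before_playback
                )
            else:
                self.is_listening = not self.is_listening

    def finish_playback(self) -> None:
        with self._lock:
            self.is_playing = False
            self.is_listening = self.was_listening_before_playback


state = PlaybackIntent(is_listening=True)
state.was_listening_before_playback = False  # stale value from an earlier playback
state.begin_playback()
state.toggle_from_hotkey()  # user mutes while audio is playing
state.finish_playback()
```

With the capture and the is_playing write under the same lock, the stale value is replaced before any toggle can act on it, so the mute press survives the end of playback.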

Comment thread wingman_core.py
# act on the transient muted state (which would start the recognizer
# mid-playback and/or be overridden when playback ends). on_playback_finished
# restores from was_listening_before_playback, so updating that is enough.
if (

Contributor

The intent fix only covers the hotkey path — the /voice-activation/mute endpoint still has the old bug.

The client's mute button POSTs to an endpoint bound directly to start_voice_recognition (wingman_core.py:127-131), which bypasses this new branch entirely. A mute set that way during playback is still overridden by on_playback_finished — the exact bug this PR fixes for the hotkey. Today the client disables its toggle while audio plays (MuteToggle.svelte), so this mostly surfaces as hotkey-vs-GUI divergence (hotkey now works during playback, GUI can't) plus an exposed API inconsistency for anyone calling the endpoint directly.

A deeper fix would model the user's intent as a single field that all entry points mutate, and derive the actual recognizer state from intent AND NOT is_playing — then this per-caller special case (and the duplicated VoiceActivationMutedCommand broadcast) disappears.
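A rough sketch of that shape, under the assumption that every entry point (hotkey, endpoint, playback callbacks) can be routed through one small state object. VoiceActivationState and its members are illustrative names, not existing Core code.

```python
from dataclasses import dataclass


@dataclass
class VoiceActivationState:
    """Hypothetical single source of truth for voice activation."""

    user_wants_listening: bool = True  # the user's intent
    is_playing: bool = False           # set by the playback callbacks

    @property
    def recognizer_should_run(self) -> bool:
        # Derived, never stored: intent AND NOT is_playing.
        return self.user_wants_listening and not self.is_playing

    def toggle(self) -> None:
        # Hotkey path.
        self.user_wants_listening = not self.user_wants_listening

    def set_muted(self, muted: bool) -> None:
        # Endpoint path (e.g. a mute POST) mutates the same field.
        self.user_wants_listening = not muted


state = VoiceActivationState()
state.is_playing = True        # playback starts
state.set_muted(True)          # mute arrives via the endpoint mid-playback
during_playback = state.recognizer_should_run
state.is_playing = False       # playback finishes, nothing to restore
after_playback = state.recognizer_should_run
```

Because nothing is saved and restored around playback, there is no flag for on_playback_finished to overwrite, and a broadcast of the muted state would only need to happen where user_wants_listening changes.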

Comment thread skills/mic_status/main.py
self._mic_listening = self._seed_listening()
await self._refresh()
try:
    async with websockets.connect(ws_url, open_timeout=3) as ws:

Contributor

The skill's permanent /ws connection breaks Core's offline-message queueing for the real client.

ConnectionManager.broadcast only queues messages for later delivery when active_connections is empty, and a connection counts as active immediately on accept (main.py:270-272) — client_ready only triggers the flush. With this skill always connected, broadcasts sent while the GUI is closed or restarting are delivered only to the skill and never queued, so toasts/errors that pre-PR would be flushed on reconnect are permanently lost.

Since the skill runs inside the Core process and already polls in-process state every 250 ms, the WS client looks unnecessary altogether: calling _seed_listening() on the poll tick yields the same data as the voice_activation_muted broadcast. Dropping it would also delete _core_port, the 49111 fallback (which silently breaks under a non-default --port anyway), the reconnect machinery, and the websockets>=13.1 requirements.txt entry.
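The queueing problem can be reproduced with a toy model of the rule described above. This ConnectionManager is a simplified stand-in written for illustration, not Core's real class, and the connection names and message text are made up.

```python
class ConnectionManager:
    """Toy model: queue only when nobody is connected, flush on client_ready."""

    def __init__(self) -> None:
        self.active_connections: list[str] = []
        self.message_queue: list[str] = []
        self.delivered: dict[str, list[str]] = {}

    def connect(self, name: str) -> None:
        # A connection counts as active as soon as it is accepted.
        self.active_connections.append(name)
        self.delivered.setdefault(name, [])

    def broadcast(self, message: str) -> None:
        if not self.active_connections:
            self.message_queue.append(message)  # kept for later delivery
            return
        for name in self.active_connections:
            self.delivered[name].append(message)

    def client_ready(self, name: str) -> None:
        # Flush whatever was queued while no connection was active.
        self.delivered[name].extend(self.message_queue)
        self.message_queue.clear()


def gui_messages_after_restart(skill_connected: bool) -> list[str]:
    manager = ConnectionManager()
    if skill_connected:
        manager.connect("mic_status_skill")
    manager.broadcast("toast: provider error")  # GUI is closed at this point
    manager.connect("gui")
    manager.client_ready("gui")
    return manager.delivered["gui"]


without_skill = gui_messages_after_restart(skill_connected=False)
with_skill = gui_messages_after_restart(skill_connected=True)
```

In this model the GUI receives the toast after reconnecting only when the skill is not holding a connection open; with the skill connected, the message goes to the skill alone and is never queued.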

Comment thread skills/mic_status/main.py

skill_dir = os.path.dirname(os.path.abspath(__file__))
# Forward slashes so the paths are safe inside Markdown image syntax (spaces are
# fine - the HUD image parser reads everything up to the closing paren).

Contributor

Paths containing ) never render — and this comment is factually wrong.

The HUD image parser (hud_server/rendering/markdown.py:456) uses !\[([^]]*)]\(([^)]+)\), which stops at the first ), not the closing one. An install path or avatar directory containing a paren (e.g. .../Wingman (Beta)/...) truncates the URL, the file load fails, and the icon silently never appears — with the leftover ...) rendering as stray text.

Either harden the parser or fix this comment to document the real constraint; the composited rec-image filename is sanitized, but the directory portion is not.
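A quick reproduction with the pattern quoted above, using Python's re module. The install path is a made-up example and parse_image is a helper written only for this illustration.

```python
import re

# Pattern as quoted in this comment.
IMAGE_RE = re.compile(r"!\[([^]]*)]\(([^)]+)\)")


def parse_image(markdown: str):
    """Return (alt, url, leftover) for the first image match, or None."""
    match = IMAGE_RE.search(markdown)
    if match is None:
        return None
    return match.group(1), match.group(2), markdown[match.end():]


# Hypothetical path with a paren in a directory name.
text = "![mic](C:/Apps/Wingman (Beta)/skills/mic_status/mic_on.png)"
alt, url, leftover = parse_image(text)
# url is cut at the first ")" and the rest of the path is left behind as text.
```

Here url comes back as "C:/Apps/Wingman (Beta" and leftover as "/skills/mic_status/mic_on.png)", which matches the truncated load and the stray text described above.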

Comment thread skills/mic_status/main.py
img = self._mic_on_img if effective else self._mic_off_img
if img == self._rendered_img:
    return
self._rendered_img = img

Contributor

After a HUD server restart the icon comes back as a default-styled window, and failed draws are cached as successful.

Two related problems:

  1. create_group with the configured props runs exactly once in _run(). If the HUD server restarts, the group is gone, and the next add_item auto-creates it server-side with default props (hud_manager.py:452-454) — default background, width, and position instead of the configured icon, permanently until skill reload.
  2. self._rendered_img = img is set before add_item is awaited, and the return value is ignored (HudHttpClient._request returns None on any error). A failed draw is recorded as rendered, so the img == self._rendered_img early-return suppresses retries until the next state change — the icon shows the wrong state in the meantime.

The bundled HUD skill already solved this with its _ensure_connected pattern that re-creates groups with props on reconnect (skills/hud/main.py:569-584) — worth reusing here. For (2), only cache _rendered_img after a successful add_item.
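For (2), a minimal sketch of the ordering change. FlakyHud is a hypothetical stand-in for the HUD client, and the assumption is that add_item returns None on failure and something non-None on success; the real return value on success should be checked against the client.

```python
import asyncio


class FlakyHud:
    """Hypothetical HUD client stand-in: fails the first N calls with None."""

    def __init__(self, fail_first: int) -> None:
        self.remaining_failures = fail_first
        self.drawn: list[str] = []

    async def add_item(self, img: str):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            return None  # mirrors "returns None on any error"
        self.drawn.append(img)
        return {"ok": True}  # assumed success value


class IconRenderer:
    def __init__(self, hud: FlakyHud) -> None:
        self.hud = hud
        self._rendered_img = None

    async def refresh(self, img: str) -> None:
        if img == self._rendered_img:
            return
        result = await self.hud.add_item(img)
        if result is not None:
            # Cache only after the draw succeeded, so a failure is retried.
            self._rendered_img = img


async def demo():
    hud = FlakyHud(fail_first=1)
    renderer = IconRenderer(hud)
    await renderer.refresh("mic_on.png")  # fails, nothing cached
    await renderer.refresh("mic_on.png")  # next poll tick retries and succeeds
    await renderer.refresh("mic_on.png")  # now skipped by the early return
    return hud.drawn, renderer._rendered_img


drawn, cached = asyncio.run(demo())
```

The early return still suppresses redundant draws, but only once a draw has actually gone through.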

Comment thread skills/mic_status/main.py
await super().update_config(new_config)
if (old_config.custom_properties or []) == (new_config.custom_properties or []):
    return
if not self._client or not self._client.connected:

Contributor

This connected guard deterministically drops config changes after any transient HUD hiccup.

Any single failed/timed-out HUD request flips _connected to False (http_client.py:229/233/243/251), and nothing in this skill ever resets it — the _run reconnect loop only re-dials the Core WebSocket, not the HUD client. But HudHttpClient._request auto-reconnects on the next call anyway (http_client.py:180-183), so the guard only serves to skip the delete/create/refresh: the new placement/size is stored by super().update_config but never applied until the skill is reloaded, with no error shown.

Dropping the .connected check (keeping the self._client None-check) fixes it.

Comment thread skills/mic_status/main.py
def _core_port(self) -> int:
    """Resolve the port Core is listening on (set by main.py at launch), falling
    back to the documented default when it can't be read."""
    main_mod = sys.modules.get("__main__")

Contributor

Reading Core internals via sys.modules["__main__"] bypasses the skill facade — uniquely among bundled skills.

No other bundled skill touches __main__ (I checked), and core.is_listening / core.active_recording / main.port have no facade equivalent — so any refactor of main.py's module-level globals silently degrades this skill to fallbacks with no error anywhere.

Facade alternatives already exist for part of this:

  • self.wingman.audio.on_playback_started / on_playback_finished subscriptions (wingmen/facade.py:738-749, async-safe PubSub with .unsubscribe()) instead of polling is_playing at 4 Hz
  • WingmanContext.avatar_path (wingmen/wingman_context.py:100) instead of getattr(wingman, "get_avatar_path") on raw Wingman objects

For listening/recording state, exposing it through the facade (or a Core broadcast for active_recording changes — Core mutates it in only four places) would be the right layer, rather than setting the precedent that skills can grope process globals.

Comment thread skills/mic_status/main.py
"""Build the window props from the skill config (placement, position, size)."""
common = dict(
priority=100,
width=int(self._get_prop("icon_size", 96)),

Contributor

Default drift: code says 96, default_config.yaml says 72.

The yaml normally supplies the value, but if the property is missing from a saved config the icon renders at 96px instead of the documented 72. Same duplication risk applies to the other _get_prop fallbacks (layout_mode, anchor, pos_x, pos_y) — worth making them match the yaml, or treating the yaml as the single source of truth.
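One possible shape for the single-source-of-truth option, as a sketch only. DEFAULTS stands in for values loaded from default_config.yaml, get_prop is a hypothetical free function rather than the skill's _get_prop, and only icon_size is filled in because it is the only default quoted here.

```python
# Stand-in for defaults loaded once from default_config.yaml (assumption).
DEFAULTS = {"icon_size": 72}


def get_prop(config: dict, key: str):
    """Return the configured value, else the shared default.

    A key with no shared default raises KeyError instead of silently
    falling back to a second, possibly drifting, literal in code.
    """
    if key in config:
        return config[key]
    return DEFAULTS[key]


saved_without_size = {}
saved_with_size = {"icon_size": 48}
fallback_width = int(get_prop(saved_without_size, "icon_size"))
configured_width = int(get_prop(saved_with_size, "icon_size"))
```

Call sites then carry no numeric literal at all, so the code and the yaml cannot disagree.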



2 participants