Improve voice activation, add va status skil and improve hud rendering#405
Improve voice activation, add va status skil and improve hud rendering#405SawPsyder wants to merge 4 commits into
Conversation
Shackless
left a comment
There was a problem hiding this comment.
Thanks for the PR! The core idea of the toggle fix is sound — I verified that all real end-of-playback paths reliably fire on_playback_finished, and streamed responses produce exactly one started/finished cycle, so the saved intent is applied correctly in the normal flow. The outline-alpha change also checks out (content-layer borders staying opaque is consistent with the design).
I left inline comments on the issues found during review, roughly in order of severity:
- Race between the new playback branch and
on_playback_startedthat can silently swallow a mute press (wingman_core.py) - The skill's permanent
/wsconnection breaks Core's offline-message queueing for the real client (mic_status/main.py) - The intent fix only covers the hotkey path — the
/voice-activation/muteendpoint still has the old bug (wingman_core.py) - Icon paths containing
)never render, and the code comment about the parser is wrong (mic_status/main.py) - After a HUD server restart the icon comes back as a default-styled window, and failed draws are cached as successful (mic_status/main.py)
- The
connectedguard inupdate_configdrops config changes after any transient HUD hiccup (mic_status/main.py) - Reading Core internals via
sys.modules["__main__"]bypasses the skill facade — uniquely among bundled skills (mic_status/main.py) icon_sizecode default (96) drifted from default_config.yaml (72) (mic_status/main.py)
Minor nit not worth its own thread: (55, 62, 74) is now hardcoded in five places across hud_server; a DEFAULT_BORDER_COLOR in hud_server/constants.py would fit the existing pattern there.
| and self.audio_player.is_playing | ||
| and self.settings_service.settings.voice_activation.enabled | ||
| ): | ||
| self.was_listening_before_playback = not self.was_listening_before_playback |
There was a problem hiding this comment.
Race with on_playback_started — a mute press can be silently swallowed.
audio_player sets is_playing = True before on_playback_started runs and captures was_listening_before_playback (audio_player.py:227-230 → wingman_core.py:1761, and on_playback_started even awaits printr.print_async first). The hotkey toggle runs on the keyboard-listener thread, so it can enter this branch in that gap: it then flips the stale flag left over from the previous playback and broadcasts it — and moments later on_playback_started overwrites the flag with the current is_listening. The user's toggle is lost and the broadcast mute state briefly contradicts reality.
The window is narrow but real. Capturing the intent in the same place is_playing is set (or having on_playback_started not overwrite a flag that was just user-modified) would close it.
| # act on the transient muted state (which would start the recognizer | ||
| # mid-playback and/or be overridden when playback ends). on_playback_finished | ||
| # restores from was_listening_before_playback, so updating that is enough. | ||
| if ( |
There was a problem hiding this comment.
The intent fix only covers the hotkey path — the /voice-activation/mute endpoint still has the old bug.
The client's mute button POSTs to an endpoint bound directly to start_voice_recognition (wingman_core.py:127-131), which bypasses this new branch entirely. A mute set that way during playback is still overridden by on_playback_finished — the exact bug this PR fixes for the hotkey. Today the client disables its toggle while audio plays (MuteToggle.svelte), so this mostly surfaces as hotkey-vs-GUI divergence (hotkey now works during playback, GUI can't) plus an exposed API inconsistency for anyone calling the endpoint directly.
A deeper fix would model the user's intent as a single field that all entry points mutate, and derive the actual recognizer state from intent AND NOT is_playing — then this per-caller special case (and the duplicated VoiceActivationMutedCommand broadcast) disappears.
| self._mic_listening = self._seed_listening() | ||
| await self._refresh() | ||
| try: | ||
| async with websockets.connect(ws_url, open_timeout=3) as ws: |
There was a problem hiding this comment.
The skill's permanent /ws connection breaks Core's offline-message queueing for the real client.
ConnectionManager.broadcast only queues messages for later delivery when active_connections is empty, and a connection counts as active immediately on accept (main.py:270-272) — client_ready only triggers the flush. With this skill always connected, broadcasts sent while the GUI is closed or restarting are delivered only to the skill and never queued, so toasts/errors that pre-PR would be flushed on reconnect are permanently lost.
Since the skill runs inside the Core process and already polls in-process state every 250 ms, the WS client looks unnecessary altogether: calling _seed_listening() on the poll tick yields the same data as the voice_activation_muted broadcast. Dropping it would also delete _core_port, the 49111 fallback (which silently breaks under a non-default --port anyway), the reconnect machinery, and the websockets>=13.1 requirements.txt entry.
|
|
||
| skill_dir = os.path.dirname(os.path.abspath(__file__)) | ||
| # Forward slashes so the paths are safe inside Markdown image syntax (spaces are | ||
| # fine - the HUD image parser reads everything up to the closing paren). |
There was a problem hiding this comment.
Paths containing ) never render — and this comment is factually wrong.
The HUD image parser (hud_server/rendering/markdown.py:456) uses !\[([^]]*)]\(([^)]+)\), which stops at the first ), not the closing one. An install path or avatar directory containing a paren (e.g. .../Wingman (Beta)/...) truncates the URL, the file load fails, and the icon silently never appears — with the leftover ...) rendering as stray text.
Either harden the parser or fix this comment to document the real constraint; the composited rec-image filename is sanitized, but the directory portion is not.
| img = self._mic_on_img if effective else self._mic_off_img | ||
| if img == self._rendered_img: | ||
| return | ||
| self._rendered_img = img |
There was a problem hiding this comment.
After a HUD server restart the icon comes back as a default-styled window, and failed draws are cached as successful.
Two related problems:
create_groupwith the configured props runs exactly once in_run(). If the HUD server restarts, the group is gone, and the nextadd_itemauto-creates it server-side with default props (hud_manager.py:452-454) — default background, width, and position instead of the configured icon, permanently until skill reload.self._rendered_img = imgis set beforeadd_itemis awaited, and the return value is ignored (HudHttpClient._requestreturnsNoneon any error). A failed draw is recorded as rendered, so theimg == self._rendered_imgearly-return suppresses retries until the next state change — the icon shows the wrong state in the meantime.
The bundled HUD skill already solved this with its _ensure_connected pattern that re-creates groups with props on reconnect (skills/hud/main.py:569-584) — worth reusing here. For (2), only cache _rendered_img after a successful add_item.
| await super().update_config(new_config) | ||
| if (old_config.custom_properties or []) == (new_config.custom_properties or []): | ||
| return | ||
| if not self._client or not self._client.connected: |
There was a problem hiding this comment.
This connected guard deterministically drops config changes after any transient HUD hiccup.
Any single failed/timed-out HUD request flips _connected to False (http_client.py:229/233/243/251), and nothing in this skill ever resets it — the _run reconnect loop only re-dials the Core WebSocket, not the HUD client. But HudHttpClient._request auto-reconnects on the next call anyway (http_client.py:180-183), so the guard only serves to skip the delete/create/refresh: the new placement/size is stored by super().update_config but never applied until the skill is reloaded, with no error shown.
Dropping the .connected check (keeping the self._client None-check) fixes it.
| def _core_port(self) -> int: | ||
| """Resolve the port Core is listening on (set by main.py at launch), falling | ||
| back to the documented default when it can't be read.""" | ||
| main_mod = sys.modules.get("__main__") |
There was a problem hiding this comment.
Reading Core internals via sys.modules["__main__"] bypasses the skill facade — uniquely among bundled skills.
No other bundled skill touches __main__ (I checked), and core.is_listening / core.active_recording / main.port have no facade equivalent — so any refactor of main.py's module-level globals silently degrades this skill to fallbacks with no error anywhere.
Facade alternatives already exist for part of this:
self.wingman.audio.on_playback_started/on_playback_finishedsubscriptions (wingmen/facade.py:738-749, async-safe PubSub with.unsubscribe()) instead of pollingis_playingat 4 HzWingmanContext.avatar_path(wingmen/wingman_context.py:100) instead ofgetattr(wingman, "get_avatar_path")on raw Wingman objects
For listening/recording state, exposing it through the facade (or a Core broadcast for active_recording changes — Core mutates it in only four places) would be the right layer, rather than setting the precedent that skills can grope process globals.
| """Build the window props from the skill config (placement, position, size).""" | ||
| common = dict( | ||
| priority=100, | ||
| width=int(self._get_prop("icon_size", 96)), |
There was a problem hiding this comment.
Default drift: code says 96, default_config.yaml says 72.
The yaml normally supplies the value, but if the property is missing from a saved config the icon renders at 96px instead of the documented 72. Same duplication risk applies to the other _get_prop fallbacks (layout_mode, anchor, pos_x, pos_y) — worth making them match the yaml, or treating the yaml as the single source of truth.
Also references #402
Summary
Improve voice activation, add va status skil and improve hud rendering
Changes
Testing
Checklist
develop