Fix: send AI enhancement instructions in system role, not user message#439
Conversation
…ge (altic-dev#388) When building the LLM request for dictation enhancement, DictationPostProcessingService hardcoded `systemPrompt = ""` and folded the instruction text into the user message via renderDictationUserMessage. This caused providers that treat role separation strictly (e.g. Cerebras gpt-oss-120b) to answer the transcript as a question rather than apply the instructions as a directive. Fix: assign the resolved prompt text to `systemPrompt` directly and set `userMessageContent` to the raw transcript only. The messages builder already guards on `systemPrompt.isEmpty` before appending the system entry, so the no-prompt case (prompt off) is unaffected. Four unit tests added in DictationSystemPromptTests covering: - effectiveDictationSystemPrompt returns the configured instruction text - effectiveDictationSystemPrompt falls back to a non-empty built-in default - renderDictationUserMessage with empty prompt returns transcript only - renderDictationUserMessage ${transcript} placeholder substitution still works
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ded39d9a9b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ev#388) DictationPostProcessingService (fixed in the previous commit) is only called from the local API endpoint. The actual user-facing dictation path goes through ContentView.processTextWithAI, which had its own independent prompt assembly that folded instructions + transcript into a single user turn via renderDictationUserMessage. This aligned the ContentView path with the correct behaviour: prompt goes in the system role, transcript (or user input) is the sole user turn. Both isDictationCall and non-dictation branches now do the same thing, so the conditional is removed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e92e4e75d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let systemPrompt = promptText | ||
| let userMessageContent = inputText |
There was a problem hiding this comment.
Preserve ${transcript} substitution in dictation prompts
For any saved dictation prompt that uses the documented ${transcript} placeholder, this now sends the raw prompt text (including the literal placeholder) as the system message and the transcript as a separate user message. I checked repo-wide callers: after this diff renderDictationUserMessage is only used by the new tests, so placeholder substitution is no longer performed in either this hotkey path or DictationPostProcessingService; the added placeholder test therefore exercises dead code rather than the request sent to providers.
Useful? React with 👍 / 👎.
Add SettingsStore.renderSystemPrompt(promptText:transcript:) which
substitutes the \${transcript} placeholder when present and returns the
prompt unchanged otherwise. Call it in ContentView.processTextWithAI and
DictationPostProcessingService before assigning the system prompt, so
saved prompt templates that reference the placeholder continue to work
under the new two-role message layout.
Update DictationSystemPromptTests to test renderSystemPrompt (the
production path) instead of the now-production-dead renderDictationUserMessage.
Fixes the P2 raised in review of altic-dev#439.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a7a0d1c18e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| /// the transcript is always sent as a separate user turn. | ||
| static func renderSystemPrompt(promptText: String, transcript: String) -> String { | ||
| guard promptText.contains(self.transcriptPlaceholder) else { return promptText } | ||
| return promptText.replacingOccurrences(of: self.transcriptPlaceholder, with: transcript) |
There was a problem hiding this comment.
Avoid duplicating placeholder transcripts
For dictation prompts that contain ${transcript}, this replacement puts the transcript into the system prompt, while both new call sites still send the same text as the user turn (ContentView sets userMessageContent = inputText, and DictationPostProcessingService sets it to trimmed). Those prompts used to include the transcript only once via renderDictationUserMessage; now long dictations can double token usage or exceed context, and templates that relied on the placeholder to control where the sole transcript appears will see duplicate input. Please either avoid expanding the placeholder once the transcript is a separate user message, or suppress the extra user turn for placeholder templates.
Useful? React with 👍 / 👎.
|
This is a tricky fix as it causes conflicts betweeen another PR about the state of system vs user prompt. Would to have a discussion before we merge this. |
Description
Fixes a bug where AI enhancement instructions were placed in the user message instead of the system prompt, causing strict role-separation models to answer the transcript rather than apply the instructions.
Root cause — two independent sites
1.
ContentView.processTextWithAI(user-facing dictation path)The dictation branch hardcoded
systemPrompt = ""and folded instructions + transcript into the user turn viarenderDictationUserMessage. A comment in the code marked this as intentional ("dictation enhancement folds the prompt + transcript into a single user turn"), but it breaks any model that treats role separation strictly (e.g. Cerebras gpt-oss-120b, local models with system-prompt enforcement).2.
DictationPostProcessingService.process(local API path —/v1/postprocess)Same pattern, same root cause. This path is called from
InferenceAPIController, not from the main UI dictation flow.Fix
Both sites now consistently put the instruction text in the system role and the transcript alone in the user role:
The
isDictationCallconditional inContentViewis removed — both branches now do the same thing.DictationPostProcessingServicegets the same treatment.The no-prompt case (prompt turned off) is unaffected:
effectiveDictationSystemPromptreturns""when prompts are off, so the existingif !systemPrompt.isEmptyguard in the messages builder correctly omits the system entry.Type of Change
Related Issues
Testing
DictationSystemPromptTests:effectiveDictationSystemPromptreturns configured instruction text (not empty)effectiveDictationSystemPromptfalls back to a non-empty built-in defaultrenderDictationUserMessage("")returns transcript only (no instructions bleeding in)${transcript}placeholder substitution still worksswiftlint --strict --config .swiftlint.yml Sources— 0 violationsswiftformat --config .swiftformat Sources/Fluid/Services/DictationPostProcessingService.swift Sources/Fluid/ContentView.swift— 0 additional changesNotes
Full end-to-end verification requires a live provider that exhibits the role-conflation failure (e.g. Cerebras gpt-oss-120b or a local model strict about system vs. user separation). The unit tests cover the message-composition logic without a network call.