Summary
The read tool inlines whole binary files (PDFs, images) as base64 data-URIs directly into the conversation history. On a text-only model such as deepseek-reasoner, this is both unusable and a fast path to context overflow. Each PDF read adds ~1.6 MB of base64 (≈ 400K tokens) to the message history. That history is then resent on every turn (and on session resume), which eventually produces a hard HTTP 400:
400 This model's maximum context length is 1048565 tokens. However, you requested 5677038 tokens
(5677038 in the messages, 0 in the completion). Please reduce the length of the messages or completion.
The 400 fires in SessionManager.createSession → activateSession → createChatCompletionStream, because session (re)activation loads the entire stored history into messages.
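The numbers above can be checked with a quick estimate. Base64 encodes every 3 input bytes as 4 characters. The "~1.6 MB ≈ ~400K tokens" figure corresponds to roughly 4 characters per token. That ratio is an assumption: real tokenizers vary, and base64 often tokenizes worse than prose. The helper names are hypothetical and are not part of deepcode-cli.

```typescript
// Hypothetical helpers for a rough size estimate; not part of deepcode-cli.

// Base64 emits 4 output characters per 3 input bytes (padded up to a multiple of 4).
function base64Length(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

// Assumed heuristic: ~4 characters per token. Real tokenizers differ,
// and base64 text usually tokenizes worse than natural language.
const CHARS_PER_TOKEN = 4;

function estimateTokens(chars: number): number {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// A ~1.2 MB PDF becomes ~1.6 MB of base64, i.e. ~400K tokens under this heuristic.
const perRead = estimateTokens(base64Length(1_200_000));

// The whole history is resent every turn, so each read makes every later request larger.
const CONTEXT_LIMIT = 1_048_565; // taken from the provider's error message
for (let reads = 1; reads <= 4; reads++) {
  const requestTokens = reads * perRead;
  console.log(`${reads} read(s): ~${requestTokens} tokens, fits: ${requestTokens <= CONTEXT_LIMIT}`);
}
```

Under this heuristic, two reads (~800K tokens) still fit. The third read (~1.2M) already exceeds the 1,048,565-token limit.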
Environment
- @vegamo/deepcode-cli 0.1.30
- Model: deepseek-reasoner (https://api.deepseek.com), thinkingEnabled: true
- Node v23.11
Steps to reproduce
- Start a session and read a PDF (e.g. an invoice or report).
- Observe that the tool output stored in history is data:application/pdf;base64,JVBERi0... (the whole file).
- Read a few more PDFs (or resume the session). Each read adds ~400K tokens.
- Once the cumulative history exceeds the model's context window, the next request is rejected with the 400 above.
Real example: a session with only 12 records in total contained 4 read-tool outputs that were each a ~1.6 MB base64 PDF (~1.6M tokens combined). Larger sessions reached 5.6M tokens.
Root cause (from the bundled dist/cli.js)
- The read tool's PDF branch returns output: `data:application/pdf;base64,${base64}` (metadata mime: "application/pdf").
- Images go through bufferToDataUrl(buffer, mimeType) → data:${mimeType};base64,${...} for the extensions in IMAGE_MIME_BY_EXT (png/jpg/jpeg/gif/webp).
- There is no multimodality gating: base64 is inlined regardless of whether the active model can consume it (deepseek-reasoner cannot).
- There is no per-tool-output size cap and no pre-send token-budget check against the model's context limit. The only "truncate" in the bundle is in UI rendering (wrap: "truncate-end").
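The sketch below is a rough reconstruction of the inlining path described above. Only bufferToDataUrl, IMAGE_MIME_BY_EXT and the data-URI format come from the bundle. readBinaryAsDataUrl is a hypothetical wrapper, and the real code in dist/cli.js may differ. The point it illustrates is that nothing on this path looks at the active model or at the size of the output.

```typescript
// Extension → MIME map. The extensions are the ones listed above;
// the MIME strings are the standard types for them.
const IMAGE_MIME_BY_EXT: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  webp: "image/webp",
};

// Same output shape as described: data:${mimeType};base64,${...}
function bufferToDataUrl(buffer: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(buffer).toString("base64")}`;
}

// Hypothetical wrapper that produces the same outputs as the described branches.
// The whole file is encoded; no model-capability or size check happens anywhere.
function readBinaryAsDataUrl(ext: string, bytes: Uint8Array): string | undefined {
  const lower = ext.toLowerCase();
  if (lower === "pdf") return bufferToDataUrl(bytes, "application/pdf");
  const mime = IMAGE_MIME_BY_EXT[lower];
  return mime ? bufferToDataUrl(bytes, mime) : undefined;
}
```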
Suggested fixes
- Gate base64 inlining on model multimodality. For text-only models, never inline binary. For PDFs, extract text (e.g. with pdftotext or a PDF parser) and pass the text instead.
- Cap per-tool-output size before appending to history (a byte/token limit plus a "[truncated]" notice), so that a single read can't add hundreds of thousands of tokens.
- Add a pre-send token-budget guard against the model's max context. Estimate the request size and keep a margin under the limit. When over budget, compact or drop the oldest messages, or fail soft with a clear message, instead of surfacing a raw provider 400.
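The three fixes above could fit together roughly as follows. This is a sketch under stated assumptions. ModelCapabilities, supportsImages, renderBinaryForModel, capToolOutput and fitToBudget are hypothetical names, not deepcode-cli APIs. Token counts use a crude 4-characters-per-token estimate rather than the provider's tokenizer.

```typescript
// Hypothetical types and names throughout; deepcode-cli's real interfaces may differ.
interface ModelCapabilities {
  supportsImages: boolean; // assumed flag: can the model consume image/PDF data-URIs?
  maxContextTokens: number;
}

interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// 1. Gate inlining on multimodality: text-only models get extracted text
//    (or a short placeholder) instead of a data-URI.
function renderBinaryForModel(
  dataUrl: string,
  extractedText: string | undefined,
  caps: ModelCapabilities,
): string {
  if (caps.supportsImages) return dataUrl;
  return extractedText ?? "[binary file omitted: the active model is text-only]";
}

// 2. Cap a single tool output before it is appended to history.
//    Slicing by UTF-16 code units is a simplification; a byte or token cap works the same way.
function capToolOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  const dropped = output.length - maxChars;
  return `${output.slice(0, maxChars)}\n[truncated: ${dropped} more characters]`;
}

// 3. Pre-send budget guard: drop the oldest non-system messages until the
//    estimate fits under the limit minus a safety margin; otherwise fail soft.
const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / CHARS_PER_TOKEN), 0);
}

function fitToBudget(messages: Message[], caps: ModelCapabilities, margin = 0.1): Message[] {
  const budget = Math.floor(caps.maxContextTokens * (1 - margin));
  const kept = [...messages];
  while (estimateTokens(kept) > budget) {
    const oldest = kept.findIndex((m) => m.role !== "system");
    // Never drop the newest message: if nothing older is left to drop, report clearly.
    if (oldest === -1 || oldest === kept.length - 1) {
      throw new Error(
        `Request (~${estimateTokens(kept)} tokens) exceeds the context budget (${budget}); compact the session first.`,
      );
    }
    kept.splice(oldest, 1);
  }
  return kept;
}
```

Dropping the oldest message is the simplest policy. In an OpenAI-style history, a real implementation would also need to drop (or summarize) an assistant tool call together with its tool result, because the API may reject an orphaned tool message.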
Workaround
I locally patched the read tool's PDF branch to return pdftotext-extracted text (capped) instead of base64. The output dropped ~40× (for a real PDF: ~1.6 MB of base64 → ~39 KB of text), and PDF reading still works. Happy to share the diff if useful.
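Here is a minimal sketch of that workaround. It assumes that poppler's pdftotext is on PATH and that the code runs under Node. readPdfAsText, capText and MAX_PDF_TEXT_CHARS are hypothetical names. The cap value is also an assumption, since the actual patch isn't shown here.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical cap; the limit used in the local patch isn't stated.
const MAX_PDF_TEXT_CHARS = 40_000;

// Keep the head of the text and append a visible notice when it is cut.
function capText(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return `${text.slice(0, maxChars)}\n[truncated: ${text.length - maxChars} more characters]`;
}

// Runs `pdftotext -enc UTF-8 <file> -`; the trailing "-" sends the text to stdout.
async function readPdfAsText(path: string): Promise<string> {
  const { stdout } = await execFileAsync("pdftotext", ["-enc", "UTF-8", path, "-"], {
    maxBuffer: 64 * 1024 * 1024, // the default 1 MB buffer is too small for large PDFs
  });
  return capText(stdout, MAX_PDF_TEXT_CHARS);
}
```

The read tool would call `await readPdfAsText(filePath)` in place of the base64 branch. If pdftotext is missing, execFile rejects with ENOENT, so the tool can report a clear error instead of falling back to inlining.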
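Chinese summary (English translation)
The read tool writes entire PDFs/images directly into the conversation history as base64 data-URIs. For text-only models (such as deepseek-reasoner), each PDF read adds roughly 400K tokens. After several reads, or after resuming the session, the context limit is exceeded and the request fails with a 400 (max 1048565, requested 5677038). Root cause: read's PDF branch returns data:application/pdf;base64,..., and images go through bufferToDataUrl. There is no check of the model's multimodal capability, no truncation of individual outputs, and no token-budget check before sending. Suggestions: decide whether to inline based on model capability; extract text from PDFs instead (e.g. with pdftotext); cap tool-output size; and budget tokens before sending, degrading gracefully instead of failing with a raw 400.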