read tool inlines PDFs/images as base64 into history → context overflow (HTTP 400) on text-only models #181

@datekpro

Summary

The read tool inlines whole binary files (PDFs, images) as base64 data-URIs directly into the conversation history. On a text-only model such as deepseek-reasoner, this is both unusable and a fast path to context overflow: each PDF read adds ~1.6 MB ≈ ~400K tokens to the message history, which is then resent every turn (and on session resume), eventually producing a hard HTTP 400:

400 This model's maximum context length is 1048565 tokens. However, you requested 5677038 tokens
(5677038 in the messages, 0 in the completion). Please reduce the length of the messages or completion.

The 400 fires in SessionManager.createSession → activateSession → createChatCompletionStream, because session (re)activation loads the entire stored history into messages.

Environment

  • @vegamo/deepcode-cli 0.1.30
  • Model: deepseek-reasoner (https://api.deepseek.com), thinkingEnabled: true
  • Node v23.11

Steps to reproduce

  1. Start a session and read a PDF (e.g. an invoice/report).
  2. Observe the tool output stored in history is data:application/pdf;base64,JVBERi0... (the whole file).
  3. Read a few more PDFs (or resume the session). Each adds ~400K tokens.
  4. The next request is rejected with the 400 above once the cumulative history exceeds the model's context window.

Real example: a session with only 12 records, 4 of which were ~1.6 MB base64 PDF outputs from the read tool (~1.6M tokens in total); larger sessions reached 5.6M tokens.
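
The figures above are consistent with a rule of thumb of roughly 4 characters per token, which is the ratio the numbers in this issue imply (~1.6 MB ≈ ~400K tokens); the actual tokenizer ratio for base64 may differ. A minimal sketch of that arithmetic, with hypothetical helper names:

```typescript
// Rough estimate only: assumes ~4 characters per token, the ratio implied by
// the figures in this issue (~1.6 MB of base64 ≈ ~400K tokens).
const CHARS_PER_TOKEN = 4; // assumption, not a tokenizer measurement
const MAX_CONTEXT_TOKENS = 1_048_565; // limit quoted in the 400 error

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Number of same-sized tool outputs that must accumulate before the history
// alone strictly exceeds the context window.
function readsUntilOverflow(
  tokensPerRead: number,
  limit: number = MAX_CONTEXT_TOKENS,
): number {
  return Math.floor(limit / tokensPerRead) + 1;
}

// One ~1.6 MB base64 PDF under the 4-chars-per-token assumption.
const perPdfTokens = Math.ceil(1_600_000 / CHARS_PER_TOKEN);
const pdfReadsToOverflow = readsUntilOverflow(perPdfTokens);
```

Under these assumptions `perPdfTokens` is 400,000 and `pdfReadsToOverflow` is 3, so the third PDF read is already enough to push the history past the limit.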

Root cause (from the bundled dist/cli.js)

  • The read tool's PDF branch returns output: `data:application/pdf;base64,${base64}` (metadata mime: "application/pdf").
  • Images go through bufferToDataUrl(buffer, mimeType) → data:${mimeType};base64,${...} for IMAGE_MIME_BY_EXT (png/jpg/jpeg/gif/webp).
  • There is no multimodality gating — base64 is inlined regardless of whether the active model can consume it (deepseek-reasoner cannot).
  • There is no per-tool-output size cap and no pre-send token-budget check against the model context limit; the only "truncate" in the bundle is UI rendering (wrap: "truncate-end").
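
To check whether a stored session is affected, one option is to scan its history for outputs that start with a base64 data-URI prefix. The sketch below assumes a simple record shape (`HistoryRecord`) that is hypothetical; the real deepcode-cli history schema is not shown in this issue.

```typescript
// Hypothetical shape of a stored history record (an assumption for
// illustration, not the actual deepcode-cli schema).
interface HistoryRecord {
  role: string;
  content: string;
}

interface InlinedBinary {
  index: number; // position of the record in the history
  mime: string; // MIME type taken from the data-URI prefix
  chars: number; // length of the stored content in characters
}

// Matches the data-URI prefix the read tool is reported to emit,
// e.g. "data:application/pdf;base64,".
const DATA_URI_PREFIX = /^data:([\w.+-]+\/[\w.+-]+);base64,/;

function findInlinedBinaries(history: HistoryRecord[]): InlinedBinary[] {
  const hits: InlinedBinary[] = [];
  history.forEach((record, index) => {
    const match = DATA_URI_PREFIX.exec(record.content);
    if (match) {
      hits.push({ index, mime: match[1] ?? "", chars: record.content.length });
    }
  });
  return hits;
}
```

Sorting the result by `chars` would show which records dominate the request size.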

Suggested fixes

  1. Gate base64 inlining on model multimodality. For text-only models, never inline binary; for PDFs, extract text (e.g. pdftotext/a PDF parser) and pass the text instead.
  2. Cap per-tool-output size before appending to history (byte/token limit + a "[truncated]" notice), so a single read can't add hundreds of thousands of tokens.
  3. Add a pre-send token-budget guard against the model's max context: estimate request size, keep a margin under the limit, and compact / drop-oldest (or fail soft with a clear message) instead of surfacing a raw provider 400.
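
Fixes 1 and 3 could be sketched roughly as follows. Every name here (`ModelCapabilities`, `supportsImages`, `renderBinaryForModel`, `fitToBudget`) is hypothetical and not taken from deepcode-cli, and the token estimate assumes ~4 characters per token rather than using a real tokenizer.

```typescript
// All names below are hypothetical; none are taken from deepcode-cli.
interface ModelCapabilities {
  supportsImages: boolean; // assumed capability flag
  maxContextTokens: number;
}

interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Fix 1: inline base64 only when the model can consume it; otherwise fall
// back to text supplied by the caller (e.g. from a PDF text extractor).
function renderBinaryForModel(
  dataUrl: string,
  model: ModelCapabilities,
  extractText: () => string,
): string {
  return model.supportsImages ? dataUrl : extractText();
}

// Crude request-size estimate, assuming ~4 characters per token.
function estimateRequestTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Fix 3: drop the oldest non-system messages until the estimate fits under
// the limit minus a safety margin. The newest message is never dropped; if
// the request still cannot fit, fail with a clear error instead of sending.
function fitToBudget(
  messages: Message[],
  model: ModelCapabilities,
  marginTokens: number = 8_192,
): Message[] {
  const budget = model.maxContextTokens - marginTokens;
  const kept = [...messages];
  while (estimateRequestTokens(kept) > budget) {
    const dropIndex = kept.findIndex((m) => m.role !== "system");
    if (dropIndex === -1 || dropIndex === kept.length - 1) {
      throw new Error(
        `Request exceeds the ~${budget}-token budget even after dropping older history`,
      );
    }
    kept.splice(dropIndex, 1);
  }
  return kept;
}
```

A real implementation would also need to keep tool-call and tool-result messages paired when dropping history, and might prefer compaction (summarizing) over dropping; this sketch ignores both.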

Workaround

I locally patched the read tool's PDF branch to return pdftotext-extracted text (capped) instead of base64. Output size dropped ~40× (a real PDF: ~1.6 MB base64 → ~39 KB text) and PDF reading still works. Happy to share the diff if useful.
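
The capping half of such a patch (also suggested fix 2) might look like the sketch below. This is not the actual diff; `capToolOutput` and the limit passed to it are hypothetical.

```typescript
// Hypothetical helper: caps a tool output before it is appended to history.
const TRUNCATION_NOTICE = "\n[truncated]";

// Returns the output unchanged when it fits. Otherwise keeps a prefix and
// appends the notice so the total length equals maxChars. If maxChars is
// smaller than the notice itself, only the notice is returned.
// Note: slicing counts UTF-16 code units, so a character outside the Basic
// Multilingual Plane could be split at the cut point.
function capToolOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) {
    return output;
  }
  const keep = Math.max(0, maxChars - TRUNCATION_NOTICE.length);
  return output.slice(0, keep) + TRUNCATION_NOTICE;
}
```

For example, `capToolOutput(extractedText, 40_000)` would bound a single read at roughly the ~39 KB observed above, where `extractedText` stands for whatever the text extractor produced.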


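Chinese summary (translated)

The read tool writes entire PDFs/images into the conversation history as base64 data-URIs. For a text-only model (such as deepseek-reasoner), each PDF read adds roughly 400K tokens; after several reads or a session resume, the history exceeds the context limit and the request fails with a 400 (max 1048565, requested 5677038). Root cause: the read tool's PDF branch returns data:application/pdf;base64,..., images go through bufferToDataUrl, and there is no check of the model's multimodal capability, no per-output truncation, and no pre-send token-budget check. Suggestions: decide whether to inline based on model capability; extract text from PDFs instead (e.g. pdftotext); cap the size of tool outputs; run a token-budget check before sending and degrade gracefully rather than surfacing a raw 400.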
