[Bug]: Quest repository can grow to 100GB+ by capturing runtime artifacts, causing Git pack/OOM and daemon unhealthy #104

@jiexunshen

Summary

DeepScientist quest storage can grow extremely large because runtime artifacts, Python virtual environments, model checkpoints, FAISS indexes, OpenCode/HuggingFace caches, and large experiment outputs are created inside the quest directory and appear to be captured by the quest repository/snapshot workflow. This caused Git pack operations to consume excessive memory, triggering WSL OOM and making the DeepScientist daemon become unhealthy or disconnect.

Steps to reproduce

1. Set up DeepScientist on WSL2 Ubuntu.
2. Configure OpenCode as the runner.
3. Start DeepScientist in a lightweight project directory:
   ds --here --runner opencode --no-browser
4. Create or run a research quest that performs baseline reproduction or experiments.
5. During the quest, let the agent set up local dependencies and run experiments.
6. The quest directory begins to accumulate large runtime files, including:
   - Python virtual environment files
   - CUDA / NVIDIA / Torch libraries
   - HuggingFace / pip / OpenCode caches
   - model checkpoints such as *.pth
   - FAISS indexes such as *.faiss
   - OpenCode snapshots and internal Git pack files
7. Check disk usage:
   cd ~/workspace/deepscientist-projects/ccks2026
   du -h --max-depth=2 . | sort -h | tail -30
   find . -type f -size +100M -not -path "./.git/*" -print
   du -sh .git 2>/dev/null || echo "no .git"
8. Observe that the top-level project Git repository remains small, while DeepScientist/quests becomes extremely large.
9. Git pack operations may then consume very large amounts of memory, causing WSL OOM and leaving the DeepScientist daemon disconnected or unhealthy.
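
To see where the space goes inside each quest (for example `.git` versus `.ds` versus `experiments`), the disk check in step 7 can be narrowed to a per-quest breakdown. The following is a minimal sketch, not part of DeepScientist; the function names are made up for illustration, and it sums apparent file sizes (`st_size`), so the numbers can differ slightly from `du`, which reports disk blocks.

```python
import os
import stat


def tree_size(path):
    """Sum apparent sizes of regular files under path (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def quest_breakdown(quests_root):
    """Return {quest name: {top-level entry name: size in bytes}}."""
    report = {}
    for quest in sorted(os.scandir(quests_root), key=lambda e: e.name):
        if not quest.is_dir(follow_symlinks=False):
            continue
        sizes = {}
        for child in os.scandir(quest.path):
            if child.is_dir(follow_symlinks=False):
                sizes[child.name] = tree_size(child.path)
            elif child.is_file(follow_symlinks=False):
                sizes[child.name] = child.stat(follow_symlinks=False).st_size
        report[quest.name] = sizes
    return report


def format_report(report):
    """Render one line per quest entry, largest entries first within each quest."""
    lines = []
    for quest, sizes in report.items():
        ordered = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
        for name, size in ordered:
            lines.append(f"{size / 1024**3:8.2f} GiB  {quest}/{name}")
    return "\n".join(lines)
```

Calling `print(format_report(quest_breakdown("DeepScientist/quests")))` from the project directory would list each quest's top-level entries by size, which makes it easier to tell whether the growth sits in Git object storage or in the working tree.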

Expected behavior

By default, DeepScientist should prevent large runtime artifacts from being tracked, snapshotted, or packed by the quest Git repository. Expected safeguards may include:
- Quest repositories should include a default .gitignore that excludes:
    .venv/, venv/
    node_modules/
    .ds/opencode-home/.cache/
    .ds/opencode-home/.local/share/opencode/snapshot/
    baselines/**/venv/
    experiments/**/*.pth
    experiments/**/*.pt
    experiments/**/*.ckpt
    experiments/**/*.safetensors
    experiments/**/*.faiss
    large data files such as *.jsonl, *.parquet, *.db, *.sqlite, *.csv
- Heavy experiment outputs should be stored outside the quest Git repository by default.
- DeepScientist should clearly warn or block when a quest exceeds a size threshold.
- DeepScientist should skip the Git checkpoint/snapshot, or ask the user to confirm, when large files, virtual environments, or model checkpoints are detected.
- Git configuration should avoid memory-heavy automatic packing / GC for large quest workspaces.
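
The size-threshold and pre-snapshot safeguards above could look roughly like the sketch below. It is an illustration only: DeepScientist's checkpoint code path is not shown in this report, so `scan_quest`, `snapshot_blockers`, and both default thresholds are hypothetical. Virtual environments are detected by the `pyvenv.cfg` file that `python -m venv` writes into the environment root. The scan does not consult ignore rules, so it reports files whether or not they are already excluded.

```python
import os
import stat
from dataclasses import dataclass, field


@dataclass
class QuestScan:
    total_bytes: int = 0
    large_files: list = field(default_factory=list)  # (relative path, size)
    venv_dirs: list = field(default_factory=list)    # relative directory paths


def scan_quest(root, large_file_bytes=100 * 1024**2):
    """Walk a quest directory once, collecting its size, large files and venvs."""
    scan = QuestScan()
    for dirpath, _dirnames, filenames in os.walk(root):
        if "pyvenv.cfg" in filenames:
            scan.venv_dirs.append(os.path.relpath(dirpath, root))
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            scan.total_bytes += st.st_size
            if st.st_size > large_file_bytes:
                scan.large_files.append((os.path.relpath(path, root), st.st_size))
    return scan


def snapshot_blockers(scan, max_total_bytes=10 * 1024**3):
    """Return human-readable reasons to skip a snapshot (empty list = proceed)."""
    reasons = []
    if scan.total_bytes > max_total_bytes:
        reasons.append(
            f"quest is {scan.total_bytes} bytes, above the {max_total_bytes} byte limit"
        )
    if scan.large_files:
        reasons.append(f"{len(scan.large_files)} large file(s) found")
    if scan.venv_dirs:
        reasons.append("virtual environment(s) found: " + ", ".join(scan.venv_dirs))
    return reasons
```

A caller would run `snapshot_blockers(scan_quest(quest_root))` before a Git checkpoint and either warn, skip, or ask for confirmation when the list is non-empty.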

Actual behavior

In my case, the top-level project Git repository remained very small, but the internal DeepScientist quest directory grew to 163GB.
Observed disk usage:

188K .git
163G .
163G ./DeepScientist
163G ./DeepScientist/quests

Large files were found under:

DeepScientist/quests/002/.git/objects/pack/*
DeepScientist/quests/002/.ds/opencode-home/.cache/*
DeepScientist/quests/002/.ds/opencode-home/.local/share/opencode/snapshot/*
DeepScientist/quests/002/baselines/local/faid/algorithm/venv/lib/python3.12/site-packages/nvidia/*
DeepScientist/quests/002/baselines/local/faid/algorithm/venv/lib/python3.12/site-packages/torch/*
DeepScientist/quests/002/experiments/main/*/*.pth
DeepScientist/quests/002/experiments/main/*/*.faiss

This eventually caused Git pack operations to consume excessive memory. WSL reported OOM events such as:

opencode invoked oom-killer
Out of memory: Killed process ... (git)

After that, the DeepScientist daemon became unhealthy or disconnected. In some cases, the daemon status file still contained a stale PID even though no process was listening on port 20999.
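
The stale-PID situation described above can be told apart from a live daemon by checking both the recorded PID and the port. The sketch below assumes a POSIX environment such as WSL; the location and format of the daemon status file are not given in this report, so the PID is passed in directly, and all function names are hypothetical. Port 20999 is the one mentioned above. Note that a PID can be reused by an unrelated process, so "alive" is only a hint.

```python
import os
import socket


def pid_is_alive(pid):
    """Return True if some process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def port_is_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def classify_daemon(alive, listening):
    """Combine the two checks into a coarse state label."""
    if alive and listening:
        return "running"
    if alive:
        return "process exists but port is closed"
    if listening:
        return "port is held by another process"
    return "stale pid"


def daemon_state(pid, port=20999):
    """Check a recorded PID and the daemon port, returning a state label."""
    return classify_daemon(pid_is_alive(pid), port_is_listening(port))
```

With the PID read from the status file, `daemon_state(pid)` returning `"stale pid"` corresponds to the case reported here: the file still names a process, but nothing is running or listening.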

Environment

OS: Windows 11
Runtime: WSL2 Ubuntu
Browser: Microsoft Edge / Chrome
DeepScientist runner: OpenCode
Model provider: CSTCloud Uni API, OpenAI-compatible endpoint
Main models tested:
    minimax-m27
    deepseek-v4-flash
Node.js: v22.22.3
npm: 10.9.8
OpenCode: 1.15.5
WSL memory configuration:
    memory: about 27Gi available inside WSL
    swap: 32Gi
GPU/runtime:
    Local experiments may create a Python venv and install Torch / NVIDIA CUDA-related packages inside the quest directory
DeepScientist version: lts

Logs, screenshots, or stack traces

Useful logs / command outputs:

du -h --max-depth=2 . | sort -h | tail -30
find . -type f -size +100M -not -path "./.git/*" -print
du -sh .git 2>/dev/null || echo "no .git"

Observed output:

188K    .git
163G    .
163G    ./DeepScientist
163G    ./DeepScientist/quests

Large files found under:

DeepScientist/quests/002/.git/objects/pack/*
DeepScientist/quests/002/.ds/opencode-home/.cache/*
DeepScientist/quests/002/.ds/opencode-home/.local/share/opencode/snapshot/*
DeepScientist/quests/002/baselines/local/faid/algorithm/venv/lib/python3.12/site-packages/nvidia/*
DeepScientist/quests/002/baselines/local/faid/algorithm/venv/lib/python3.12/site-packages/torch/*
DeepScientist/quests/002/experiments/main/*/*.pth
DeepScientist/quests/002/experiments/main/*/*.faiss

OOM-related kernel logs observed earlier:

opencode invoked oom-killer
Out of memory: Killed process ... (git)

DeepScientist-related logs included entries like:

daemon.signal_received: SIGTERM
daemon.shutdown_requested: source=signal:sigterm
runner.opencode.completed: exit_code=-15

I can provide the full exported diagnostic archive if needed. Because it may contain private information, please contact me separately to obtain it.


Suspected scope

The issue may be related to the following subsystems:

- Quest repository lifecycle
    - DeepScientist/quests/<quest_id>
    - one quest = one Git repository behavior
- Git checkpoint / snapshot mechanism
    - large files may enter Git object storage or pack workflow
    - Git pack operations can become memory-heavy
- Artifact storage policy
    - experiment outputs, model checkpoints, FAISS indexes, venvs, and caches are stored inside quest roots
- OpenCode runner integration
    - .ds/opencode-home
    - OpenCode snapshots
    - OpenCode / HuggingFace / pip caches inside the quest
- Default ignore rules
    - missing or insufficient default .gitignore for quest repositories
- Runtime safety checks
    - no apparent warning before quest size grows to hundreds of GB
    - no apparent guard before Git operations on large quest directories
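
For the Git pack memory concern listed above, one possible mitigation is to apply conservative per-repository settings to each quest repository. The sketch below uses standard Git configuration keys, but the specific values are illustrative assumptions rather than tested recommendations, and the helper names are hypothetical. Whether DeepScientist or OpenCode invokes Git in a way that respects repository-local configuration is not confirmed by this report.

```python
import subprocess

# Illustrative values only; tune them for the machine in question.
LOW_MEMORY_GIT_CONFIG = {
    "gc.auto": "0",                   # disable automatic gc / repacking
    "pack.threads": "1",              # window memory is allocated per thread
    "pack.windowMemory": "256m",      # cap delta-search window memory per thread
    "pack.deltaCacheSize": "128m",    # cap the cache of computed deltas
    "core.bigFileThreshold": "50m",   # larger files skip delta compression
}


def build_config_commands(repo_dir, settings=None):
    """Return the `git config` argv lists that would apply the settings."""
    settings = LOW_MEMORY_GIT_CONFIG if settings is None else settings
    return [
        ["git", "-C", str(repo_dir), "config", "--local", key, value]
        for key, value in settings.items()
    ]


def apply_low_memory_config(repo_dir, settings=None):
    """Run each command; raises CalledProcessError if git reports a failure."""
    for argv in build_config_commands(repo_dir, settings):
        subprocess.run(argv, check=True)
```

Running `apply_low_memory_config("DeepScientist/quests/002")` would write these keys into that quest's `.git/config`. This limits memory use during packing but does not shrink a repository that already contains large objects; ignore rules and external artifact storage are still needed for that.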
