Project page: https://viggomeesters.com/vault-layer/
Stop making agents crawl your vault. Give them a local read model instead.
VaultLayer turns a Markdown/Obsidian vault into a rebuildable local database with full-text search, vectors, WikiLinks, metadata, provenance, and CLI/MCP access. Your vault stays plain files. The generated index, embeddings, benchmark reports, and caches live outside the vault and outside the repo.
Download VaultLayer if you have a serious Markdown/Obsidian vault and want to:
- query it without repeatedly scanning folders and parsing Markdown;
- give agents bounded, cited context instead of dumping broad filesystem reads into prompts;
- keep the vault as source of truth while generating disposable search/vector state elsewhere;
- preserve provenance for every result: path, heading/chunk id, content hash, modified time, and excerpt;
- compare raw filesystem search vs indexed retrieval with a repeatable benchmark;
- prototype local-first vault retrieval before committing to a viewer, MCP server, or cloud sync story.
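To make the provenance fields listed above concrete, here is a minimal sketch of what one result's provenance record could look like. It is not VaultLayer's actual schema: the `Provenance` class, the `make_provenance` helper, and the `<path>#<heading>` chunk id format are assumptions made for illustration.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    path: str             # vault-relative note path
    chunk_id: str         # hypothetical format: "<path>#<heading>"
    content_hash: str     # SHA-256 hex digest of the chunk text
    modified_time: float  # file mtime, seconds since the epoch
    excerpt: str          # short preview of the chunk


def make_provenance(vault_root, rel_path, heading, chunk_text, excerpt_len=80):
    """Build a provenance record for one chunk; the vault is only read."""
    full_path = os.path.join(vault_root, rel_path)
    return Provenance(
        path=rel_path,
        chunk_id=f"{rel_path}#{heading}",
        content_hash=hashlib.sha256(chunk_text.encode("utf-8")).hexdigest(),
        modified_time=os.stat(full_path).st_mtime,
        excerpt=chunk_text.strip()[:excerpt_len],
    )
```

Because the hash is derived from the chunk text, a consumer can detect that a cited chunk has changed since indexing by recomputing it.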
Markdown/Obsidian vault -> external VaultLayer DB -> CLI / MCP / viewer / benchmark
Current pilot includes:
- read-only indexing of Markdown/Obsidian-style vaults;
- SQLite + FTS5 local search by default;
- sqlite-vec + real local FastEmbed MiniLM vector retrieval;
- deterministic synthetic messy-vault preflight;
- package, doctor, and benchmark scripts;
- safety guards against committing private vault content or generated DB/cache artifacts.
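For readers unfamiliar with SQLite + FTS5, the sketch below shows the general shape of an FTS5-backed section search using Python's standard sqlite3 module. The `sections` table, its columns, and the `build_index` / `search` helpers are hypothetical and do not reflect VaultLayer's real schema; it also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_index(sections):
    """Create an in-memory FTS5 index from (path, heading, body) tuples."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: path and heading are stored but not tokenized.
    conn.execute(
        "CREATE VIRTUAL TABLE sections "
        "USING fts5(path UNINDEXED, heading UNINDEXED, body)"
    )
    conn.executemany("INSERT INTO sections VALUES (?, ?, ?)", sections)
    return conn


def search(conn, query, limit=5):
    """Return (path, heading, excerpt) rows, best BM25 match first."""
    sql = (
        "SELECT path, heading, snippet(sections, 2, '[', ']', '...', 8) "
        "FROM sections WHERE sections MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()
```

Each row carries the note path and heading alongside a highlighted excerpt, which is the kind of bounded, cited result the CLI is meant to return.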
VaultLayer is a pilot-ready local MVP, not a production product or a guaranteed speedup for every vault. Performance depends on vault size, filesystem, machine, query shape, and first-run model/cache setup. The repo gives you a safe way to test and measure that locally.
git clone https://github.com/viggomeesters/vault-layer.git
cd vault-layer
make check
cargo run -p vault-layer -- --help
Prove the flow on a generated messy fake vault, without touching a real vault:
python3 scripts/make_messy_vault.py /tmp/vault-layer-messy --force
scripts/package_smoke.sh /tmp/vault-layer-messy --work-dir /tmp/vault-layer-package-smoke
scripts/benchmark_vault.sh /tmp/vault-layer-messy \
--state-dir /tmp/vault-layer-benchmark \
--query "performance baseline vector provenance" \
--limit 500
Expected proof points: package_smoke=ok, doctor_status=ok, runtime_outside_vault=true, indexed notes/sections, local embeddings, and benchmark timings.
Index your own local vault when you are ready:
cargo run -p vault-layer -- index /path/to/vault --state-dir ~/.local/share/vault-layer --limit 20
Inspect the configured storage backend:
cargo run -p vault-layer -- backend-info
# Recommended local SQLite + FTS5 retrieval projection, no credentials/network
cargo run -p vault-layer -- index /path/to/vault
# Optional DuckDB analytics/export sidecar
VAULT_LAYER_BACKEND=duckdb cargo run -p vault-layer -- index /path/to/vault
# Explicit remote sync to hosted Turso/libSQL (requires real credentials)
TURSO_DATABASE_URL=libsql://your-database.turso.io \
TURSO_AUTH_TOKEN=*** \
cargo run -p vault-layer -- sync-turso /path/to/vault --limit 100
Local SQLite + FTS5 is the implemented primary retrieval default. TURSO_DATABASE_URL / TURSO_AUTH_TOKEN can be configured for the Turso/libSQL target, but remote sync only runs through explicit sync-turso / index --remote-sync; VaultLayer will not upload private vault text by accident.
Search with citations:
vault-layer search "agent context" --db ~/.local/share/vault-layer/<vault-id>/vault-layer.db --json
vault-layer get-note "Projects/example.md" --db <db> --json
vault-layer related "Projects/example.md" --db <db> --json
Generate local embeddings and run vector retrieval:
# deterministic smoke/test provider
vault-layer embed --db <db> --model deterministic-v0
# real local ONNX model via Python fastembed; cache stays outside repo/vault
python3 -m pip install fastembed==0.7.3
vault-layer embed --db <db> --model fastembed-mini-lm
vault-layer vector-search "agent context" --db <db> --model fastembed-mini-lm --json
MCP smoke interface:
vault-layer serve --mcp --list-tools
vault-layer serve --mcp --call vault_search --query "agent" --db <db>
VaultLayer treats the source vault as read-only by default.
- Do not commit private vault content.
- Do not commit generated DB/index/embedding files.
- Runtime state belongs outside both the repo and the vault, e.g. ~/.local/share/vault-layer/.
- Examples and tests must use synthetic fixtures.
- Writeback is disabled in the MVP.
- SQLite + FTS5 is the recommended/default local retrieval backend over .md while the vault remains source of truth.
- sqlite-vec is the intended native local vector path; fastembed-mini-lm is the working real local embedding model path, while deterministic JSON cosine remains a smoke-test fallback.
- DuckDB is an optional analytics/export sidecar: set VAULT_LAYER_BACKEND=duckdb.
- Hosted Turso/libSQL is treated as cloud/sync/export target, not the local core.
- VaultLayer core — parser, stable IDs, shadow DB, search, vectors, provenance, human relevance scores.
- VaultLayer CLI/MCP — agent and automation surface.
- Mega Vault Viewer — human UI consumer of VaultLayer read models.
- docs/ARCHITECTURE.md
- docs/api.md
- docs/embeddings.md
- docs/mcp.md
- docs/wsl-smoke.md
- docs/ROADMAP.md
- docs/REPO_COMPLETE.md
- docs/FILL_LOOP.md
- docs/claim-evidence-gate.md
- docs/full-vault-progress-resume.md
- docs/local-embedding-adapter.md
- docs/niels-pilot-install.md
- docs/niels-pilot-benchmark.md
- docs/niels-pilot-runbook.md
- docs/synthetic-messy-vault-preflight.md
- docs/local-embedding-adapter-blocker.md
make check
The gate runs Rust checks/tests, repository safety guard, Python guard tests, git diff --check, and a generated-artifact tracking check.
cargo build --release -p vault-layer
./target/release/vault-layer --help
See docs/PACKAGE.md.
Read CONTRIBUTORS.md, SUPPORT.md, SECURITY.md, and AGENTS.md. Keep all fixtures synthetic and all runtime artifacts outside Git.
MIT. See LICENSE.
The accepted backend decision is documented in docs/ADR-0001-primary-retrieval-backend.md; benchmark evidence lives in docs/backend-decision-benchmark.md.
Native sqlite-vec is feasible and has a scoped Rust/rusqlite smoke adapter exposed via vault-layer sqlite-vec-info; see docs/sqlite-vec-packaging-spike.md. vault-layer embed refreshes sqlite-vec rows for the selected model, including the working real local fastembed-mini-lm path.
Current bounded real-vault retrieval benchmark evidence lives in docs/full-vault-retrieval-benchmark.md. Long index runs now emit progress and can skip rewriting an existing same-count SQLite DB; see docs/full-vault-progress-resume.md. Target-vault performance still must be proven per pilot with scripts/benchmark_vault.sh.
Vector fallback results now expose cosine_score and text_quality_score so low-information chunks can be demoted while native sqlite-vec and real embeddings mature. See docs/retrieval-quality-first-pass.md.
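As an illustration of how a cosine fallback with quality-based demotion can work, here is a rough sketch. Only the field names cosine_score and text_quality_score come from this README; the quality heuristic, the way the two scores are combined, and the `rank_chunks` helper are assumptions, not VaultLayer's actual scoring.

```python
import math


def cosine_score(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_quality_score(text):
    """Hypothetical heuristic: share of distinct words, scaled by length
    (saturating at 20 words), so short or repetitive chunks score low."""
    words = text.lower().split()
    if not words:
        return 0.0
    distinct = len(set(words)) / len(words)
    length = min(len(words), 20) / 20
    return distinct * length


def rank_chunks(query_vec, chunks):
    """Rank dicts with 'id', 'vector', 'text'; low-quality text is demoted."""
    results = []
    for chunk in chunks:
        cos = cosine_score(query_vec, chunk["vector"])
        quality = text_quality_score(chunk["text"])
        results.append({
            "id": chunk["id"],
            "cosine_score": cos,
            "text_quality_score": quality,
            # Assumed blend: quality scales the cosine score between 50% and 100%.
            "score": cos * (0.5 + 0.5 * quality),
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

With this blend, a one-word chunk such as "TODO" ranks below a full sentence even when both have identical embeddings.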
vault-layer embed refreshes native sqlite-vec rows when available, vector-search prefers native sqlite-vec KNN, and hybrid-search reranks FTS candidates with vector, human relevance, and text-quality signals. Use --model fastembed-mini-lm for the working real local model path, or --model deterministic-v0 for smoke tests. See docs/sqlite-vec-hybrid-retrieval.md and docs/local-embedding-adapter.md.
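The reranking step described above can be pictured as a weighted fusion of per-candidate signals. The sketch below is an assumption-heavy illustration: the `Candidate` fields, the weights, and the requirement that every signal is pre-normalized to the range 0 to 1 are all hypothetical and are not taken from the hybrid-search implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    fts_score: float        # full-text match strength, higher is better
    vector_score: float     # embedding similarity to the query
    human_relevance: float  # human-assigned relevance signal
    text_quality: float     # low for stubs and boilerplate


# Hypothetical weights; they sum to 1.0 so scores stay within 0..1.
WEIGHTS = {
    "fts_score": 0.4,
    "vector_score": 0.4,
    "human_relevance": 0.1,
    "text_quality": 0.1,
}


def hybrid_rerank(candidates, weights=WEIGHTS):
    """Return (candidate, score) pairs sorted by weighted score, best first."""
    scored = [
        (c, sum(w * getattr(c, name) for name, w in weights.items()))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A candidate with a weaker keyword match can still rise to the top when its vector, relevance, and quality signals are strong.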