Project site: https://jameswei.github.io/tiny-rag-lab/
tiny-rag-lab is a learning-first RAG engine/laboratory for understanding how
classic retrieval-augmented generation works end to end.
The goal is to keep the RAG lifecycle visible: document loading, text normalization, chunking, metadata, embeddings, local vector search, retrieval, prompt assembly, answer generation, citations, evaluation, and failure inspection.
local corpus -> documents -> normalized text -> chunks -> embeddings
-> local vector index -> query embedding -> cosine retrieval
-> grounded prompt -> generated answer with citations
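The "local vector index -> query embedding -> cosine retrieval" step can be illustrated with plain NumPy. This is a minimal sketch, not the project's code: the function name `cosine_top_k` and the toy 3-dimensional vectors are assumptions made for illustration, and it assumes no vector has zero length.

```python
import numpy as np


def cosine_top_k(query_vec: np.ndarray, index: np.ndarray, k: int = 5):
    """Return (row, score) pairs for the k rows of `index` most similar to the query."""
    # Normalize both sides so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q
    # Sort by descending score and keep the first k rows.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy "embeddings"; real ones would come from the embedding model.
index = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
hits = cosine_top_k(np.array([1.0, 0.1, 0.0]), index, k=2)
print(hits)  # rows 0 and 2 rank highest for this query
```

A brute-force matrix product like this is exact and easy to inspect, which suits a small local corpus; it scales linearly with the number of chunks.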
Retrieval
- Dense vector search, BM25 keyword retrieval, and hybrid fusion (Reciprocal Rank Fusion)
- Optional second-pass reranking — fake or cross-encoder
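Reciprocal Rank Fusion combines ranked lists by summing `1 / (k + rank)` for each item across the lists. The sketch below shows the general technique only; the function name, the chunk ids, and the constant `k=60` (a commonly used value) are assumptions, and the project may use different settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk ids; each list is ordered best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense = ["c3", "c1", "c7"]  # hypothetical dense-retriever ranking
bm25 = ["c1", "c9", "c3"]   # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense, bm25])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF only looks at ranks, it needs no score normalization between the dense and BM25 retrievers, whose raw scores live on different scales.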
Evaluation
- rag eval: hit rate @ k, MRR, context precision, context recall
- LLM-as-judge answer metrics: faithfulness, relevance, correctness
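Hit rate @ k and MRR have standard definitions that are short enough to write out. The following is a sketch of those definitions under assumed inputs: each query is a `(retrieved_ids, relevant_ids)` pair, a shape chosen for illustration rather than taken from the project's QA file format.

```python
def hit_rate_at_k(results, k):
    """Fraction of queries with at least one relevant chunk in the top k."""
    hits = sum(1 for retrieved, relevant in results if set(retrieved[:k]) & relevant)
    return hits / len(results)


def mean_reciprocal_rank(results):
    """Average of 1/rank of the first relevant chunk (0 when none is retrieved)."""
    total = 0.0
    for retrieved, relevant in results:
        for rank, chunk_id in enumerate(retrieved, start=1):
            if chunk_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)


results = [
    (["a", "b", "c"], {"a"}),  # first relevant chunk at rank 1
    (["d", "e", "f"], {"f"}),  # first relevant chunk at rank 3
    (["g", "h", "i"], {"z"}),  # no relevant chunk retrieved
]
print(hit_rate_at_k(results, k=2), mean_reciprocal_rank(results))
```

Hit rate only asks whether something relevant appeared, while MRR also rewards placing it near the top, so the two can move independently when a retriever changes.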
Observability
- Per-query trace output: retriever, scores, ranked chunks, stage latency, prompt context
- rag diagnose: curated failure cases with baseline vs. intervention comparison
Generation
- Token-budget context packing; omitted chunks recorded in trace
- Optional --output-format json for structured answer output
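Token-budget context packing can be pictured as a greedy pass over the ranked chunks. The sketch below is one possible reading, not the project's implementation: `pack_context` is a hypothetical name, whitespace splitting stands in for a real tokenizer, and continuing past a chunk that does not fit (rather than stopping) is an assumption.

```python
def pack_context(chunks, budget, count_tokens=lambda text: len(text.split())):
    """Greedily keep ranked chunks that fit the budget; report the rest as omitted."""
    packed, omitted, used = [], [], 0
    for chunk_id, text in chunks:  # assumed to be in ranked order, best first
        cost = count_tokens(text)
        if used + cost <= budget:
            packed.append(chunk_id)
            used += cost
        else:
            # Keep a record of what was left out so a trace can show it.
            omitted.append(chunk_id)
    return packed, omitted, used


ranked = [
    ("c1", "one two three"),
    ("c2", "four five six seven"),
    ("c3", "eight nine"),
]
print(pack_context(ranked, budget=5))  # (['c1', 'c3'], ['c2'], 5)
```

Returning the omitted ids alongside the packed ones is what makes a missing citation explainable later: the chunk was retrieved but did not fit.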
Chunking
- fixed_character: sliding window (default)
- structural: Markdown-aware block boundaries
- semantic: embedding-based topic-shift detection (experimental)
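A sliding-window chunker of the fixed_character kind can be sketched in a few lines. The function name and return shape here are hypothetical, and the sizes 800 and 120 are borrowed from the rag index example in this README rather than confirmed as the tool's defaults.

```python
def chunk_fixed_character(text, chunk_size=800, chunk_overlap=120):
    """Split text into overlapping windows; returns (start_offset, chunk_text) pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_fixed_character("abcdefghij", chunk_size=4, chunk_overlap=1))
# [(0, 'abcd'), (3, 'defg'), (6, 'ghij')]
```

Keeping the start offset with each chunk is a cheap form of metadata that lets a citation point back to the exact span in the source document.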
- Python · argparse CLI · uv
- Embeddings: sentence-transformers/all-MiniLM-L6-v2 (local)
- Vector index: NumPy (no vector database)
- Generation: OpenAI-compatible API
- Test backends: fake embedder + fake generator (fully offline)
- Corpus: IBM watsonxDocsQA
- No LangChain / LlamaIndex / Haystack wrapper
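To show why a fake embedder makes fully offline tests possible, here is one way such a backend could work: hash the text into a seed and draw a deterministic unit vector from it. This is purely an illustrative assumption; `fake_embed`, its 8-dimensional output, and the hashing scheme are hypothetical and not taken from the project.

```python
import hashlib

import numpy as np


def fake_embed(text: str, dim: int = 8) -> np.ndarray:
    """Deterministic pseudo-embedding: same text gives the same unit vector, no model needed."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = np.random.default_rng(seed)  # seeded generator makes the output reproducible
    vec = rng.standard_normal(dim)
    return vec / np.linalg.norm(vec)


a = fake_embed("what is watsonx?")
b = fake_embed("what is watsonx?")
print(np.allclose(a, b))  # True
```

Vectors produced this way carry no semantic meaning, so they are only useful for exercising indexing and retrieval plumbing, not for judging retrieval quality.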
rag index --corpus PATH --index-dir .tiny-rag/index --chunk-size 800 --chunk-overlap 120
rag index --corpus PATH --index-dir .tiny-rag/index --chunking-strategy structural
rag index --corpus PATH --index-dir .tiny-rag/index --chunking-strategy semantic --semantic-similarity-threshold 0.5
rag retrieve "question text" --index-dir .tiny-rag/index --top-k 5 --retriever dense
rag retrieve "question text" --index-dir .tiny-rag/index --top-k 5 --retriever bm25
rag retrieve "question text" --index-dir .tiny-rag/index --top-k 5 --retriever hybrid
rag ask "question text" --index-dir .tiny-rag/index --top-k 5
rag ask "question text" --index-dir .tiny-rag/index --context-budget 8192
rag ask "question text" --index-dir .tiny-rag/index --context-budget 8192 --output-format json
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --top-k 5 --retriever dense
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --top-k 5 --retriever bm25
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --top-k 5 --retriever hybrid
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --judge fake --generator fake
rag eval --qa-file corpus/watsonx-docsqa/qa.jsonl --index-dir .tiny-rag/index --judge fake --generator fake --context-budget 8192
rag diagnose --cases-file tests/fixtures/failure/cases.jsonl --index-dir .tiny-rag/index
rag diagnose --cases-file tests/fixtures/failure/cases.jsonl --index-dir .tiny-rag/index --judge fake --generator fake
rag diagnose --cases-file tests/fixtures/failure/cases.jsonl --index-dir .tiny-rag/index --judge fake --generator fake --context-budget 8192
Help is available for each command:
uv run rag --help
uv run rag index --help
uv run rag retrieve --help
uv run rag ask --help
uv run rag eval --help
uv run rag diagnose --help
Install/sync dependencies:
uv sync --group dev
Run tests:
uv run pytest --tb=short -q
Prepare the primary corpus after dependencies are installed:
uv run python scripts/prepare_watsonx_docsqa.py --inspect
uv run python scripts/prepare_watsonx_docsqa.py --output-dir corpus/watsonx-docsqa
Generated corpora and indexes are intentionally ignored by git:
corpus/
.tiny-rag/
- Proposal: project purpose, philosophy, and non-goals
- Roadmap: directional phase sequence
- Architecture: conceptual RAG planes and boundaries
- File structure: repository map
- Phase docs: phase specs and taskboards