Semantic code search engine with hybrid retrieval, AST-aware chunking, and RAG — built and benchmarked against Zulip's ~500 K-line codebase.
Search results — each chunk card shows its file path, line range, RRF-fused score, and source/language/chunk-type badges. The hybrid pipeline surfaces both semantically similar and exact-identifier matches; the cross-encoder reranker orders them by relevance.
Ask with citations — the RAG answer is grounded in the retrieved chunks. Every claim
links back to a specific file:line range so you can jump directly to the source.
flowchart LR
subgraph Ingest
A[Git repo] --> B[File discovery\n& SHA tracking]
B --> C[AST chunker\ntree-sitter]
C --> D[Embedder\nall-MiniLM-L6-v2]
D --> E[(Qdrant\nvector index)]
C --> F[(BM25\nin-memory index)]
end
subgraph Query
G[User query] --> H[Embed query]
H --> I[Vector search\nQdrant]
G --> J[BM25 keyword\nsearch]
I & J --> K[Reciprocal\nRank Fusion]
K --> L[Cross-encoder\nreranker]
L --> M[Top-N chunks]
M --> N[LLM\nClaude Haiku 4.5]
N --> O[Answer +\ncitations]
end
E --> I
F --> J
# 1 — spin up Qdrant + API + frontend
docker compose up -d
# 2 — ingest a repo (runs in ~9.5 min for Zulip)
make ingest REPO=data/repos/zulip
# 3 — ask a question
make ask Q="How do I cache a function result keyed by its arguments?"
# or open the UI
open http://localhost:5173
The API is available at http://localhost:8000; interactive docs at /docs.
Evaluated on a 40-question Zulip question set (eval/questions.jsonl).
| Metric | Score |
|---|---|
| Hit@5 | 85.0 % |
| Retrieval@5 | 72.1 % |
| Retrieval@10 | 80.8 % |
| Chunker | Hit@5 | R@5 |
|---|---|---|
| Line-based (baseline) | 67.5 % | 57.9 % |
| AST-aware (tree-sitter) | 85.0 % | 72.1 % |
| Delta | +17.5 pp | +14.2 pp |
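This README does not spell out how Hit@K and Retrieval@K are computed. The sketch below assumes the common reading: Hit@K is 1 when at least one relevant chunk appears in the top K, and Retrieval@K is the fraction of relevant chunks found in the top K (recall). The function names and the toy data are hypothetical and are not taken from the eval harness.

```python
from statistics import mean


def hit_at_k(retrieved, relevant, k):
    """1.0 if any of the top-k retrieved IDs is relevant, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    found = set(retrieved[:k]) & set(relevant)
    return len(found) / len(relevant)


# Toy evaluation set (assumed shape): ranked results plus gold chunk IDs.
questions = [
    {"retrieved": ["a", "b", "c"], "relevant": {"b", "z"}},
    {"retrieved": ["d", "e"], "relevant": {"x"}},
]

K = 2
mean_hit = mean(hit_at_k(q["retrieved"], q["relevant"], K) for q in questions)
mean_recall = mean(recall_at_k(q["retrieved"], q["relevant"], K) for q in questions)
print(f"Hit@{K} = {mean_hit:.2f}, Retrieval@{K} = {mean_recall:.2f}")
```

Under these definitions Hit@K is always at least as high as Retrieval@K for the same K, which matches the ordering in the tables above.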
| Metric | Score |
|---|---|
| Faithfulness | 0.96 / 1.0 |
| Relevance | 0.96 / 1.0 |
| Operation | P50 |
|---|---|
| Search (embed + retrieve + rerank) | ~2.4 s |
| Full ask (search + Claude generate) | ~5–8 s |
| Ingest (6,392 chunks / 846 files) | ~9.5 min |
| Component | Choice | Why |
|---|---|---|
| Embeddings | all-MiniLM-L6-v2 (local) | Fast, no API cost, good semantic recall on code |
| Vector DB | Qdrant | HNSW index, payload filters, Docker-friendly |
| Keyword search | BM25 (rank_bm25) | Exact identifier matching that dense search misses |
| Fusion | Reciprocal Rank Fusion (k=60) | Parameter-free, robust across score scales |
| Reranker | ms-marco-MiniLM-L-6-v2 cross-encoder | Re-scores top-50 candidates; +14 pp Hit@5 over RRF alone |
| AST chunking | tree-sitter (Python + TypeScript) | Function/class boundaries beat arbitrary line splits |
| LLM | Claude Haiku 4.5 (Anthropic) | Fast and cheap per token; swappable for Gemini or Ollama via config.yaml |
| API | FastAPI | Async, automatic OpenAPI docs, Pydantic validation |
| Frontend | React + Vite + Tailwind | Minimal; no framework lock-in |
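To illustrate why function/class boundaries make better chunks than fixed line windows, here is a minimal sketch of boundary-aware chunking. It uses Python's standard-library ast module as a stand-in for tree-sitter, so it only handles Python source and is not the project's chunker. The function name and the chunk dictionary shape are assumptions made for the example.

```python
import ast


def chunk_python_source(source):
    """Return one chunk per top-level function or class in the source."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
            start, end = node.lineno, node.end_lineno
            chunks.append({
                "name": node.name,
                "type": "class" if isinstance(node, ast.ClassDef) else "function",
                "start_line": start,
                "end_line": end,
                "text": "\n".join(lines[start - 1:end]),
            })
    return chunks


SAMPLE = '''import os

def greet(name):
    return f"hello {name}"

class Cache:
    def get(self, key):
        return None
'''

chunks = chunk_python_source(SAMPLE)
for c in chunks:
    print(c["type"], c["name"], f'{c["start_line"]}-{c["end_line"]}')
```

Each chunk carries its own line range, which is the information a citation needs in order to point back to a file:line span.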
Dense vector search excels at semantic similarity ("how does X work?") but struggles with exact
identifier matches ("find send_message_backend"). BM25 is the inverse — perfect recall for
exact tokens, poor semantic generalisation.
Reciprocal Rank Fusion merges the two ranked lists without needing calibrated scores: each
document's fused score is Σ 1/(k + rank_i) across retrievers. In practice this recovers
candidates that either retriever misses alone, and the cross-encoder reranker then applies a
richer relevance signal to the merged top-50. The result: +17.5 pp Hit@5 over a
dense-only baseline.
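The fusion formula above can be sketched in a few lines. This is a minimal illustration using k=60 as listed in the component table. The function name, the 1-based rank convention, and the chunk IDs are assumptions for the example, not the project's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranked chunk IDs from the two retrievers.
dense = ["views.py:10-42", "cache.py:5-30", "models.py:100-140"]
bm25 = ["cache.py:5-30", "actions.py:7-19", "views.py:10-42"]

fused = reciprocal_rank_fusion([dense, bm25])
for doc_id, score in fused:
    print(f"{score:.5f}  {doc_id}")
```

Only ranks enter the score, so the BM25 and cosine-similarity scales never need to be normalised against each other. A chunk ranked well by both retrievers (cache.py here) ends up ahead of one that is first in a single list.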
- Recall vs. ranking gap: Retrieval@10 (80.8 %) is the ceiling for reranking. Questions that fall outside the top-10 candidates can never be answered correctly regardless of reranker quality — expanding the candidate pool (top-50 or top-100) before reranking would push this ceiling up.
- CPU cross-encoder latency floor: The reranker runs on CPU; reranking 50 candidates takes ~1–2 s. A GPU or a lighter model (e.g. ms-marco-TinyBERT) would cut this significantly.
- Language coverage: AST chunking is currently Python + TypeScript only. Go, Rust, Java, and C fall back to the line-based chunker, reducing chunk quality for polyglot repos.
- Query expansion: generate 3–5 hypothetical code snippets (HyDE) or sub-queries before retrieval to improve recall on abstract questions.
- Code-aware embeddings: swap all-MiniLM-L6-v2 for CodeBERT or UniXcoder to better capture code structure rather than prose semantics.
- tree-sitter multi-language: add Go, Rust, TypeScript (deeper), and Java grammars to extend AST-level chunking to the full long tail of polyglot projects.
- Incremental re-indexing: currently a full re-ingest is required on repo changes; a file-level SHA diff + partial Qdrant upsert would bring this from minutes to seconds.
- Eval set expansion: grow from 40 to 80–100 questions for tighter confidence intervals on Retrieval@K and faithfulness.
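The incremental re-indexing item above could start from a file-level hash diff along these lines. This is a sketch under stated assumptions: snapshots are plain dictionaries mapping path to SHA-256 digest, the helper names are hypothetical, and the Qdrant upsert/delete step is deliberately left out.

```python
import hashlib


def sha256_of(content: bytes) -> str:
    """Hex digest used as the per-file fingerprint."""
    return hashlib.sha256(content).hexdigest()


def diff_snapshots(old, new):
    """Compare {path: sha} snapshots; return (paths to re-embed, paths to delete)."""
    changed = sorted(p for p, sha in new.items() if old.get(p) != sha)  # new or modified
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted


# Hypothetical file contents before and after a commit.
old_files = {"a.py": b"x = 1\n", "b.py": b"y = 2\n", "c.py": b"z = 3\n"}
new_files = {"a.py": b"x = 1\n", "b.py": b"y = 20\n", "d.py": b"w = 4\n"}

old_snapshot = {path: sha256_of(data) for path, data in old_files.items()}
new_snapshot = {path: sha256_of(data) for path, data in new_files.items()}

to_reembed, to_delete = diff_snapshots(old_snapshot, new_snapshot)
print("re-embed:", to_reembed)
print("delete:", to_delete)
```

Only the chunks belonging to the re-embed paths would need to be re-chunked and upserted, and chunks from deleted paths removed from both the vector index and the BM25 index.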

