Code Compass

An end-to-end repository question answering system that indexes a public GitHub codebase, retrieves grounded code evidence, and generates cited answers through a retrieval-augmented generation pipeline.

This project includes:

a React frontend for repository submission and conversational querying
a FastAPI backend for indexing and answering questions
a hybrid retrieval pipeline with semantic search, BM25, and reranking
an evaluation harness for measuring retrieval quality and answer grounding

Why This Project Matters

This project is built to demonstrate what recruiters and engineering reviewers usually look for in a personal AI project:

clear system design
practical frontend, backend, and deployment integration
retrieval and ranking logic beyond a single LLM prompt
measurable evaluation instead of anecdotal demos
thoughtful tradeoffs around cost, latency, and persistence

Code Compass brings those elements together in one end-to-end application with visible architecture, source citations, and an evaluation harness ready for benchmark results.

What The System Does

A user pastes a GitHub repository URL into the UI.
The backend clones the repository into a temporary local directory.
Source files are filtered, then chunked with tree-sitter, with plain-text chunking as a fallback.
The system generates embeddings for chunks and stores them in a Chroma-backed vector layer.
At query time, the system retrieves evidence with:
- semantic vector search
- lexical BM25 search
- reciprocal rank fusion
- cross-encoder reranking
The top grounded chunks are passed to the LLM to generate a concise answer.
The UI displays the answer with file-level citations and GitHub source links.
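
The last step relies on turning a cited chunk into a GitHub link. A minimal sketch of that mapping follows, assuming the standard GitHub `blob` URL layout with `#L<start>-L<end>` line anchors. The function name, the `main` ref, and the example file path are hypothetical and not taken from the project.

```python
from urllib.parse import quote


def github_source_link(repo_url: str, ref: str, file_path: str,
                       start_line: int, end_line: int) -> str:
    """Build a GitHub link that highlights a line range in a file.

    Hypothetical helper: the project's real citation code is not shown here.
    """
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]  # clone URLs often carry a .git suffix
    if start_line == end_line:
        anchor = f"#L{start_line}"
    else:
        anchor = f"#L{start_line}-L{end_line}"
    # quote() keeps "/" unescaped, so nested paths stay readable
    return f"{base}/blob/{ref}/{quote(file_path)}{anchor}"


link = github_source_link(
    "https://github.com/documenso/documenso.git",  # benchmark repo named in this README
    "main",                                        # assumed branch name
    "packages/lib/example.ts",                     # hypothetical file path
    10,
    24,
)
```

Pinning `ref` to the commit that was indexed, rather than a branch name, would keep line anchors stable if the repository changes later.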

Architecture

┌──────────────────────┐
│      React UI        │
│  repo submit + chat  │
│  citations + status  │
└──────────┬───────────┘
           │ HTTP / JSON
           ▼
┌──────────────────────┐
│    FastAPI Server    │
│   routes + session   │
│      validation      │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────────────────────────────┐
│              CodebaseRAGSystem               │
│ indexing orchestration + query orchestration │
└───────┬───────────────┬───────────────┬──────┘
        │               │               │
        │               │               │
        ▼               ▼               ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ RepoFetcher  │  │ CodeParser   │  │ Embeddings   │
│ clone/filter │  │ tree-sitter  │  │ Bedrock/local│
└──────┬───────┘  │ fallback     │  └──────┬───────┘
       │          └──────┬───────┘         │
       │                 │                 │
       └────────────┬────┴────────────┬────┘
                    ▼                 ▼
           ┌──────────────┐   ┌──────────────┐
           │ In-memory    │   │ Chroma       │
           │ repo/session │   │ vector store │
           │ state        │   └──────┬───────┘
           └──────────────┘          │
                                     ▼
                           ┌──────────────────┐
                           │ Hybrid Retrieval │
                           │ semantic + BM25  │
                           │ + reranking      │
                           └────────┬─────────┘
                                    ▼
                           ┌──────────────────┐
                           │   LLM Answerer   │
                           │ grounded answer  │
                           │ + citations      │
                           └──────────────────┘

Frontend

React 19
Tailwind CSS
Axios for API communication

Responsibilities:

collect the GitHub repository URL
poll indexing state
send chat questions and prior conversation turns
render markdown-like answers
display cited files, symbols, and line ranges

Backend

FastAPI
Pydantic
in-memory session and repository state

Responsibilities:

validate requests
manage session-scoped repository state
run indexing in the background
execute retrieval and answer generation
return grounded answers and source metadata

Retrieval Pipeline

tree-sitter for code-aware chunking
Amazon Bedrock or local embeddings for semantic retrieval depending on environment
BM25 for lexical retrieval
reciprocal rank fusion to combine retrieval channels
a cross-encoder reranker for final source ordering
Groq or Amazon Bedrock generation depending on environment configuration
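
Reciprocal rank fusion is the step that merges the semantic and BM25 result lists before reranking. The sketch below shows the general technique, not the project's own code: the function name, the chunk identifiers, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical chunk ids from the two retrieval channels
semantic = ["auth.py:login", "db.py:connect", "utils.py:hash"]
lexical = ["utils.py:hash", "auth.py:login", "routes.py:index"]
fused = reciprocal_rank_fusion([semantic, lexical])
# fused == ["auth.py:login", "utils.py:hash", "db.py:connect", "routes.py:index"]
```

Because the fusion uses ranks rather than raw scores, the two channels do not need comparable score scales.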

Data Flow

Indexing Flow

POST /api/repos/index
Backend registers the repo against a session
Background task clones the repo
Files are filtered by extension, directory, and size
Files are chunked into code-aware segments
Embeddings are generated for each chunk
Chunks are stored in the vector layer and in in-memory retrieval state
Metadata and progress are exposed back to the UI
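
The filtering step in this flow can be sketched as a single predicate. The allow-list, skipped directories, and size limit below are hypothetical placeholders; the project's actual values are not listed in this README.

```python
from pathlib import PurePosixPath

# Hypothetical values, chosen only for illustration
ALLOWED_EXTENSIONS = {".py", ".ts", ".tsx", ".js", ".md"}
SKIPPED_DIRECTORIES = {".git", "node_modules", "dist", "build", "__pycache__"}
MAX_FILE_BYTES = 200_000


def should_index(relative_path: str, size_bytes: int) -> bool:
    """Return True if a file passes the extension, directory, and size checks."""
    path = PurePosixPath(relative_path)
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        return False
    # Check every parent directory component, not the file name itself
    if any(part in SKIPPED_DIRECTORIES for part in path.parts[:-1]):
        return False
    return size_bytes <= MAX_FILE_BYTES
```

Rejecting files before chunking keeps vendored code and build output out of both the embedding budget and the retrieval results.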

Query Flow

POST /api/query
The backend validates the session and repository status
The question is expanded using lightweight intent heuristics
Semantic search retrieves candidate chunks
BM25 retrieves lexical matches
Results are fused and reranked
Final sources are selected and passed to the LLM
The backend returns:
- answer
- confidence
- sources
- repository metadata
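
One way to picture the response payload is as a pair of small data classes. This is a sketch under assumptions: the four top-level fields come from the list above, but every nested field name and the numeric type of `confidence` are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Source:
    # Hypothetical field names for a cited chunk
    file_path: str
    start_line: int
    end_line: int
    symbol: Optional[str] = None


@dataclass
class QueryResponse:
    answer: str
    confidence: float  # assumed numeric; the real API may use a label instead
    sources: List[Source] = field(default_factory=list)
    repository: Dict[str, str] = field(default_factory=dict)


response = QueryResponse(
    answer="Tokens are verified in verify_token.",
    confidence=0.8,
    sources=[Source("auth/tokens.py", 12, 40, symbol="verify_token")],
    repository={"url": "https://github.com/documenso/documenso.git"},
)
payload = asdict(response)  # nested dataclasses become plain dicts, ready for JSON
```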

Tech Stack Decisions

Why FastAPI

fast iteration speed
strong request validation through Pydantic
simple background task support
clean fit for JSON APIs and model-driven backend code

Why React

straightforward stateful UI for a single-page workflow
easy integration with polling, chat state, and citation rendering
strong ecosystem for incremental iteration

Why tree-sitter

better chunk boundaries than naive fixed-length splitting
lets the system chunk around functions, classes, and symbols
improves retrieval quality for implementation-focused questions
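
To illustrate the idea of symbol-aligned chunks with a plain-text fallback, the sketch below uses Python's standard `ast` module as a stand-in for tree-sitter. It is not the project's parser and handles only Python source; the function name and the chunk dictionary layout are hypothetical.

```python
import ast
from typing import List


def chunk_python_source(source: str, max_lines: int = 40) -> List[dict]:
    """Split source into one chunk per top-level function or class.

    Falls back to fixed-size line windows when parsing fails or when
    no top-level symbols are found.
    """
    lines = source.splitlines()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        tree = None

    chunks = []
    if tree is not None:
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                chunks.append({
                    "symbol": node.name,
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,
                    "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
                })
    if chunks:
        return chunks

    # Fallback: naive windows of max_lines lines, with no symbol information
    return [
        {
            "symbol": None,
            "start_line": start + 1,
            "end_line": min(start + max_lines, len(lines)),
            "text": "\n".join(lines[start:start + max_lines]),
        }
        for start in range(0, len(lines), max_lines)
    ]
```

Keeping the symbol name and line range on each chunk is what makes citations with symbols and line ranges possible later in the pipeline.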

Why Hybrid Retrieval

Pure semantic search misses exact symbols and file names. Pure lexical search misses semantic intent. This project combines both because code questions often need:

exact identifiers
nearby implementation detail
cross-file semantic similarity
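
The exact-identifier case is where lexical scoring helps most. Below is a minimal, self-contained BM25 sketch (Okapi BM25 with the common `k1=1.5`, `b=0.75` defaults) over hypothetical chunk texts. The tokenizer and function names are assumptions; the project may use a library implementation instead.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Keep identifiers such as verify_token intact as single tokens
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def bm25_scores(query: str, documents: List[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_count = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / max(doc_count, 1)

    # Document frequency: in how many documents each term appears
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            idf = math.log(1 + (doc_count - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical chunk texts
chunks = [
    "def verify_token(token): return decode(token)",
    "def send_email(recipient): pass",
    "class TokenStore: pass",
]
scores = bm25_scores("verify_token", chunks)
```

In this example only the first chunk contains the identifier `verify_token`, so it is the only one with a non-zero score, regardless of how semantically close `TokenStore` might look to an embedding model.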

Why Chroma

simple vector abstraction
one vector database path for both local and production runtime
persistent local storage without a separate hosted vector service
direct support for externally generated embeddings and metadata filters

Why In-Memory Session State

repository/session metadata is short-lived and cleared when the backend restarts
no separate relational database is needed for the current product flow
the API still exposes indexing status and session-scoped repositories while keeping deployment simpler
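
A minimal sketch of session-scoped, in-memory repository state is shown below. The class names, fields, and status values are hypothetical; the point is only that repositories are keyed by session and that nothing survives a process restart.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RepoRecord:
    repo_id: str
    url: str
    status: str = "queued"  # hypothetical values: queued, indexing, ready, failed


class SessionRegistry:
    """Keeps repository records in process memory, keyed by session id."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, RepoRecord]] = {}

    def register(self, session_id: str, url: str) -> RepoRecord:
        record = RepoRecord(repo_id=uuid.uuid4().hex, url=url)
        self._sessions.setdefault(session_id, {})[record.repo_id] = record
        return record

    def get(self, session_id: str, repo_id: str) -> Optional[RepoRecord]:
        # A repo registered under one session is invisible to other sessions
        return self._sessions.get(session_id, {}).get(repo_id)

    def set_status(self, session_id: str, repo_id: str, status: str) -> None:
        record = self.get(session_id, repo_id)
        if record is None:
            raise KeyError(f"unknown repo {repo_id!r} for session {session_id!r}")
        record.status = status


registry = SessionRegistry()
repo = registry.register("session-a", "https://github.com/documenso/documenso.git")
registry.set_status("session-a", repo.repo_id, "ready")
```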

Runtime Environments

Local Development

Local development is configured for higher-quality experimentation:

Claude Sonnet 4 on Amazon Bedrock for answer generation
Cohere Embed v4 on Amazon Bedrock for semantic retrieval

This setup is useful for comparing retrieval and answer quality in a managed-model environment.

Recommended local runtime:

LLM_PROVIDER=bedrock
EMBEDDING_PROVIDER=bedrock
AWS_REGION=us-east-1
BEDROCK_LLM_MODEL=anthropic.claude-sonnet-4-20250514-v1:0
BEDROCK_EMBEDDING_MODEL=cohere.embed-v4:0
BEDROCK_EMBEDDING_DIM=1536
CHROMA_PATH=./data/chroma
CHROMA_COLLECTION=repo_qa_chunks

Evaluation

The evaluation harness is designed around Amazon Bedrock:

Bedrock Claude Opus 4 for the RAGAS judge model
app-configured embeddings during evaluation

Recommended eval runtime:

EVAL_MODEL=anthropic.claude-opus-4-20250514-v1:0
AWS_REGION=us-east-1

Production Deployment

The production deployment target is:

frontend on Vercel
backend on Hugging Face Spaces

Production inference is configured differently from local development:

Groq-hosted Llama for answer generation
lightweight local sentence-transformer embeddings for semantic retrieval
Chroma DB for vector storage

This production setup was chosen to fit Hugging Face Spaces free-tier constraints more comfortably while keeping the retrieval and answer pipeline intact. Chroma is used in production and local development so the vector storage behavior stays consistent across environments.

Recommended production runtime:

LLM_PROVIDER=groq
EMBEDDING_PROVIDER=local
LIGHTWEIGHT_LOCAL_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
CHROMA_PATH=./data/chroma
CHROMA_COLLECTION=repo_qa_chunks
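
The two runtime configurations above differ only in environment variables, so provider selection can be read in one place. The loader below is a sketch under assumptions: the variable names and provider values come from this README, while the `RuntimeSettings` class, the `load_settings` function, and the fallback defaults are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

LLM_PROVIDERS = {"bedrock", "groq"}
EMBEDDING_PROVIDERS = {"bedrock", "local"}


@dataclass(frozen=True)
class RuntimeSettings:
    llm_provider: str
    embedding_provider: str
    chroma_path: str
    chroma_collection: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> RuntimeSettings:
    """Read provider and storage settings from environment variables."""
    env = os.environ if env is None else env
    settings = RuntimeSettings(
        # Defaults mirror the production values; defaulting at all is an assumption
        llm_provider=env.get("LLM_PROVIDER", "groq"),
        embedding_provider=env.get("EMBEDDING_PROVIDER", "local"),
        chroma_path=env.get("CHROMA_PATH", "./data/chroma"),
        chroma_collection=env.get("CHROMA_COLLECTION", "repo_qa_chunks"),
    )
    if settings.llm_provider not in LLM_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {settings.llm_provider!r}")
    if settings.embedding_provider not in EMBEDDING_PROVIDERS:
        raise ValueError(f"unsupported EMBEDDING_PROVIDER: {settings.embedding_provider!r}")
    return settings


local_dev = load_settings({"LLM_PROVIDER": "bedrock", "EMBEDDING_PROVIDER": "bedrock"})
```

Failing fast on an unknown provider name surfaces a misconfigured deployment at startup rather than on the first query.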

Deployment

Production Topology

Vercel hosts the React frontend
Hugging Face Spaces hosts the FastAPI backend
the backend is packaged and deployed as a Docker Space
GitHub Actions syncs the backend code to the Space on pushes to main

Docker

The backend is deployed with Docker using:

server/Dockerfile

The container:

installs Python dependencies
copies the backend application
starts the FastAPI app with Uvicorn on port 7860

CI/CD

Continuous deployment is handled through:

.github/workflows/deploy-hf-space.yml

The workflow:

runs on pushes to main
syncs the server/ directory to the Hugging Face Space
triggers the Docker Space rebuild automatically

Evaluation And Benchmarking

The project includes an end-to-end eval harness that calls the live API instead of mocking the retrieval pipeline.

The benchmark currently measures:

retrieval hit rate
top-1 hit rate
mean reciprocal rank
source recall
duplicate source rate
keyword-based answer checks
grounded answer rate
optional RAGAS judge metrics such as faithfulness and answer relevancy
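
The deterministic retrieval metrics in this list can be computed from ranked source lists alone. The sketch below uses common textbook definitions, which may differ in detail from the harness in this repository; the case format (`retrieved`, `expected`) and the function name are hypothetical.

```python
from typing import Dict, List


def retrieval_metrics(cases: List[dict]) -> Dict[str, float]:
    """Average per-case retrieval metrics over a list of eval cases.

    Each case holds 'retrieved' (ranked file paths) and 'expected'
    (file paths considered relevant).
    """
    totals = {
        "hit_rate": 0.0,
        "top1_hit_rate": 0.0,
        "mrr": 0.0,
        "source_recall": 0.0,
        "duplicate_source_rate": 0.0,
    }
    for case in cases:
        retrieved = case["retrieved"]
        expected = set(case["expected"])
        # 1-based ranks at which a relevant source appears
        ranks = [rank for rank, path in enumerate(retrieved, start=1) if path in expected]
        if ranks:
            totals["hit_rate"] += 1
            totals["mrr"] += 1 / ranks[0]
            if ranks[0] == 1:
                totals["top1_hit_rate"] += 1
        if expected:
            totals["source_recall"] += len(expected & set(retrieved)) / len(expected)
        if retrieved:
            duplicates = len(retrieved) - len(set(retrieved))
            totals["duplicate_source_rate"] += duplicates / len(retrieved)
    count = len(cases) or 1  # avoid dividing by zero on an empty eval set
    return {name: value / count for name, value in totals.items()}


# Two hypothetical cases
example_cases = [
    {"retrieved": ["a.py", "b.py"], "expected": ["a.py"]},
    {"retrieved": ["x.py", "c.py", "c.py"], "expected": ["c.py", "d.py"]},
]
metrics = retrieval_metrics(example_cases)
```

For these two cases the hit rate is 1.0, the top-1 hit rate is 0.5, and both MRR and source recall come out at 0.75.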

The current RAGAS judge configuration uses Bedrock Claude Opus 4 via EVAL_MODEL.

The project includes a measurable end-to-end evaluation workflow alongside the product itself. Metric values are intentionally left pending until the benchmark is rerun, so the README does not claim unverified results.

Benchmark Snapshot

Current sample benchmark target:

Documenso (https://github.com/documenso/documenso.git)
43 evaluation cases
10 categories
4 multi-turn conversation cases
full-application coverage across architecture, docs, setup, API layers, document flows, signing, email, jobs, tests, and follow-up questions

| Metric | Result |
| --- | --- |
| Retrieval hit rate | To be added after rerun |
| Top-1 hit rate | To be added after rerun |
| Mean reciprocal rank | To be added after rerun |
| Source recall | To be added after rerun |
| Grounded answer rate | To be added after rerun |
| Keyword/checklist pass rate | To be added after rerun |
| Reference-support pass rate | To be added after rerun |
| Faithfulness (RAGAS, supporting) | To be added after rerun |
| Answer relevancy (RAGAS, supporting) | To be added after rerun |
| Context precision (RAGAS, supporting) | To be added after rerun |

What these metrics are intended to show:

the system should retrieve at least one relevant source for most benchmark cases
the first-ranked source is expected to be relevant in most cases, with an internal 80% top-1 target
the benchmark includes architecture, API, setup, docs, tests, cross-file workflows, code-generation checklists, and conversation-style questions
RAGAS is treated as a secondary judge signal; deterministic retrieval and grounded checklist metrics are the primary gates

Benchmark strengths:

full-stack application benchmark rather than a library-only benchmark
product-domain questions around documents, recipients, fields, signing, emails, jobs, and webhooks
measurable end-to-end performance, replacing anecdotal examples once the new run is complete

Benchmark-exposed weaknesses:

the sample set is focused on one target project, so it should be broadened before being presented as general benchmark evidence
Documenso is a large TypeScript monorepo, so context precision and directory-level source selection matter more than in the old library-focused eval
some cross-file, specific-function, and test-heavy questions may remain harder than single-file API questions
canonical implementation files may not always rank first on the hardest prompts

Project Strengths

full-stack architecture with a clear data flow
code-aware retrieval rather than plain document retrieval
practical hybrid search design
session-aware repo isolation
source-grounded answer generation
explicit benchmark and evaluation workflow

Known Tradeoffs

retrieval state is intentionally session-scoped and mostly in memory
cloned repositories are temporary and deleted after indexing
repository metadata is lightweight and kept in memory, separately from the Chroma vector state
if the backend restarts, repositories must be re-indexed
the benchmark currently covers a single target repository and should be expanded across more repositories over time

Local Setup

Backend

cd server
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export LLM_PROVIDER=bedrock
export EMBEDDING_PROVIDER=bedrock
export AWS_REGION=us-east-1
export BEDROCK_LLM_MODEL=anthropic.claude-sonnet-4-20250514-v1:0
export BEDROCK_EMBEDDING_MODEL=cohere.embed-v4:0
export BEDROCK_EMBEDDING_DIM=1536
export EVAL_MODEL=anthropic.claude-opus-4-20250514-v1:0
python server_app.py

Backend runs on http://localhost:8000

Frontend

cd ui
npm install
npm start

Frontend runs on http://localhost:3000

Create ui/.env:

REACT_APP_API_URL=http://localhost:8000

Running The Eval Harness

From the server directory:

CODEBASE_RAG_API_URL=http://localhost:8000 \
CODEBASE_RAG_SESSION_ID=<session-id> \
CODEBASE_RAG_REPO_ID=<repo-id> \
CODEBASE_RAG_EVAL_OUTPUT=evals/latest_eval_report.json \
python evals/run_eval.py

The output report includes:

eval-set audit warnings
headline metrics
category breakdowns
case-by-case detail
a summary string suitable for project reporting

If you want to save the latest run as a JSON artifact:

CODEBASE_RAG_EVAL_OUTPUT=evals/latest_eval_report.json

Repository Structure

server/
  server_app.py
  evals/
  src/
ui/
  src/
README.md
