Financial RAG API

Hybrid BM25 + Semantic retrieval pipeline over financial filings, with citation-aware generation and production observability.

Live Demo

Resource	URL
Gradio UI	https://huggingface.co/spaces/Spectraa28/financial-rag-api

Results

Metric	Value
RAGAs Faithfulness	0.82
Context Recall	0.0 → 1.0 (post-Docling fix)
Cold Start Latency	46s → 3s (persistent ChromaDB)
Total Chunks	453+
Retrieval Alpha (BM25:Dense)	0.3 : 0.7

Evaluated on 30 manually curated financial QA pairs across Apple and Microsoft 10-K filings. The cold start was fixed by pre-embedding the documents and persisting ChromaDB, so the ingestion pipeline is skipped on subsequent startups.
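
The cold-start fix described above comes down to "ingest only if the persisted store is empty". A minimal sketch of that check, assuming a collection object that exposes `count()` (as ChromaDB collections do); the names `ensure_ingested` and `ingest` are hypothetical and not taken from this repo:

```python
from typing import Callable, Protocol


class CountableCollection(Protocol):
    """Anything exposing count(), e.g. a ChromaDB collection."""

    def count(self) -> int: ...


def ensure_ingested(collection: CountableCollection, ingest: Callable[[], None]) -> bool:
    """Run `ingest` only when the persisted collection is empty.

    Returns True if ingestion ran, False if it was skipped.
    """
    if collection.count() > 0:
        return False  # warm start: reuse the embeddings already on disk
    ingest()  # cold start: parse, chunk, embed, and store
    return True
```

With ChromaDB, the collection would typically come from `chromadb.PersistentClient(path="chroma_db")` followed by `get_or_create_collection(...)`, so the first startup pays the embedding cost and later startups return immediately.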

Architecture

User Query
    │
    ▼
Query Expansion (vocabulary bridging for financial terms)
    │
    ▼
Hybrid Retrieval
    ├── BM25 (sparse, keyword precision)     weight: 0.3
    └── MiniLM Dense Search (semantic)       weight: 0.7
    │
    ▼
Score Fusion → Top-3 Chunks with Citations
    │
    ▼
Citation-Aware Prompt Engineering
    │
    ▼
Gemini 2.5 Flash Generation (temperature=0.0)
    │
    ▼
Answer + Source Citations + Latency Telemetry
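
The "Citation-Aware Prompt Engineering" step in the diagram above could look roughly like the following. This is an illustrative sketch, not the prompt used in `pipeline.py`: the `Chunk` fields, the `citation_path` format, and the instruction wording are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str          # filing the chunk came from
    headings: list[str]  # heading hierarchy, from top-level section down


def citation_path(chunk: Chunk) -> str:
    """Join the source and its heading hierarchy into one auditable path."""
    return " > ".join([chunk.source, *chunk.headings])


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each retrieved chunk and ask the model to cite by number."""
    blocks = [
        f"[{i}] ({citation_path(c)})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    ]
    context = "\n\n".join(blocks)
    return (
        "Answer using only the numbered sources below. "
        "Cite each claim with its source number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the sources keeps the model's citations short, and the heading path lets a reader trace each number back to a section of the filing.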

Why Financial Documents

Financial PDFs are one of the hardest RAG targets — tables with misaligned headers, XBRL-encoded values, multi-year comparative data, and dense numerical prose. This project was built specifically to handle these challenges:

IBM Docling for layout-aware parsing — preserves table structure and section hierarchy
Query expansion to bridge the vocabulary gap between natural language and financial terminology
BM25 weighted at 0.3 to preserve exact keyword matching for ticker symbols, line items, and financial metrics
Citation path tracking through document heading hierarchy for auditability
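
To make the query expansion idea concrete, here is a minimal dictionary-based sketch. The synonym map and the `expand_query` name are hypothetical; the expansion logic in `retrieval.py` may work differently.

```python
import re

# Hypothetical mapping from everyday terms to filing vocabulary.
FINANCIAL_SYNONYMS = {
    "revenue": ["net sales", "total net sales"],
    "profit": ["net income"],
    "r&d": ["research and development"],
    "cash": ["cash and cash equivalents"],
}


def expand_query(query: str, synonyms: dict[str, list[str]] = FINANCIAL_SYNONYMS) -> str:
    """Append filing-style synonyms for any known term found in the query."""
    lowered = query.lower()
    extra: list[str] = []
    for term, alternatives in synonyms.items():
        # Match the term as a whole token, not as part of a longer word.
        if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", lowered):
            extra.extend(a for a in alternatives if a not in lowered and a not in extra)
    return query if not extra else f"{query} {' '.join(extra)}"
```

The expanded string goes to both retrievers, so a question about "revenue" can still match a table row labelled "Total net sales" on the BM25 side.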

Tech Stack

Component	Technology
Document Parsing	IBM Docling + HybridChunker
Embeddings	all-MiniLM-L6-v2 (sentence-transformers)
Vector Store	ChromaDB (persistent, cosine similarity)
Sparse Retrieval	BM25Okapi (rank-bm25)
LLM	Google Gemini 2.5 Flash
API Framework	FastAPI + Uvicorn
UI	Gradio
Observability	MLflow + Prometheus
Deployment	HuggingFace Spaces

Project Structure

financial-rag-api/
├── app.py           → Gradio UI entry point
├── ingestion.py     → Docling parsing, chunking, ChromaDB storage
├── retrieval.py     → Hybrid search, query expansion, citation formatting
├── pipeline.py      → Generation, MLflow tracking, Prometheus metrics
├── main.py          → FastAPI app with /query, /health, /metrics endpoints
├── chroma_db/       → Pre-embedded persistent vector store
└── requirements.txt

Running Locally

# Clone the repo
git clone https://github.com/Spectraa28/financial-rag-api
cd financial-rag-api

# Install dependencies
pip install -r requirements.txt

# Set environment variables
cp .env.example .env
# Add your GEMINI_API_KEY to .env

# Run Gradio UI
python app.py

# Or run FastAPI server
uvicorn main:app --host 0.0.0.0 --port 8000

App runs on http://localhost:7860 (Gradio) or http://localhost:8000 (FastAPI)

Demo Questions

Simple retrieval:

What was Apple's total revenue in FY2023?
What is Apple's cash position as of FY2023?
What are Apple's primary business segments?

Complex analysis:

Compare Apple and Microsoft's operating margins in FY2023
What risks did Apple identify in their 2023 annual report?
How did Apple's R&D expenses trend in FY2023?

Cross-company (Phase 1 complete):

Which company had higher net income in FY2023 — Apple or Microsoft?
Compare the debt structures of Apple and Microsoft in FY2023

What I Learned Building This

The biggest bottleneck wasn't retrieval quality — it was ingestion quality. My first implementation used manual BeautifulSoup table parsing. Tables had column misalignment, garbage values from XBRL spacer cells, and broken row boundaries. Context Recall was 0.0 because the embedded text was structurally corrupted before it ever reached the vector store.

Switching to IBM Docling fixed this entirely — it understands document layout, preserves table structure, and tracks heading hierarchy for citation paths. Context Recall jumped from 0.0 to 1.0.

The hybrid search alpha of 0.7 dense / 0.3 sparse came from iterative testing. Pure semantic search missed exact financial terms like specific line items and ticker symbols. Pure BM25 missed semantically equivalent phrasings. 0.7/0.3 gave the best balance across both simple lookup and complex analytical queries.
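
A sketch of how a 0.7/0.3 weighted fusion can be computed. The source does not say how the two score scales are reconciled, so the min-max normalization here is an assumption, as are the function names:

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so BM25 and cosine scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(
    bm25: list[float],
    dense: list[float],
    alpha: float = 0.7,
    top_k: int = 3,
) -> list[tuple[int, float]]:
    """Return (chunk_index, fused_score) for the top_k chunks.

    alpha weights the dense score; (1 - alpha) weights BM25.
    """
    if len(bm25) != len(dense):
        raise ValueError("bm25 and dense must score the same chunks")
    if not bm25:
        return []
    b, d = min_max(bm25), min_max(dense)
    fused = [alpha * dv + (1 - alpha) * bv for bv, dv in zip(b, d)]
    ranked = sorted(enumerate(fused), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

Setting `alpha=1.0` or `alpha=0.0` reproduces pure semantic or pure BM25 ranking, which makes it easy to compare the three settings on the same query set.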

Evaluation

Evaluated using RAGAs on 30 manually curated QA pairs:

Metric	Score
Faithfulness	0.82
Context Recall	1.0
Context Precision	0.27
Answer Relevancy	in progress

Context Precision of 0.27 reflects a known limitation — fixed top-k retrieval pulls in loosely related chunks alongside the relevant ones. The natural fix is a cross-encoder reranker, which is planned for Phase 2.
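
The planned reranking pass could take roughly this shape: over-fetch candidates, score each (query, chunk) pair, and keep only the best few. This is a sketch under assumptions, not the Phase 2 implementation. `token_overlap` is a toy stand-in so the example runs without a model; in practice the `score` argument would wrap a cross-encoder, for example the `predict` method of `CrossEncoder` in sentence-transformers.

```python
from typing import Callable, Optional


def rerank(
    query: str,
    chunks: list[str],
    score: Callable[[str, str], float],
    keep: int = 3,
    min_score: Optional[float] = None,
) -> list[str]:
    """Re-score candidate chunks against the query and keep the best ones.

    If min_score is set, chunks scoring below it are dropped even when
    fewer than `keep` remain, which is what improves precision.
    """
    scored = [(score(query, c), c) for c in chunks]
    if min_score is not None:
        scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:keep]]


def token_overlap(query: str, passage: str) -> float:
    """Toy relevance score (Jaccard overlap); NOT a cross-encoder."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0
```

The `min_score` cutoff is the part that targets Context Precision: with a fixed top-k, a loosely related chunk always fills the last slot, whereas a threshold lets the pipeline return fewer chunks when only one or two are relevant.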

Roadmap

✅ Phase 1 — Core RAG Pipeline (Complete)

IBM Docling ingestion with layout-aware chunking
Hybrid BM25 + dense retrieval (alpha=0.7 dense, 0.3 BM25)
Query expansion for financial vocabulary
Citation-aware prompt engineering
MLflow + Prometheus observability
Apple and Microsoft 10-K FY2023 coverage
Gradio UI + FastAPI deployment

🔄 Phase 2 — Agentic Research Assistant (In Progress)

PDF upload endpoint — ingest any 10-K dynamically
Query decomposition agent — LLM breaks complex cross-company questions into per-company sub-queries
Cross-encoder reranker — second pass on top-k results to fix context precision
Cross-company synthesizer — unified answer with citations across multiple filings
Company registry — stateful tracking of loaded documents
Async ingestion with status polling

Author

Built by Sonu Verma

Part of a 126-day self-directed ML Engineering program. Building in public.
