Spectraa28/Financial-Rag

Financial RAG API

Hybrid BM25 + Semantic retrieval pipeline over financial filings, with citation-aware generation and production observability.


Live Demo

Resource    URL
Gradio UI   https://huggingface.co/spaces/Spectraa28/financial-rag-api

Results

Metric                         Value
RAGAs Faithfulness             0.82
Context Recall                 0.0 → 1.0 (post-Docling fix)
Cold Start Latency             46s → 3s (persistent ChromaDB)
Total Chunks                   453+
Retrieval Alpha (BM25:Dense)   0.3 : 0.7

Evaluated on 30 manually curated financial QA pairs across Apple and Microsoft 10-K filings. The cold start was fixed by pre-embedding the documents and persisting ChromaDB, so the ingestion pipeline is skipped on subsequent startups.
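The cold-start fix above comes down to one check at startup: if the persisted collection already holds vectors, skip ingestion. A minimal sketch of that guard, assuming a collection object that exposes `count()` (ChromaDB collections do). The names `ensure_ingested` and `ingest` are hypothetical and not taken from this repo.

```python
from typing import Callable, Protocol


class CountableCollection(Protocol):
    """Anything exposing count(), such as a ChromaDB collection."""

    def count(self) -> int: ...


def ensure_ingested(
    collection: CountableCollection,
    ingest: Callable[[CountableCollection], None],
) -> bool:
    """Run ingestion only when the persisted collection is empty.

    Returns True if ingestion ran, False if it was skipped.
    """
    if collection.count() > 0:
        # Vectors are already on disk: skip parsing and embedding.
        return False
    ingest(collection)
    return True
```

With ChromaDB, the collection would typically come from `chromadb.PersistentClient(path="chroma_db").get_or_create_collection("filings")`. The collection name `"filings"` is an assumption for illustration.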


Architecture

User Query
    │
    ▼
Query Expansion (vocabulary bridging for financial terms)
    │
    ▼
Hybrid Retrieval
    ├── BM25 (sparse, keyword precision)     weight: 0.3
    └── MiniLM Dense Search (semantic)       weight: 0.7
    │
    ▼
Score Fusion → Top-3 Chunks with Citations
    │
    ▼
Citation-Aware Prompt Engineering
    │
    ▼
Gemini 2.5 Flash Generation (temperature=0.0)
    │
    ▼
Answer + Source Citations + Latency Telemetry
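The citation-aware prompt step in the diagram could look roughly like the sketch below. It numbers each retrieved chunk, attaches its heading path, and asks the model to cite by number. The `Chunk` fields and the prompt wording are assumptions for illustration, not the prompt used in `pipeline.py`.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str              # filing label (hypothetical field)
    heading_path: list[str]  # section hierarchy from the parser


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Build a prompt whose sources are numbered so answers can cite them."""
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        path = " > ".join(chunk.heading_path)
        blocks.append(f"[{i}] {chunk.source} | {path}\n{chunk.text}")
    context = "\n\n".join(blocks)
    return (
        "Answer using only the numbered sources below. "
        "Cite each claim with its source number, e.g. [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Keeping the heading path next to each source number means a cited `[2]` can be traced back to a specific section of a specific filing.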

Why Financial Documents

Financial PDFs are among the harder RAG targets: tables with misaligned headers, XBRL-encoded values, multi-year comparative data, and dense numerical prose. This project was built specifically to handle these challenges:

  • IBM Docling for layout-aware parsing — preserves table structure and section hierarchy
  • Query expansion to bridge the vocabulary gap between natural language and financial terminology
  • BM25 weighted at 0.3 to preserve exact keyword matching for ticker symbols, line items, and financial metrics
  • Citation path tracking through document heading hierarchy for auditability
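As an illustration of the query expansion idea in the list above, here is a minimal sketch that appends filing-style terms to a natural-language query. The synonym table and the `expand_query` function are hypothetical; the actual expansion logic in `retrieval.py` may differ.

```python
import re

# Hypothetical synonym table: everyday term -> filing vocabulary.
FINANCIAL_SYNONYMS = {
    "revenue": ["net sales", "total net sales"],
    "profit": ["net income"],
    "r&d": ["research and development"],
    "cash": ["cash and cash equivalents"],
}


def expand_query(query: str, synonyms: dict = FINANCIAL_SYNONYMS) -> str:
    """Append filing-style synonyms for any known term found in the query."""
    lowered = query.lower()
    extras = []
    for term, alternatives in synonyms.items():
        # Match the term only as a whole token, not inside a longer word.
        if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", lowered):
            for alt in alternatives:
                if alt not in lowered and alt not in extras:
                    extras.append(alt)
    if not extras:
        return query
    return f"{query} {' '.join(extras)}"
```

The original query text is kept intact and the extra terms are only appended, so BM25 still sees the user's exact wording while also matching the filing's vocabulary.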

Tech Stack

Component          Technology
Document Parsing   IBM Docling + HybridChunker
Embeddings         all-MiniLM-L6-v2 (sentence-transformers)
Vector Store       ChromaDB (persistent, cosine similarity)
Sparse Retrieval   BM25Okapi (rank-bm25)
LLM                Google Gemini 2.5 Flash
API Framework      FastAPI + Uvicorn
UI                 Gradio
Observability      MLflow + Prometheus
Deployment         HuggingFace Spaces

Project Structure

financial-rag-api/
├── app.py           → Gradio UI entry point
├── ingestion.py     → Docling parsing, chunking, ChromaDB storage
├── retrieval.py     → Hybrid search, query expansion, citation formatting
├── pipeline.py      → Generation, MLflow tracking, Prometheus metrics
├── main.py          → FastAPI app with /query, /health, /metrics endpoints
├── chroma_db/       → Pre-embedded persistent vector store
└── requirements.txt

Running Locally

# Clone the repo
git clone https://github.com/Spectraa28/financial-rag-api
cd financial-rag-api

# Install dependencies
pip install -r requirements.txt

# Set environment variables
cp .env.example .env
# Add your GEMINI_API_KEY to .env

# Run Gradio UI
python app.py

# Or run FastAPI server
uvicorn main:app --host 0.0.0.0 --port 8000

The app runs at http://localhost:7860 (Gradio) or http://localhost:8000 (FastAPI).


Demo Questions

Simple retrieval:

What was Apple's total revenue in FY2023?
What is Apple's cash position as of FY2023?
What are Apple's primary business segments?

Complex analysis:

Compare Apple and Microsoft's operating margins in FY2023
What risks did Apple identify in their 2023 annual report?
How did Apple's R&D expenses trend in FY2023?

Cross-company (Phase 1 complete):

Which company had higher net income in FY2023 — Apple or Microsoft?
Compare the debt structures of Apple and Microsoft in FY2023

What I Learned Building This

The biggest bottleneck wasn't retrieval quality — it was ingestion quality. My first implementation used manual BeautifulSoup table parsing. Tables had column misalignment, garbage values from XBRL spacer cells, and broken row boundaries. Context Recall was 0.0 because the embedded text was structurally corrupted before it ever reached the vector store.

Switching to IBM Docling fixed this entirely — it understands document layout, preserves table structure, and tracks heading hierarchy for citation paths. Context Recall jumped from 0.0 to 1.0.

The hybrid search alpha of 0.7 dense / 0.3 sparse came from iterative testing. Pure semantic search missed exact financial terms like specific line items and ticker symbols. Pure BM25 missed semantically equivalent phrasings. 0.7/0.3 gave the best balance across both simple lookup and complex analytical queries.
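A minimal sketch of the weighted fusion described above, assuming both score lists are min-max normalized before mixing. The normalization choice is an assumption; this README does not say how the two score scales are aligned. `fuse` and `min_max` are hypothetical names.

```python
def min_max(scores: list[float]) -> list[float]:
    """Scale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(
    bm25_scores: list[float],
    dense_scores: list[float],
    alpha: float = 0.7,
    top_k: int = 3,
) -> list[tuple[int, float]]:
    """Return (chunk_index, fused_score) for the top_k chunks.

    alpha weights the dense score; (1 - alpha) weights BM25.
    """
    if len(bm25_scores) != len(dense_scores):
        raise ValueError("score lists must cover the same chunks")
    sparse = min_max(bm25_scores)
    dense = min_max(dense_scores)
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    ranked = sorted(enumerate(fused), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```

Setting `alpha=1.0` reduces this to pure dense ranking and `alpha=0.0` to pure BM25, which makes it easy to reproduce the comparison between the two extremes and the 0.7/0.3 mix.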


Evaluation

Evaluated using RAGAs on 30 manually curated QA pairs:

Metric              Score
Faithfulness        0.82
Context Recall      1.0
Context Precision   0.27
Answer Relevancy    in progress

Context Precision of 0.27 reflects a known limitation — fixed top-k retrieval pulls in loosely related chunks alongside the relevant ones. The natural fix is a cross-encoder reranker, which is planned for Phase 2.
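The planned reranking pass could be sketched as below: retrieve a wider candidate set, score each (query, chunk) pair with a second model, and keep only the best few. The scorer is injected as a function so the sketch stays independent of any model. `rerank` and `score_pairs` are hypothetical names, and this is not the Phase 2 implementation.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    chunks: Sequence[str],
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Re-score retrieved chunks against the query and keep the top_n."""
    if not chunks:
        return []
    pairs = [(query, chunk) for chunk in chunks]
    scores = score_pairs(pairs)  # one relevance score per pair
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [(chunk, float(score)) for chunk, score in ranked[:top_n]]
```

With sentence-transformers, `score_pairs` could be the `predict` method of a `CrossEncoder`, which accepts a list of (query, passage) pairs. The specific model to load is left open here.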


Roadmap

✅ Phase 1 — Core RAG Pipeline (Complete)

  • IBM Docling ingestion with layout-aware chunking
  • Hybrid BM25 + dense retrieval (alpha = 0.7 dense, 0.3 BM25)
  • Query expansion for financial vocabulary
  • Citation-aware prompt engineering
  • MLflow + Prometheus observability
  • Apple and Microsoft 10-K FY2023 coverage
  • Gradio UI + FastAPI deployment

🔄 Phase 2 — Agentic Research Assistant (In Progress)

  • PDF upload endpoint — ingest any 10-K dynamically
  • Query decomposition agent — LLM breaks complex cross-company questions into per-company sub-queries
  • Cross-encoder reranker — second pass on top-k results to fix context precision
  • Cross-company synthesizer — unified answer with citations across multiple filings
  • Company registry — stateful tracking of loaded documents
  • Async ingestion with status polling

Author

Built by Sonu Verma

Part of a 126-day self-directed ML Engineering program. Building in public.


---

About

Production RAG API for financial document Q&A: hybrid BM25 + MiniLM retrieval, Docling ingestion, RAGAs evaluation, FastAPI + Docker. Built on Apple and Microsoft 10-K FY2023 filings.
