Dhanush-Raj1/FullStack-RAG-Application-Project

🚀 Full Stack RAG Application

End-to-End RAG Application: "Intelligent Document Question-Answering System"

Production-Ready Retrieval-Augmented Generation with FastAPI


🚀 Live Application

🌐 The application is deployed and live

👉 Access the web app here

Note

The initial load of the web app may take 1-2 minutes. Once loaded, refresh the page to ensure all features work correctly.

Tip

For the best experience, please refer to the Usage Guide section below to learn how to navigate and use the web app effectively.


📌 Overview

This project is a full-stack Retrieval-Augmented Generation (RAG) application that lets users upload and query documents, retrieve semantically relevant context, and generate grounded answers using Large Language Models. It is built with FastAPI, PostgreSQL + pgvector, FAISS, Google Gemini Embeddings, the Cohere Reranker, and Groq and Gemini LLMs.

The application combines Semantic Search, Vector Databases, Cross-Encoder Reranking, LLM-Based Answer Generation, and a Modern React Frontend to deliver accurate and explainable answers from custom document collections.


🎯 Project Overview

1. Multi-Format Document Support

  • Supports ingestion of PDF, Markdown, and Plain Text documents
  • Each document type has a dedicated cleaning pipeline for high-quality preprocessing

2. Intelligent Document Preprocessing

  • PDF Cleaning: Removes headers/footers, website artifacts, OCR-style word fragmentation, image placeholders, and normalizes whitespace
  • Markdown Cleaning: Removes navigation sections and Mermaid diagrams, converts markdown links to clean text, preserves document hierarchy, and repairs broken tables
  • Text Cleaning: Removes citation markers, repairs paragraph flow, normalizes whitespace, and handles formatting artifacts

3. Metadata-Aware Chunking

  • Implemented markdown header-aware splitting with recursive character chunking
  • Configurable chunk size and overlap with section preservation
  • Deterministic chunk IDs with rich metadata stored per chunk including source file, file type, page number, section name, chunk index, and parent document ID
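
The idea of deterministic chunk IDs with per-chunk metadata can be sketched as follows. This is a minimal illustration, not the code in chunker.py: the `Chunk` class, `stable_id`, `split_with_overlap`, and `build_chunks` are hypothetical names, and the fixed-width splitter only stands in for the header-aware splitter described above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def stable_id(*parts: object) -> str:
    """Hash stable inputs so re-ingesting the same content yields the same ID."""
    payload = "\x1f".join(str(p) for p in parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Naive fixed-width splitter (assumption: a stand-in for the real splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    pieces, start = [], 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return pieces


def build_chunks(text: str, source_file: str, file_type: str, section: str,
                 chunk_size: int = 200, overlap: int = 50) -> list[Chunk]:
    parent_id = stable_id(source_file)  # hypothetical parent document ID scheme
    chunks = []
    for index, piece in enumerate(split_with_overlap(text, chunk_size, overlap)):
        chunks.append(Chunk(
            chunk_id=stable_id(parent_id, index, piece),
            text=piece,
            metadata={
                "source_file": source_file,
                "file_type": file_type,
                "section": section,
                "chunk_index": index,
                "parent_document_id": parent_id,
            },
        ))
    return chunks
```

Because the ID depends only on the source file, chunk index, and chunk text, running ingestion twice on an unchanged file produces identical IDs, which makes upserts straightforward.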

4. Embedding Generation

  • Leveraged Google Gemini Embeddings (gemini-embedding-001) for vector embeddings
  • Supports document and query embeddings with batch processing
  • Includes free-tier rate-limit protection with automatic throttling
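
Batch processing with throttling might look roughly like the sketch below. It assumes a simple minimum interval between calls. The actual rate-limit strategy in embedding.py is not documented here, and `embed_batch` is a hypothetical callable standing in for the Gemini embedding request.

```python
import time
from typing import Callable, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 20,
    min_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list[list[float]]:
    """Embed texts batch by batch, waiting between calls to respect a rate limit."""
    vectors: list[list[float]] = []
    last_call = None
    for start in range(0, len(texts), batch_size):
        if last_call is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            wait = min_interval_s - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        vectors.extend(embed_batch(list(texts[start:start + batch_size])))
    return vectors
```

Passing `sleep` and `clock` as parameters keeps the throttling logic testable without real delays.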

5. Vector Storage with PostgreSQL + pgvector

  • Persistent knowledge base backed by PostgreSQL with the pgvector extension
  • Stores chunk text, embeddings, and source metadata
  • Supports similarity search, metadata filtering, and persistent storage
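
A similarity search with metadata filtering could be expressed as a parameterized query like the one built below. The `<=>` operator is pgvector's cosine distance operator, but the table name `chunks` and the columns `chunk_id`, `content`, `metadata`, and `embedding` are assumptions for illustration, not the project's actual schema. The `%s` placeholders follow the Psycopg parameter style.

```python
from typing import Optional


def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_similarity_query(embedding: list[float], top_k: int = 5,
                           file_type: Optional[str] = None) -> tuple[str, list]:
    """Return (sql, params) for a cosine-distance search, optionally filtered."""
    literal = to_vector_literal(embedding)
    sql = ("SELECT chunk_id, content, metadata, "
           "embedding <=> %s::vector AS distance FROM chunks")
    params: list = [literal]
    if file_type is not None:
        # Hypothetical JSONB metadata column with a 'file_type' key.
        sql += " WHERE metadata->>'file_type' = %s"
        params.append(file_type)
    sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
    params.extend([literal, top_k])
    return sql, params
```

The returned pair can be passed to a cursor's `execute(sql, params)` call, so the embedding and filter values are never interpolated into the SQL string directly.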

6. Advanced Retrieval Pipeline

  • Two-stage retrieval: pgvector similarity search followed by Cohere Cross-Encoder Reranking (rerank-v3.5)
  • Reranking improves precision, reduces irrelevant chunks, and raises overall answer quality
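
The two-stage pattern can be illustrated in plain Python. In this sketch, the in-memory cosine search stands in for pgvector and `rerank_score` is a hypothetical callable standing in for the Cohere reranker. The parameter names `fetch_k` and `top_n` are also assumptions.

```python
import math
from typing import Callable


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_stage_retrieve(
    query: str,
    query_vec: list[float],
    corpus: list[tuple[str, list[float]]],
    rerank_score: Callable[[str, str], float],
    fetch_k: int = 20,
    top_n: int = 5,
) -> list[str]:
    """Stage 1: cheap vector recall. Stage 2: precise reranking of the candidates."""
    candidates = sorted(corpus, key=lambda item: cosine(query_vec, item[1]),
                        reverse=True)[:fetch_k]
    reranked = sorted(candidates, key=lambda item: rerank_score(query, item[0]),
                      reverse=True)
    return [text for text, _ in reranked[:top_n]]
```

Fetching more candidates than are finally returned (`fetch_k` larger than `top_n`) gives the reranker room to promote chunks that the embedding search ranked lower.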

7. LLM Answer Generation

  • Integrated Llama 3.3 70B Versatile via Groq for fast, grounded answer generation
  • The model is prompted to answer only from retrieved context and to cite its sources, which reduces hallucinations and keeps responses explainable
  • Separate temperature settings for conversational replies vs. retrieval-grounded answers

8. Intent-Aware Query Routing

  • Every user query is classified as CONVERSATIONAL or RETRIEVAL before any processing
  • Conversational queries (greetings, capability questions, small talk) are handled directly by the LLM — no retrieval triggered
  • Retrieval queries are routed through the full RAG pipeline
  • Uses Gemini Flash (gemini-2.0-flash-lite) as a lightweight, low-latency classifier — preserving Groq token quota for generation
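
The routing step can be sketched as below. `classify`, `answer_directly`, and `answer_with_rag` are hypothetical callables standing in for the Gemini classifier and the two response paths. Falling back to RETRIEVAL on an unexpected label is an assumption made for this example, not documented behavior.

```python
from enum import Enum
from typing import Callable


class Intent(str, Enum):
    CONVERSATIONAL = "CONVERSATIONAL"
    RETRIEVAL = "RETRIEVAL"


def parse_intent(raw_label: str) -> Intent:
    """Normalize the classifier's raw text output into an Intent."""
    label = raw_label.strip().upper()
    if label.startswith(Intent.CONVERSATIONAL.value):
        return Intent.CONVERSATIONAL
    # Assumption: anything else is treated as a retrieval query.
    return Intent.RETRIEVAL


def route_query(
    query: str,
    classify: Callable[[str], str],
    answer_directly: Callable[[str], str],
    answer_with_rag: Callable[[str], str],
) -> str:
    """Classify first, then dispatch to exactly one response path."""
    if parse_intent(classify(query)) is Intent.CONVERSATIONAL:
        return answer_directly(query)
    return answer_with_rag(query)
```

Keeping the label parsing separate from the dispatch makes it easy to test how the router behaves when the classifier returns noisy output.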

9. Conversation Memory

  • Implements sliding window memory storing the last 10 messages (user + assistant) per session
  • Memory is scoped per session ID — each browser tab maintains isolated conversation history
  • History is injected into both conversational and retrieval responses, enabling natural follow-up questions
  • Supports coreference resolution — vague queries like "what does it mean?" or "tell me more" are rewritten into self-contained search queries before retrieval
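
A sliding window of the last 10 messages, scoped by session ID, can be sketched with a bounded deque. The class and function names here are hypothetical and do not necessarily match memory.py.

```python
from collections import deque


class SlidingWindowMemory:
    """Keeps only the most recent messages; older ones are dropped automatically."""

    def __init__(self, max_messages: int = 10) -> None:
        self._messages: deque = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def history(self) -> list[dict]:
        return list(self._messages)


# In-memory registry keyed by session ID (lost when the process restarts).
_SESSIONS: dict[str, SlidingWindowMemory] = {}


def get_memory(session_id: str) -> SlidingWindowMemory:
    if session_id not in _SESSIONS:
        _SESSIONS[session_id] = SlidingWindowMemory()
    return _SESSIONS[session_id]
```

With `maxlen` set, appending an eleventh message silently evicts the oldest one, so no explicit trimming is needed.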

10. Session-Based Document Chat

  • Users can upload documents at runtime; these are chunked, embedded, and indexed in FAISS without modifying the global database
  • Provides temporary workspaces with fast retrieval and session isolation

11. Modern React Frontend

  • Built with React, TypeScript, Vite, and Tailwind CSS
  • Features a chat interface, source chunk viewer, session uploads, responsive design, and real-time API integration

🚀 Features

  • Multi-Format Ingestion: Upload and query PDF, Markdown, and Text documents seamlessly
  • Intelligent Preprocessing: Dedicated cleaning pipelines per document type for high-quality chunking
  • Semantic Search: Dense vector retrieval using Google Gemini Embeddings and pgvector
  • Cross-Encoder Reranking: Uses Cohere rerank-v3.5 to improve retrieval precision
  • Grounded LLM Answers: Llama 3.3 70B via Groq answers only from retrieved context, reducing hallucinations
  • Session-Based Chat: Upload documents at runtime, indexed in FAISS without touching the global database
  • Intent-Aware Routing: Classifies every query as conversational or retrieval — greetings and small talk never trigger unnecessary vector search
  • Conversation Memory: Sliding window memory per session enables natural multi-turn conversations and follow-up questions
  • Query Rewriting: Vague coreference queries are automatically rewritten into precise search queries using conversation history
  • Source Transparency: Every answer includes source chunks so users can verify the retrieved context
  • Modern Frontend: Responsive chat UI built with React, TypeScript, and Tailwind CSS

🏗️ System Architecture

                    ┌──────────────────┐
                    │ React Frontend   │
                    └────────┬─────────┘
                             │
                             ▼
                    ┌──────────────────┐
                    │ FastAPI Backend  │
                    └────────┬─────────┘
                             │
                    ┌────────▼─────────┐
                    │  Query Router    │  ← Gemini Flash (intent classification)
                    └────────┬─────────┘
                             │
              ┌──────────────┴──────────────┐
              │                             │
              ▼                             ▼
     CONVERSATIONAL                     RETRIEVAL
              │                             │
              │                    ┌────────▼────────┐
              │                    │  Query Rewriter │  ← resolves coreferences
              │                    └────────┬────────┘
              │                             │
              │          ┌──────────────────┼──────────────────┐
              │          │                                     │
              │          ▼                                     ▼
              │  ┌───────────────┐                 ┌─────────────────┐
              │  │  Global RAG   │                 │   Session RAG   │
              │  │   pgvector    │                 │     FAISS       │
              │  │     Neon      │                 │ In-Memory Index │
              │  └───────┬───────┘                 └────────┬────────┘
              │          │                                   │
              │          ▼                                   ▼
              │    Similarity Search               Similarity Search
              │          │                                   │
              │          ▼                                   │
              │   Cohere Reranker                            │
              │          │                                   │
              └──────────┴───────────────────────────────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │  Groq LLM Generator │  ← history + context injected
              │  (+ Memory/History) │
              └─────────┬───────────┘
                        │
                        ▼
                   Final Answer

🏗️ Tech Stack

  • Python
  • PyMuPDF4LLM + LangChain Text Splitters (Document processing)
  • Google Gemini Embeddings (gemini-embedding-001)
  • PostgreSQL + pgvector (Neon) (Persistent vector database)
  • FAISS (In-memory vector index for session-based retrieval)
  • Cohere Rerank (rerank-v3.5 for cross-encoder reranking)
  • Gemini Flash (gemini-2.0-flash-lite) (Intent classification / query routing)
  • Sliding Window Memory (Per-session conversation history via in-memory registry)
  • Groq API (Accessing Llama 3.3 70B Versatile)
  • FastAPI (Backend API framework with Pydantic & Psycopg)
  • React + TypeScript + Vite (Modern frontend)
  • Tailwind CSS (Frontend styling)

📂 Project Structure

fullstack-rag-application
│
├── documents/                        # Source documents used for ingestion
│   ├── markdown/                     # Markdown files (NemoClaw documentation)
│   ├── pdfs/                         # PDF files (Apple product tech specs)
│   └── text/                         # Plain text files (Space exploration articles)
│
├── frontend/                         # React + TypeScript frontend application
│   └── src/
│       ├── components/               # Reusable UI components (chat, input, source chunks)
│       ├── types/                    # TypeScript type definitions
│       ├── api.ts                    # API calls to the FastAPI backend
│       ├── App.tsx                   # Root application component
│       └── main.tsx                  # Application entry point
│
├── notebooks/                        # Jupyter notebooks for experiments and pipeline testing
│
├── src/                              # Core backend source code
│   ├── core/                         # Main RAG pipeline modules
│   │   ├── chunker.py                # Document chunking logic
│   │   ├── embedding.py              # Gemini embedding generation
│   │   ├── reranker.py               # Cohere cross-encoder reranking
│   │   ├── retriever.py              # Vector similarity retrieval
│   │   ├── vector_store.py           # pgvector and FAISS store management
│   │   ├── query_router.py           # Intent classifier (CONVERSATIONAL vs RETRIEVAL)
│   │   ├── memory.py                 # Sliding window conversation memory + session registry
│   │   └── llama_generator.py        # Groq LLM answer generation
│   ├── loaders/                      # Document loaders for each file type (pdf, md, txt)
│   ├── preprocess/                   # Document cleaners for each file type (pdf, md, txt)
│   ├── models/                       # Data model schemas
│   └── utils/                        # Config and path utility helpers
│
├── app.py                            # FastAPI application and route definitions
├── ingest.py                         # Document ingestion pipeline (load → clean → chunk → embed → store)
├── session_pipeline.py               # Session-based FAISS pipeline for runtime document uploads
├── golden_QA.md                      # Golden Q&A dataset for evaluation and benchmarking
├── requirements.txt                  # Python dependencies
└── pyproject.toml                    # Project metadata and build configuration

🚀 Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/yourusername/fullstack-rag-application.git
cd fullstack-rag-application

2️⃣ Create a Virtual Environment

conda create -p env python=3.11 -y
conda activate env

or

python -m venv env
source env/bin/activate        # On Windows: env\Scripts\activate

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Set Up Environment Variables

Create a .env file in the root directory and add:

GEMINI_API_KEY=xxxxxxxxxxxx
GROQ_API_KEY=xxxxxxxxxxxx
COHERE_API_KEY=xxxxxxxxxxxx
DATABASE_URL=postgresql://user:password@localhost:5432/rag_db
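
A startup check for these variables could look like the sketch below. It is a hypothetical helper, not the project's config module, and it assumes the .env values have already been loaded into the process environment (for example by a dotenv loader or by exporting them in the shell).

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("GEMINI_API_KEY", "GROQ_API_KEY", "COHERE_API_KEY", "DATABASE_URL")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Fail fast with a clear message if any required variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Checking all four variables at once surfaces every missing key in a single error instead of failing on the first API call that needs one.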

5️⃣ Ingest Documents

python ingest.py

6️⃣ Run the Backend

uvicorn app:app --reload

Backend available at http://localhost:8000, Swagger docs at http://localhost:8000/docs

7️⃣ Run the Frontend

cd frontend
npm install
npm run dev

Frontend available at http://localhost:5173


🌐 Usage Guide

👉 Access the web app

  • Global Knowledge Base Chat: Ask questions about any pre-ingested documents
    • "What is NemoClaw?"
    • "Summarize the key points from the research paper."
  • Session Document Upload: Upload your own PDF, Markdown, or Text file and chat with it
    • "Summarize the uploaded document."
    • "What are the main conclusions?"
  • Source Verification: Every answer displays the retrieved source chunks so you can verify context
  • API Access:
    • Global chat: POST /api/chat/global with x-session-id header
    • Session chat: POST /api/chat/session with x-session-id header
    • Upload: POST /api/upload with x-session-id header

🧪 Evaluation Framework

📂 Full Q&A pairs with expected answers are available here → golden_QA.md

A Golden Q&A dataset is included to benchmark and validate the RAG pipeline's retrieval and answer quality across all three supported document types — Markdown, Text, and PDF.

The dataset contains 30 primary evaluation pairs and 5 documented failure cases, making it suitable for both pass/fail testing and failure analysis.
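
One simple way to run pass/fail checks against such a dataset is keyword coverage, sketched below. This is an assumption for illustration only: the structure of golden_QA.md, the `keywords` field, and the 0.8 threshold are all hypothetical, and `answer_fn` stands in for a call to the RAG pipeline.

```python
from typing import Callable


def keyword_recall(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the answer (case-insensitive)."""
    if not expected_keywords:
        return 1.0
    answer_lower = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer_lower)
    return hits / len(expected_keywords)


def evaluate(cases: list[dict], answer_fn: Callable[[str], str],
             threshold: float = 0.8) -> list[dict]:
    """Score every case and mark it passed when recall meets the threshold."""
    results = []
    for case in cases:
        score = keyword_recall(answer_fn(case["question"]), case["keywords"])
        results.append({"question": case["question"], "score": score,
                        "passed": score >= threshold})
    return results
```

Keyword coverage is a coarse metric. It catches missing facts but says nothing about whether the answer is grounded in the retrieved context.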


📊 Pipelines

Ingestion Pipeline

Documents (PDF, Markdown, TXT)
    │
    ▼
Document Loader
    │
    ▼
Preprocessing & Cleaning
    │
    ▼
Metadata-Aware Chunking
    │
    ▼
Gemini Embeddings (gemini-embedding-001)
    │
    ▼
pgvector Store (Neon)

Query Pipeline

User Query
    │
    ▼
Intent Classifier (Gemini Flash)
    │
    ├──── CONVERSATIONAL ────────────────────────────┐
    │                                                │
    └──── RETRIEVAL                                  │
              │                                      │
              ▼                                      │
    Coreference Check                                │
    (needs rewrite?)                                 │
       │          │                                  │
      YES         NO                                 │
       │          │                                  │
       ▼          │                                  │
    Query         │                                  │
    Rewriter      │                                  │
       │          │                                  │
       └────┬─────┘                                  │
            │                                        │
            ▼                                        │
    Similarity Search                                │
    (pgvector / FAISS)                               │
            │                                        │
            ▼                                        │
    Cohere Reranking                                 │
            │                                        │
            └──────────────┬─────────────────────────┘
                           │
                           ▼
              Groq LLM — Llama 3.3 70B
              (context + memory injected)
                           │
                           ▼
                   Grounded Response

📡 API Endpoints

Global Knowledge Base Chat

POST /api/chat/global

Headers:

x-session-id: session_123

Request:

{
  "question": "What is NemoClaw?"
}

Upload Documents

POST /api/upload

Headers:

x-session-id: session_123

Session Chat

POST /api/chat/session

Headers:

x-session-id: session_123

Request:

{
  "question": "Summarize the uploaded document"
}
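
A client call to the chat endpoints can be sketched with the standard library. The paths, the `x-session-id` header, and the `question` field come from the sections above. The base URL assumes the local backend from the setup steps, the response is assumed to be JSON, and its fields are not documented here, so the parsed body is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local backend from the setup steps


def build_chat_request(question: str, session_id: str,
                       scope: str = "global") -> urllib.request.Request:
    """Build a POST request for /api/chat/global or /api/chat/session."""
    if scope not in ("global", "session"):
        raise ValueError("scope must be 'global' or 'session'")
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/chat/{scope}",
        data=body,
        headers={"Content-Type": "application/json", "x-session-id": session_id},
        method="POST",
    )


def ask(question: str, session_id: str, scope: str = "global"):
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(question, session_id, scope)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Reusing the same session ID across calls is what lets the backend attach conversation history and session uploads to the request.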

📸 Screenshots

[Screenshot: web application]

[Screenshot: chat interface]


🎯 Future Improvements

  • Hybrid search (BM25 + Dense Retrieval)
  • HNSW indexing
  • Persistent conversation memory across sessions (database-backed)
  • Multi-modal retrieval
  • Citation highlighting in the UI
  • Docker and Kubernetes deployment
  • LangGraph agent workflows
  • Evaluation framework integration (RAGAS)

🤝 Contributing

💡 Have an idea? Feel free to open an issue or submit a pull request!


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

