AI-powered legal document search and analysis platform. Search across 46,456+ legal cases using semantic similarity, read AI-generated summaries, ask follow-up questions, upload confidential documents for isolated analysis, and consult a legal domain chatbot. Backed by a FastAPI backend, PostgreSQL with pgvector, Celery for asynchronous processing, and a React frontend.
Natural language search over indexed legal cases. Queries are embedded using sentence-transformers/all-mpnet-base-v2 and compared against a FAISS index using cosine similarity. Results are ranked by relevance score.
Clicking any search result opens a document analysis view. The system asynchronously processes the PDF using Celery workers: extracting text with PyMuPDF, chunking it, generating embeddings, and storing them in PostgreSQL using pgvector. Once processed, users can ask follow-up questions against the document context using a Retrieval-Augmented Generation (RAG) pipeline backed by Groq LLMs.
Users can upload their own PDFs directly from the browser. The file is saved locally, processed asynchronously by Celery, and its embeddings are stored in the database. Users can chat with their documents in persistent, stateful sessions.
A general-purpose AI assistant pre-prompted for legal domain queries. Accepts a message and conversation history, passes them through a LangChain agent backed by Groq, and returns a streamed response. Features a guardrail to reject non-legal queries.
The system relies on a decoupled architecture for performance and scalability:
- Web Server: FastAPI handles incoming HTTP requests, session management, and streaming LLM responses via Server-Sent Events (SSE).
- Asynchronous Processing: Celery workers (backed by RabbitMQ) handle heavy background tasks such as PDF text extraction, chunking, and embedding generation.
- Database: PostgreSQL stores user data, chat sessions, messages, and document metadata. The
pgvectorextension is used to store and query document embeddings natively in the database. - Frontend: React application that provides the UI for semantic search, document management, and chat interfaces.
| Component | Technology |
|---|---|
| Frontend | React 18, Vite, TailwindCSS, lucide-react |
| Backend | FastAPI, Python 3.11, SQLAlchemy, Alembic |
| Task Queue | Celery, RabbitMQ |
| Database | PostgreSQL 17, pgvector |
| LLM | Groq llama-3.3-70b-versatile via LangChain |
| Embeddings | sentence-transformers/all-mpnet-base-v2 |
| Search | FAISS (Main Corpus), pgvector (Session Documents) |
| PDF Processing | PyMuPDF, LangChain RecursiveCharacterTextSplitter |
| Deployment | Docker Compose, Nginx, Azure VM, Azure Static Web Apps |
- Docker and Docker Compose
- Node.js 18+
- Groq API key
The backend services are orchestrated using Docker Compose.
# 1. Start the database and message broker
docker-compose up db rabbitmq -d
# 2. Setup the Python environment
cd backend
python -m venv venv
# Windows: .\venv\Scripts\activate
# Linux/Mac: source venv/bin/activate
pip install -r requirements.txt
# 3. Configure environment
cp .env.example .env
# Edit .env and set your GROQ_API_KEY
# 4. Run database migrations
alembic upgrade head
# 5. Start the API server
uvicorn app.main:create_app --factory --host 0.0.0.0 --port 8000 --reload
# 6. Start the Celery worker (in a new terminal)
celery -A app.workers.celery_app worker --loglevel=infocd frontend
npm install
npm run devOpen http://localhost:5173.
JurisFind/
├── backend/
│ ├── alembic/ # Database migrations
│ ├── app/
│ │ ├── ai/ # LangChain agents (RAG, Chatbot)
│ │ ├── api/ # FastAPI routers (auth, sessions, documents)
│ │ ├── db/ # SQLAlchemy models and CRUD operations
│ │ ├── schemas/ # Pydantic models for request/response validation
│ │ ├── services/ # Core business logic and blob storage integration
│ │ └── workers/ # Celery tasks (document processing)
│ ├── data/ # Local storage for uploaded documents and FAISS index
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── components/ # Reusable UI components
│ │ ├── config/ # API client configuration
│ │ ├── context/ # React Context (Auth)
│ │ └── pages/ # Main application views (Search, Assistant, Login)
│ └── package.json
└── docker-compose.yml # Infrastructure definition (Postgres, RabbitMQ)
| Variable | Description |
|---|---|
DATABASE_URL |
PostgreSQL connection string |
RABBITMQ_URL |
RabbitMQ connection string |
GROQ_API_KEY |
Required for LLM inference |
SECRET_KEY |
JWT signing key |
USE_LOCAL_FILES |
Set to true to use local filesystem instead of Azure Blob |
| Variable | Description |
|---|---|
VITE_API_BASE_URL |
Backend URL (defaults to http://localhost:8000) |
Detailed documentation is available in the docs/ directory:
docs/architecture.md: System architecture and data flow.docs/api_reference.md: API endpoint specifications.docs/technical_documentation.md: Comprehensive internal reference.