Multi-Modal Document Data Extraction: Invoice-to-JSON Processor
An AI-powered pipeline that extracts structured data from PDF and image-based invoices, validates correctness, auto-corrects errors, and outputs clean JSON — without per-template configuration.
Upload any invoice (PDF or image) → get structured JSON with all fields extracted, validated, and corrected automatically.
Invoice (PDF/Image) → OCR → Extraction → Validation → Auto-Correction → JSON Output
- OCR Agent — Extracts text from PDFs (PyMuPDF) and images (Tesseract), with GPT-4o vision fallback
- Extraction Agent — LLM-powered structured data extraction (11 invoice fields)
- Validation Agent — Rule-based checks (arithmetic, required fields, format)
- Correction Agent — Auto-fixes errors via rules + LLM re-analysis (max 2 retries)
- JSON Formatter — Standardized output with metadata (confidence, timing, corrections)
- LangGraph Orchestrator — Stateful pipeline with conditional correction loop
- REST API — FastAPI with Swagger UI documentation
- Web UI — React upload + status tracker + JSON viewer + download
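
The rule-based checks described for the Validation Agent can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names (`invoice_number`, `invoice_date`, `subtotal`, `tax`, `total`), the ISO date format, and the 0.01 tolerance are all assumptions, since the 11 extracted fields are not listed here.

```python
import re
from decimal import Decimal, InvalidOperation

# Hypothetical field names: the README does not enumerate the extracted fields.
REQUIRED_FIELDS = ("invoice_number", "invoice_date", "subtotal", "tax", "total")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed YYYY-MM-DD format
TOLERANCE = Decimal("0.01")  # assumed rounding tolerance


def validate_invoice(invoice: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the invoice passed."""
    errors = []

    # Required-field check: every listed field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if invoice.get(name) in (None, ""):
            errors.append(f"missing required field: {name}")

    # Format check: the date must match the assumed pattern.
    date = invoice.get("invoice_date")
    if date and not DATE_PATTERN.match(str(date)):
        errors.append(f"invoice_date is not YYYY-MM-DD: {date}")

    # Arithmetic check: subtotal + tax should equal total within the tolerance.
    try:
        subtotal = Decimal(str(invoice["subtotal"]))
        tax = Decimal(str(invoice["tax"]))
        total = Decimal(str(invoice["total"]))
    except (KeyError, InvalidOperation):
        return errors  # amounts missing or unparsable; already reported above
    if abs(subtotal + tax - total) > TOLERANCE:
        errors.append(f"arithmetic mismatch: {subtotal} + {tax} != {total}")

    return errors
```

Returning a list of error strings rather than a boolean gives a later correction step something concrete to act on.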
| Layer | Technology |
|---|---|
| Backend | Python 3.11, FastAPI, LangGraph, langchain-openai |
| LLM | OpenAI GPT-4o (structured output, temperature 0) |
| OCR | PyMuPDF + Tesseract + GPT-4o Vision (fallback) |
| Frontend | React 18, TypeScript 5, Tailwind CSS, Vite |
| Infrastructure | Docker Compose |
- Docker + Docker Compose
- OpenAI API key (GPT-4o access)
# 1. Clone the repo
git clone <repo-url>
cd SmartInvoiceEngine
# 2. Set up environment
cp backend/.env.example backend/.env
# Edit backend/.env and add your OPENAI_API_KEY
# 3. Start everything
docker-compose up --build
# 4. Open the app
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# Swagger UI: http://localhost:8000/docs

# Backend
cd backend
python -m venv venv
venv\Scripts\activate # Windows (macOS/Linux: source venv/bin/activate)
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
# Frontend (separate terminal)
cd frontend
npm install
npm run dev

| Method | Path | Description |
|---|---|---|
| POST | /api/process | Upload invoice file (PDF/PNG/JPEG) |
| GET | /api/status/{job_id} | Poll processing status |
| GET | /api/result/{job_id} | Get extraction result (JSON) |
| GET | /api/health | Health check |
| GET | /docs | Swagger UI (interactive API docs) |
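
A client would typically poll the status endpoint until the job finishes, then fetch the result. The sketch below shows that loop using only the standard library. The response field `status` and its values `"completed"` and `"failed"` are assumptions, as the response schemas are not documented here; check the Swagger UI for the actual shapes. The upload step (`POST /api/process`) is omitted, and `job_id` is assumed to come from its response.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def http_get_json(path: str) -> dict:
    """GET a path on the backend and decode the JSON body."""
    with urllib.request.urlopen(BASE_URL + path, timeout=10) as resp:
        return json.load(resp)


def wait_for_result(job_id, get_json=http_get_json, poll_interval=1.0,
                    max_polls=60, sleep=time.sleep) -> dict:
    """Poll /api/status/{job_id} until done, then return /api/result/{job_id}."""
    for _ in range(max_polls):
        status = get_json(f"/api/status/{job_id}")
        state = status.get("status")  # assumed field name
        if state == "completed":      # assumed terminal value
            return get_json(f"/api/result/{job_id}")
        if state == "failed":         # assumed terminal value
            raise RuntimeError(f"job {job_id} failed: {status}")
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `get_json` and `sleep` as parameters keeps the loop testable without a running backend.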
SmartInvoiceEngine/
├── backend/app/
│ ├── main.py # FastAPI entry point
│ ├── config.py # Environment configuration
│ ├── api/ # REST endpoints + schemas
│ ├── agents/ # OCR, Extraction, Validation, Correction, Formatter
│ ├── orchestrator/ # LangGraph workflow + state
│ ├── models/ # Pydantic domain models
│ ├── prompts/ # LLM prompt templates
│ └── utils/ # Logging utilities
├── frontend/src/
│ ├── components/ # Upload, Status, Result, Download
│ ├── services/ # API client (Axios)
│ ├── types/ # TypeScript interfaces
│ └── hooks/ # Processing hook
├── test-invoices/ # Sample invoices for testing
├── docs/ # BRD, architecture diagram, plan
└── docker-compose.yml
See docs/architecture.mmd for the full Mermaid diagram.
┌────────┐     ┌──────────┐     ┌───────────────────────────────────────┐
│  User  │────▶│ React UI │────▶│   FastAPI + LangGraph Orchestrator    │
└────────┘     └──────────┘     │                                       │
                                │   OCR → Extraction → Validation       │
                                │                        │              │
                                │                    pass/fail          │
                                │                  ↓           ↓        │
                                │              Formatter  Correction    │
                                │                         (loop back)   │
                                └───────────────────────────────────────┘
                                      │                       │
                                      ▼                       ▼
                              Tesseract/PyMuPDF         OpenAI GPT-4o
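
The pass/fail branch and the bounded correction loop in the diagram can be expressed in plain Python as below. This is a sketch of the control flow only and deliberately avoids the LangGraph API. The callables `extract`, `validate`, and `correct`, and the keys of the returned dictionary, are hypothetical stand-ins for the agents. It also assumes "max 2 retries" means at most two correction passes.

```python
from typing import Callable

MAX_CORRECTION_RETRIES = 2  # from the Correction Agent description


def run_pipeline(
    raw_text: str,
    extract: Callable[[str], dict],
    validate: Callable[[dict], list],
    correct: Callable[[dict, list], dict],
) -> dict:
    """Extract, validate, and loop through correction until valid or out of retries."""
    data = extract(raw_text)
    errors = validate(data)
    corrections = 0

    # Fail branch: correct, then loop back to validation, at most twice.
    while errors and corrections < MAX_CORRECTION_RETRIES:
        data = correct(data, errors)
        corrections += 1
        errors = validate(data)

    # Pass branch (or retries exhausted): hand off to formatting.
    # These keys are illustrative, not the project's output schema.
    return {"data": data, "errors": errors, "corrections_applied": corrections}
```

Capping the loop means an invoice that cannot be fixed still produces output, with the remaining errors reported rather than hidden.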
| Phase | Status |
|---|---|
| Phase 0: Planning & Scaffolding | ⬜ In Progress |
| Phase 1: Core Agents | ⬜ Not Started |
| Phase 2: Correction + Orchestrator | ⬜ Not Started |
| Phase 3: API + Frontend | ⬜ Not Started |
| Phase 4: Integration Testing | ⬜ Not Started |
| Document | Description |
|---|---|
| REQUIREMENTS.md | Frozen scope contract |
| SPEC.md | Technical specification (source of truth) |
| docs/PLAN.md | 2-week execution plan |
| DEPENDENCIES.md | Build order graph |
| CHECKPOINTS.md | Phase gate criteria |
| DELIVERABLES.md | Demo checklist |
| FUTURE_VISION.md | Full product vision |
| MVP_PREVIEW.md | What the demo looks like |
| docs/BRD.md | Business requirements |
| docs/architecture.mmd | Architecture diagram |
Internal project — not for public distribution.