Invoice Bounding Box Detection & AI Chat System

A complete Python-based Invoice Processing System that accepts invoices (PDF, DOCX, PNG, JPG, JPEG), converts documents to layout-preserved image frames, runs OCR (PaddleOCR / EasyOCR / PyTesseract) with adaptive fallbacks, and morphologically extracts tables & cells using OpenCV.

New in this release: The system now features a Dual-Engine Extraction pipeline that combines rule-based heuristics with AI gap-filling, an intelligent categorizer, and a context-aware conversational AI chatbot with memory. Results are displayed in a premium, responsive Streamlit web interface that enables interactive field conflict resolution, chat functionality, and data exports.

Key Features

Multi-Format Ingestion:
- PDF support with Poppler and PyMuPDF fallback (pure Python, zero binary dependency).
- DOCX support with dynamic text and table renderer to PIL canvas.
- Standard Image support (PNG, JPG, JPEG).
Unified OCR Wrapper:
- Select between PaddleOCR, EasyOCR, or PyTesseract.
- Automatic cascade fallback (PaddleOCR ➔ EasyOCR ➔ PyTesseract) to ensure the system runs immediately on any environment.
Dual-Engine Extraction & Conflict Resolution:
- Rule-Based: Heuristic, Regex, & Proximity Field Matchers detect fields (Invoice Number, Dates, GSTIN, Amounts) and map tables using OpenCV morph kernels.
- AI-Based: Uses Large Language Models (LLMs) to independently extract fields, automatically filling gaps missed by the rule engine.
- Resolution Panel: Side-by-side UI to compare rule-based vs. AI-extracted values and manually resolve discrepancies.

Context-Aware AI Chatbot:
- Dedicated WhatsApp-style chat panel to interrogate your invoices.
- Automatically injects document metadata, extracted fields, table grid text, and raw OCR text into the LLM system prompt for granular awareness.
- Maintains conversation history (up to 20 messages).
Intelligent Categorization & Risk Management:
- Classifies invoices into 9 business categories and assigns expense tags using keyword strategies or LLM wrappers.
- Risk Indicators: Warns about missing critical fields (e.g., due dates) and flags potential duplicate invoices.
Multi-Provider LLM Wrapper & Database Backing:
- Seamlessly plug in OpenAI (gpt-4o-mini), Google Gemini (gemini-1.5-flash), or Anthropic (claude-3-haiku).
- Persistent storage using SQLAlchemy (defaults to SQLite, supports PostgreSQL).

Technology Stack

Python 3.11+
Streamlit (Web Application Interface & Chat UI)
OpenAI / Google Gemini / Anthropic APIs (LLM Integrations)
SQLAlchemy (Database ORM)
OpenCV (Morphological analysis and line extraction)
NumPy & Pandas (Data structures and tables)
Pillow (Image operations and alpha compositing)
python-docx & pdf2image & PyMuPDF (fitz) (Document parsing/rendering)
pytesseract, easyocr, paddleocr (OCR execution options)

Directory Structure

Invoice/
├── app.py                      # Core Streamlit Web Application Dashboard
├── requirements.txt            # Python Package Dependencies
├── config.py                   # Centralized Configuration (colors, regex, keywords)
├── .env                        # API Keys and Environment Variables
├── README.md                   # Installation & Setup documentation
├── modules/
│   ├── converter.py            # PDF/DOCX to Image Conversion Pipeline
│   ├── ocr_engine.py           # Unified OCR Engine Wrapper with auto-fallbacks
│   ├── field_detector.py       # Heuristic, Regex & Keyword Field Detection
│   ├── table_detector.py       # OpenCV Table Grid & Cell Mapping
│   └── bbox_drawer.py          # Bounding Box Overlays & Labels Canvas Drawer
├── services/
│   ├── invoice_service.py      # Core CRUD and Ingestion pipeline logic
│   ├── llm_service.py          # Multi-provider LLM API router (OpenAI/Gemini/Anthropic)
│   ├── categorization_service.py # Strategy pattern for auto-tagging
│   └── chat_service.py         # Context injection and conversation history manager
├── ui/
│   ├── sidebar.py              # File uploads, filters, and history lists
│   ├── invoice_viewer.py       # Document preview, conflict resolution, and data tabs
│   └── chat_panel.py           # Conversational AI interface
├── database/                   # SQLAlchemy models and schemas
├── tests/                      # Comprehensive unittest test suite
├── storage/                    # Persistent storage (invoices, annotations)
├── uploads/ & outputs/         # Temporary processing directories
└── data/                       # SQLite database location

Setup & Installation

1. Prerequisites (External Binaries)

For PDF parsing and Tesseract OCR to run correctly, install the system binaries:

macOS (via Homebrew)

# Install Poppler (for pdf2image)
brew install poppler

# Install Tesseract (for pytesseract fallback)
brew install tesseract

Ubuntu/Debian

# Install Poppler & Tesseract
sudo apt-get update
sudo apt-get install -y poppler-utils tesseract-ocr libtesseract-dev

Windows

Poppler: Download poppler for Windows (e.g. from github/poppler-windows), extract it, and add the bin/ folder to your system PATH.
Tesseract: Download the installer from UB Mannheim Tesseract, run installation, and add C:\Program Files\Tesseract-OCR to your system PATH.

2. Python Virtual Environment & Packages

# Clone or navigate to the project directory
cd Invoice

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows (Command Prompt):
# venv\Scripts\activate.bat

# Upgrade pip
pip install --upgrade pip

# Install dependencies
pip install -r requirements.txt

Note: If you wish to use PaddleOCR, install paddlepaddle and paddleocr manually if they compile on your platform:

# Install PaddlePaddle CPU (or GPU version)
pip install paddlepaddle
# Install PaddleOCR
pip install paddleocr

If PaddleOCR fails to install due to compilation issues on macOS Apple Silicon, the system will seamlessly run EasyOCR or PyTesseract instead.

3. API Key Configuration

To enable the AI capabilities (Chatbot, LLM Extraction, Smart Categorization), configure your API keys.

Create a .env file in the root directory:

cp .env.example .env

Open .env and add your API key for at least one of the supported providers:

OPENAI_API_KEY=YOUR_OPENAI_KEY
GOOGLE_API_KEY=YOUR_GEMINI_KEY
ANTHROPIC_API_KEY=YOUR_CLAUDE_KEY

(The system will automatically detect which keys are available and use the corresponding models).

Running the Application

1. Run the Test Suite

Ensure all modules compile and pass their unit assertions:

python -m unittest tests/test_pipeline.py

2. Launch the Streamlit Dashboard

streamlit run app.py

Open http://localhost:8501 in your browser. Upload a document from the sidebar to process it through the dual-engine pipeline, resolve fields, and start chatting with your invoice!

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
__pycache__		__pycache__
modules		modules
samples		samples
temp		temp
tests		tests
README.md		README.md
app.py		app.py
config.py		config.py
requirements.txt		requirements.txt
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Invoice Bounding Box Detection & AI Chat System

Key Features

Technology Stack

Directory Structure

Setup & Installation

1. Prerequisites (External Binaries)

macOS (via Homebrew)

Ubuntu/Debian

Windows

2. Python Virtual Environment & Packages

3. API Key Configuration

Running the Application

1. Run the Test Suite

2. Launch the Streamlit Dashboard

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Invoice Bounding Box Detection & AI Chat System

Key Features

Technology Stack

Directory Structure

Setup & Installation

1. Prerequisites (External Binaries)

macOS (via Homebrew)

Ubuntu/Debian

Windows

2. Python Virtual Environment & Packages

3. API Key Configuration

Running the Application

1. Run the Test Suite

2. Launch the Streamlit Dashboard

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages