GitHub - ilyassuelen/InsightAI: InsightAI: Python-based document processing platform with chunking, LLM-powered reports, and structured data analysis.

> Transform PDF, CSV, DOCX and TXT files into structured AI reports and searchable knowledge.

InsightAI is an AI-powered document intelligence platform for analyzing PDF, CSV, DOCX and TXT files using Retrieval-Augmented Generation (RAG) and scalable LLM pipelines.

The system supports automated parsing, intelligent chunking, Qdrant-based vector retrieval, structured AI report generation, and workspace-wide AI chat across uploaded documents.
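
To illustrate what streaming, token-aware chunking of CSV data can look like, here is a minimal sketch. It is not InsightAI's actual implementation: the function names, the `max_tokens` parameter, the `column=value` row format, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, List


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def chunk_csv_rows(lines: Iterable[str], max_tokens: int = 500) -> Iterator[List[str]]:
    """Stream rows and yield groups whose estimated token total stays within max_tokens."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    chunk: List[str] = []
    used = 0
    for row in reader:
        # Serialize each row as "column=value" pairs so a chunk is readable on its own.
        row_text = ", ".join(f"{h}={v}" for h, v in zip(header, row))
        cost = estimate_tokens(row_text)
        # Close the current chunk before it would exceed the budget.
        if chunk and used + cost > max_tokens:
            yield chunk
            chunk, used = [], 0
        chunk.append(row_text)
        used += cost
    if chunk:
        yield chunk


if __name__ == "__main__":
    sample = io.StringIO("id,amount\n1,10\n2,20\n3,30\n")
    for part in chunk_csv_rows(sample, max_tokens=6):
        print(part)
```

Because rows are consumed one at a time from an iterator, memory use depends on the chunk size rather than on the size of the file. A real pipeline would likely replace `estimate_tokens` with the tokenizer of the target model.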

Current Capabilities:

Multi-user team workspaces

Role-based access control

Workspace-scoped document isolation

AI-powered structured reporting

Workspace-wide document chat

Multi-format ingestion (PDF, CSV, DOCX, TXT)

🖥️ User Interface Preview

🎬 Application Demo

Short walkthrough showing the full InsightAI UI.

Live Preview

Frontend preview: https://insightai-lyart.vercel.app/

Note: The hosted preview runs with limited free-tier backend resources.
For reliable document processing, run the project locally or deploy the backend on a production-grade instance.

⚡ Key Features

Document Upload: Supports PDF, CSV, DOCX and TXT files.
Scalable CSV Processing: Memory-safe streaming and token-aware chunking. Successfully tested with CSV files of 25,000+ rows.
RAG-Based AI Reports: Structured summaries, key figures, findings, risks, and conclusions generated strictly from document evidence.
Multi-Language Report Generation: Generate reports in any supported language directly from the dashboard (e.g. EN, DE, FR, ES, AR, CN).
Workspace-Wide AI Chat & Retrieval: Ask questions across all uploaded workspace documents using semantic retrieval and RAG-based context generation.
Team Workspaces & Collaboration:
- Personal and shared team spaces
- Role-based access (Owner / Member)
- Secure document isolation
- Member management
Workspace-Scoped Retrieval: Vector search and document retrieval are strictly isolated per workspace to ensure secure multi-user environments.
OpenAI + Gemini Fallback: Automatic fallback to Google Gemini when OpenAI hits:
- 429 rate limits
- token limits
- temporary API failures
Robust Processing Pipeline: Chunking, embedding, Qdrant-based vector storage, block structuring, reporting, and LLM tracing via Langfuse for debugging and monitoring.
Modern AI Workspace Interface: Glassmorphism-inspired dashboard optimized for structured reporting, document intelligence workflows, and collaborative AI analysis.
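
The provider fallback listed above can be pictured as a small wrapper around two callables. This is a hedged sketch only: the exception classes and `generate_with_fallback` are hypothetical stand-ins, not the real OpenAI or Gemini SDK types and not InsightAI's code.

```python
from typing import Callable, Tuple, Type


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the primary provider."""


class TokenLimitError(Exception):
    """Hypothetical stand-in for a request that exceeds the model's token limit."""


class TemporaryAPIError(Exception):
    """Hypothetical stand-in for a transient provider failure."""


# Only these error types trigger the fallback; anything else propagates.
FALLBACK_ERRORS: Tuple[Type[Exception], ...] = (
    RateLimitError,
    TokenLimitError,
    TemporaryAPIError,
)


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (provider_label, text), trying the primary provider first."""
    try:
        return "primary", primary(prompt)
    except FALLBACK_ERRORS:
        # The primary provider failed in a recoverable way: use the second one.
        return "fallback", fallback(prompt)
```

Restricting the fallback to a known set of errors keeps genuine bugs (for example a malformed prompt) visible instead of silently routing them to the second provider.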

🛠 Installation

Prerequisites

Node.js >= 18
Python >= 3.10
Git

Quick Start (macOS / Linux / Windows PowerShell):

1. Clone the repository

git clone https://github.com/ilyassuelen/InsightAI
cd InsightAI

2. Vector Database (Qdrant)

InsightAI uses Qdrant as the vector database and supports both:

local Qdrant instances
hosted Qdrant Cloud clusters

For local development, you can start Qdrant via Docker:

docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/backend/storage/qdrant_storage:/qdrant/storage \
  qdrant/qdrant

3. Start Backend

# Backend setup
cd backend
python -m venv .venv

# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.\.venv\Scripts\Activate.ps1

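pip install -r requirements.txt
# Note: backend.main is imported relative to the current working directory.
# If this fails with an import error inside backend/, try running it from the repository root.
uvicorn backend.main:app --reload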

4. Start Frontend

cd ../frontend
npm install
npm run dev

⚙️ Configuration

Create a .env file in the project root:

Name	Required	Description
OPENAI_API_KEY	✅	OpenAI API key for AI report generation
GEMINI_API_KEY	❌	Optional Gemini fallback API key
DATABASE_URL	❌	PostgreSQL connection string (SQLite fallback supported)
JWT_SECRET_KEY	✅	Secret key for JWT authentication
QDRANT_URL	✅	Qdrant instance or cloud cluster URL
QDRANT_API_KEY	❌	API key for Qdrant Cloud authentication
QDRANT_COLLECTION	❌	Collection name for vector storage
LANGFUSE_PUBLIC_KEY	❌	Langfuse public key
LANGFUSE_SECRET_KEY	❌	Langfuse secret key
LANGFUSE_HOST	❌	Langfuse host URL
R2_ACCOUNT_ID	❌	Cloudflare R2 account ID
R2_ACCESS_KEY_ID	❌	Cloudflare R2 access key
R2_SECRET_ACCESS_KEY	❌	Cloudflare R2 secret access key
R2_BUCKET	❌	Cloudflare R2 bucket name
CORS_ORIGINS	❌	Allowed frontend origins for CORS
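
As a quick sanity check before starting the backend, the required entries from the table can be verified with a few lines of Python. This is an illustrative sketch, not part of the project: `missing_required` is a hypothetical helper, and it assumes the variables have already been loaded into the environment (for example by a loader such as python-dotenv, or by exporting them in the shell).

```python
import os
from typing import List, Mapping

# Variables marked as required in the configuration table.
REQUIRED_VARS = ("OPENAI_API_KEY", "JWT_SECRET_KEY", "QDRANT_URL")


def missing_required(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        print("Missing required settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```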

Usage

1. Open the frontend in your browser at http://localhost:8080.
2. Register or log in.
3. Select or create a workspace.
4. Select your preferred report language in the dashboard.
5. Upload a document (PDF, CSV, DOCX or TXT).
6. Wait for AI processing to finish (the status is shown in the sidebar).
7. Click the document to view the generated report.
8. Ask questions about uploaded documents in the chat.

Tech Stack

Frontend

React
TypeScript
Tailwind CSS
Framer Motion

Backend

FastAPI
Python
Pydantic
SQLAlchemy

AI & Retrieval

OpenAI
Gemini (Fallback)
Retrieval-Augmented Generation (RAG)
Qdrant
Langfuse

Infrastructure & Deployment

Frontend Hosting: Vercel
Backend Deployment: Render
Database: Neon PostgreSQL
Vector Database: Qdrant Cloud
Object Storage: Cloudflare R2
Authentication: JWT

Architecture

flowchart TD

A[User Upload] --> B[FastAPI Backend]

B --> C[Document Parsing]

C --> D[Chunking]

D --> E[Embeddings]

E --> F[Qdrant Vector DB]

F --> G[RAG Retrieval]

G --> H[LLM Processing]

H --> I[Structured AI Reports]

H --> J[Workspace AI Chat]
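
The retrieval step in the flowchart (F to G), combined with the per-workspace isolation described under Key Features, can be sketched in memory as follows. This is a simplified illustration under stated assumptions: `StoredChunk`, `cosine`, and `search` are hypothetical names, and the in-memory list stands in for Qdrant, where the same scoping would typically be expressed as a payload filter on the search request.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StoredChunk:
    workspace_id: str        # workspace that owns the chunk
    text: str                # chunk content
    vector: Sequence[float]  # embedding of the chunk


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    chunks: List[StoredChunk],
    workspace_id: str,
    query_vector: Sequence[float],
    top_k: int = 3,
) -> List[StoredChunk]:
    """Rank only the chunks that belong to the given workspace."""
    # Filter first, so chunks from other workspaces can never be returned.
    candidates = [c for c in chunks if c.workspace_id == workspace_id]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.vector, query_vector),
        reverse=True,
    )
    return ranked[:top_k]
```

Applying the workspace filter before ranking, rather than after, means a high-scoring chunk from another workspace cannot displace results or leak into the context passed to the LLM.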

Roadmap (Planned Features)

API-connected data ingestion
Advanced analytics & visualizations
Semantic search improvements

🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a feature branch: git checkout -b feature/my-feature
3. Commit your changes: git commit -m 'Add some feature'
4. Push to the branch: git push origin feature/my-feature
5. Open a Pull Request.

License

This project is licensed under the MIT License.
