ilyassuelen/InsightAI


> Transform PDF, CSV, DOCX and TXT files into structured AI reports and searchable knowledge.

License: MIT · Python ≥3.10 · FastAPI · Vector DB · AI: OpenAI + Gemini · React · Node ≥18


InsightAI is an AI-powered document intelligence platform for analyzing PDF, CSV, DOCX and TXT files using Retrieval-Augmented Generation (RAG) and scalable LLM pipelines.

The system supports automated parsing, intelligent chunking, Qdrant-based vector retrieval, structured AI report generation, and workspace-wide AI chat across uploaded documents.

Current Capabilities:

  • Multi-user team workspaces
  • Role-based access control
  • Workspace-scoped document isolation
  • AI-powered structured reporting
  • Workspace-wide document chat
  • Multi-format ingestion (PDF, CSV, DOCX, TXT)

🖥️ User Interface Preview

🎬 Application Demo

Short walkthrough showing the full InsightAI UI.

InsightAI Application Demo


Live Preview

Frontend preview: https://insightai-lyart.vercel.app/

Note: The hosted preview runs with limited free-tier backend resources.
For reliable document processing, run the project locally or deploy the backend on a production-grade instance.


⚡ Key Features

  • Document Upload: Supports PDF, CSV, DOCX and TXT files.
  • Scalable CSV Processing
    Memory-safe streaming & token-aware chunking.
    Successfully tested with 25,000+ row CSV files.
  • RAG-Based AI Reports
    Structured summaries, key figures, findings, risks, and conclusions generated strictly from document evidence.
  • Multi-Language Report Generation
    Generate reports in any supported language directly from the dashboard (e.g. EN, DE, FR, ES, AR, CN).
  • Workspace-Wide AI Chat & Retrieval: Ask questions across all uploaded workspace documents using semantic retrieval and RAG-based context generation.
  • Team Workspaces & Collaboration
    • Personal and shared team spaces
    • Role-based access (Owner / Member)
    • Secure document isolation
    • Member management
  • Workspace-Scoped Retrieval
    Vector search and document retrieval are strictly isolated per workspace to ensure secure multi-user environments.
  • OpenAI + Gemini Fallback
    Automatic fallback to Google Gemini when OpenAI hits:
    • 429 rate limits
    • token limits
    • temporary API failures
  • Robust Processing Pipeline
    Chunking, embedding, Qdrant-based vector storage, block structuring, and reporting, with LLM tracing via Langfuse for debugging and monitoring.
  • Modern AI Workspace Interface: Glassmorphism-inspired dashboard optimized for structured reporting, document intelligence workflows, and collaborative AI analysis.
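
The fallback behaviour listed under "OpenAI + Gemini Fallback" can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: `ProviderError`, `FALLBACK_REASONS`, and `generate_with_fallback` are hypothetical names, and both providers are passed in as plain callables so that no real SDK call is assumed.

```python
from typing import Callable, Optional


class ProviderError(Exception):
    """Hypothetical error raised by a provider wrapper, tagged with a reason."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


# Reasons that trigger the fallback, mirroring the three cases in the list above.
# The string labels themselves are an assumption made for this sketch.
FALLBACK_REASONS = {"rate_limit", "token_limit", "temporary_failure"}


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Call the primary provider and retry once on the fallback for known failures."""
    try:
        return primary(prompt)
    except ProviderError as exc:
        # Errors such as an invalid API key are not recoverable by switching
        # providers, so they propagate unchanged.
        if fallback is None or exc.reason not in FALLBACK_REASONS:
            raise
        return fallback(prompt)
```

In a real deployment the `primary` callable would wrap the OpenAI client and translate its exceptions into `ProviderError`, while `fallback` would wrap the Gemini client. Leaving `fallback` as `None` corresponds to running without `GEMINI_API_KEY`.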

🛠 Installation

Prerequisites

  • Node.js >= 18
  • Python >= 3.10
  • Git

Quick Start (macOS / Linux / Windows PowerShell):

1. Clone the repository

git clone https://github.com/ilyassuelen/InsightAI
cd InsightAI

2. Vector Database (Qdrant)

InsightAI uses Qdrant as the vector database and supports both:

  • local Qdrant instances
  • hosted Qdrant Cloud clusters

For local development, you can start Qdrant via Docker:

docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/backend/storage/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
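
Before starting the backend, it can help to confirm that the Qdrant container is answering on port 6333. The helper below is a small sketch using only the Python standard library. `qdrant_is_reachable` is a hypothetical name, and the check assumes an unauthenticated local instance: a Qdrant Cloud cluster that requires an API key would reject this request and be reported as unreachable.

```python
import urllib.error
import urllib.request


def qdrant_is_reachable(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/collections answers with HTTP 200."""
    url = base_url.rstrip("/") + "/collections"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

Calling `qdrant_is_reachable()` with no arguments checks the local Docker instance started above. Pass the value of `QDRANT_URL` to check a different host.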

3. Start Backend

# Backend setup
cd backend
python -m venv .venv

# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.\.venv\Scripts\Activate.ps1

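pip install -r requirements.txt
# Note: the module path backend.main is resolved relative to the repository root.
# If the import fails from inside backend/, start uvicorn from the repository root instead.
uvicorn backend.main:app --reload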

4. Start Frontend

cd ../frontend
npm install
npm run dev

⚙️ Configuration

Create a .env file in the project root:

| Name | Required | Description |
|------|----------|-------------|
| OPENAI_API_KEY | ✅ | OpenAI API key for AI report generation |
| GEMINI_API_KEY | ❌ | Optional Gemini fallback API key |
| DATABASE_URL | ❌ | PostgreSQL connection string (SQLite fallback supported) |
| JWT_SECRET_KEY | ✅ | Secret key for JWT authentication |
| QDRANT_URL | ✅ | Qdrant instance or cloud cluster URL |
| QDRANT_API_KEY | ❌ | API key for Qdrant Cloud authentication |
| QDRANT_COLLECTION | ❌ | Collection name for vector storage |
| LANGFUSE_PUBLIC_KEY | ❌ | Langfuse public key |
| LANGFUSE_SECRET_KEY | ❌ | Langfuse secret key |
| LANGFUSE_HOST | ❌ | Langfuse host URL |
| R2_ACCOUNT_ID | ❌ | Cloudflare R2 account ID |
| R2_ACCESS_KEY_ID | ❌ | Cloudflare R2 access key |
| R2_SECRET_ACCESS_KEY | ❌ | Cloudflare R2 secret access key |
| R2_BUCKET | ❌ | Cloudflare R2 bucket name |
| CORS_ORIGINS | ❌ | Allowed frontend origins for CORS |
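
The three variables marked as required above can be checked before the server starts, so that a missing key fails fast rather than at the first request. The snippet is a sketch, not code from the repository: `REQUIRED_VARS` and `missing_required` are hypothetical names, and treating whitespace-only values as missing is an assumption.

```python
import os
from typing import List, Mapping, Optional

# Variables listed as required in the configuration table.
REQUIRED_VARS = ("OPENAI_API_KEY", "JWT_SECRET_KEY", "QDRANT_URL")


def missing_required(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name, "").strip()]
```

A startup script could call `missing_required()` and exit with an error message listing the returned names when the list is not empty.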

Usage

  1. Open the frontend in your browser at http://localhost:8080.
  2. Register or log in.
  3. Select or create a workspace.
  4. Select your preferred report language in the dashboard.
  5. Upload a document (PDF, CSV, DOCX, or TXT).
  6. Wait for AI processing to finish (the status is shown in the sidebar).
  7. Click the document to view the generated report.
  8. Ask questions about the uploaded documents in the chat.

Tech Stack

Frontend

  • React
  • TypeScript
  • Tailwind CSS
  • Framer Motion

Backend

  • FastAPI
  • Python
  • Pydantic
  • SQLAlchemy

AI & Retrieval

  • OpenAI
  • Gemini (Fallback)
  • Retrieval-Augmented Generation (RAG)
  • Qdrant
  • Langfuse

Infrastructure & Deployment

  • Frontend Hosting: Vercel
  • Backend Deployment: Render
  • Database: Neon PostgreSQL
  • Vector Database: Qdrant Cloud
  • Object Storage: Cloudflare R2
  • Authentication: JWT

Architecture

flowchart TD

A[User Upload] --> B[FastAPI Backend]

B --> C[Document Parsing]

C --> D[Chunking]

D --> E[Embeddings]

E --> F[Qdrant Vector DB]

F --> G[RAG Retrieval]

G --> H[LLM Processing]

H --> I[Structured AI Reports]

H --> J[Workspace AI Chat]
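
The chunking, embedding, storage, and retrieval stages of the flowchart can be illustrated with a toy, dependency-free sketch. Everything here is an assumption made for illustration: the word-count "embedding" stands in for a real embedding model, the in-memory list stands in for Qdrant, and names such as `InMemoryIndex`, `chunk_words`, and `workspace_id` are hypothetical. The one point it is meant to show is that retrieval filters by workspace before ranking, so chunks from other workspaces are never candidates.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class StoredChunk:
    workspace_id: str
    text: str
    vector: Counter


def chunk_words(text: str, max_words: int = 50) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would call a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stand-in for the vector database, keyed by workspace."""

    def __init__(self) -> None:
        self._chunks: List[StoredChunk] = []

    def add_document(self, workspace_id: str, text: str, max_words: int = 50) -> None:
        for chunk in chunk_words(text, max_words):
            self._chunks.append(StoredChunk(workspace_id, chunk, embed(chunk)))

    def search(self, workspace_id: str, query: str, top_k: int = 3) -> List[str]:
        query_vector = embed(query)
        # Filter by workspace first, then rank only the remaining candidates.
        candidates = [c for c in self._chunks if c.workspace_id == workspace_id]
        ranked = sorted(candidates, key=lambda c: cosine(query_vector, c.vector), reverse=True)
        return [c.text for c in ranked[:top_k]]
```

The texts returned by `search` would then be passed to the LLM as context for a report or a chat answer. With a real vector database, the same isolation is typically expressed as a payload filter on the search request rather than a list comprehension.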

Roadmap (Planned Features)

  • API-connected data ingestion
  • Advanced analytics & visualizations
  • Semantic search improvements

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m 'Add some feature'
  4. Push to the branch: git push origin feature/my-feature
  5. Open a Pull Request

License

This project is licensed under the MIT License.

About

InsightAI: Python-based document processing platform with chunking, LLM-powered reports, and structured data analysis.
