An Agentic RAG system that transforms raw documents into a living, conversational knowledge engine — built for scale, isolation, and intelligence.
Cortexa is a production-grade AI knowledge infrastructure system built around Agentic Retrieval-Augmented Generation (RAG).
It is not a "chat with PDF" wrapper. It is designed as a full AI architecture that ingests documents, builds structured knowledge, and serves intelligent answers — isolated per user, per chat, per session.
The goal is to evolve from a RAG engine into a full Agentic AI platform capable of multi-step reasoning, tool execution, and autonomous document workflows.
"AI should not just answer questions — it should understand your data."
Cortexa is built on four non-negotiable principles:
| Principle | Description |
|---|---|
| 🔐 Isolation-first | Every entity is strictly scoped: user → chat → file → vector chunks. No leakage across boundaries. |
| ⚙️ Async-first | All heavy AI workloads run in background jobs. The API never blocks on intelligence. |
| 🧠 AI modularity | The RAG pipeline is fully decoupled via gRPC. The AI Brain is a standalone service. |
| 🚀 Agent-ready | Architecture is designed from day one to support tool execution and agentic reasoning. |
                        Client
                           │
                      NestJS API
                           │
          ┌────────────────┼────────────────┐
          │                │                │
        Auth             Files            Chat
          │                │
     PostgreSQL      BullMQ Queue
                           │
                      File Worker
                           │
                         gRPC
                           │
                     Python Brain
                           │
    ┌─────────────────────────────────────────────┐
    │ Extract → Chunk → Embed → Retrieve → Rerank │
    └──────────────────────┬──────────────────────┘
                           │
                       Qdrant DB
                           │
                          LLM
The NestJS API handles all client-facing logic and orchestrates background jobs via BullMQ. Heavy AI work — chunking, embedding, retrieval, and reranking — lives entirely inside the Python Brain service, accessed through a clean gRPC interface.
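As a rough illustration of the async-first flow described above, the sketch below uses Python's standard-library `queue` as a stand-in for BullMQ and a placeholder function in place of the gRPC call to the Brain. The names `handle_upload`, `brain_ingest`, and `start_worker` are hypothetical and do not come from the Cortexa codebase.

```python
import queue
import threading
import uuid

# Stand-in for the BullMQ queue (assumption: the real queue lives in Redis).
jobs: queue.Queue = queue.Queue()
# Stand-in for job results that would normally be persisted in PostgreSQL.
results: dict = {}


def handle_upload(file_id: str) -> str:
    """API-side handler: enqueue the job and return immediately with a job id."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, file_id))
    return job_id


def brain_ingest(file_id: str) -> str:
    """Placeholder for the gRPC call into the Python Brain (hypothetical)."""
    return f"ingested:{file_id}"


def worker() -> None:
    """Background worker: pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, file_id = job
        results[job_id] = brain_ingest(file_id)
        jobs.task_done()


def start_worker() -> threading.Thread:
    """Start the worker on a daemon thread so it never blocks the caller."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    start_worker()
    job_id = handle_upload("file-42")  # returns before any processing happens
    jobs.join()                        # wait only so the demo can show the result
    print(results[job_id])
    jobs.put(None)                     # ask the worker to stop
```

The point of the shape is that `handle_upload` does no AI work at all; it only records intent, which is what lets the API stay responsive while ingestion runs elsewhere.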
- NestJS + TypeScript
- PostgreSQL + TypeORM — users, sessions, file records
- Redis + BullMQ — async job queue for AI processing
- JWT — access (15m) + refresh (7d) tokens via HTTPOnly cookies
- gRPC — microservice bridge to AI Brain
- Python 3 — gRPC server
- Qdrant — vector database for semantic search
- Gemini — LLM layer
- Sentence-aware chunking — context-preserving document splitting
- Embedding models — dense vector generation
src/
└── modules/
├── auth/ → JWT authentication + session management
├── users/ → user lifecycle
├── chat/ → conversation isolation + history
├── files/ → upload, deduplication, lifecycle tracking
├── session/ → refresh token sessions
├── queue/ → BullMQ job configuration
└── ai-service/ → gRPC bridge to Python Brain
- JWT access + refresh token flow
- HTTPOnly cookie security
- Multi-device session tracking
- File upload API
- SHA256 deduplication engine — no duplicate processing
- Async processing via BullMQ workers
- File lifecycle states:
uploaded → processing → processed (or failed, if processing errors out)
- Background ingestion worker
- gRPC communication layer between NestJS and Python Brain
- Full separation of AI logic from business logic
- Document chunking pipeline design
- Qdrant vector database integration
- Multi-user and multi-chat vector isolation model
- gRPC Brain ingestion service — completing the full ingestion flow
- Sentence-aware chunking — context-preserving splits
- Reranking system — improving retrieval precision
- Chat memory optimization — compressing long conversation context
- Streaming responses — real-time answer delivery
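The SHA256 deduplication and file lifecycle items listed above can be sketched together. This is an illustration under stated assumptions: the registry is an in-memory stand-in for the file records table, deduplication is scoped by an opaque `scope` string (the real scope is not specified here), and `failed` is treated as an alternative outcome of `processing`. `FileRegistry` and its methods are hypothetical names.

```python
import hashlib
from enum import Enum


class FileState(str, Enum):
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    PROCESSED = "processed"
    FAILED = "failed"


# Allowed transitions (assumption: "failed" is reachable only from "processing").
TRANSITIONS = {
    FileState.UPLOADED: {FileState.PROCESSING},
    FileState.PROCESSING: {FileState.PROCESSED, FileState.FAILED},
    FileState.PROCESSED: set(),
    FileState.FAILED: set(),
}


def sha256_hex(content: bytes) -> str:
    """Hex digest used as the deduplication key."""
    return hashlib.sha256(content).hexdigest()


class FileRegistry:
    """In-memory stand-in for the file records table (hypothetical)."""

    def __init__(self) -> None:
        self._states: dict[tuple[str, str], FileState] = {}

    def register(self, scope: str, content: bytes) -> tuple[str, bool]:
        """Return (digest, is_new); a duplicate in the same scope is not re-queued."""
        digest = sha256_hex(content)
        key = (scope, digest)
        if key in self._states:
            return digest, False
        self._states[key] = FileState.UPLOADED
        return digest, True

    def state(self, scope: str, digest: str) -> FileState:
        return self._states[(scope, digest)]

    def transition(self, scope: str, digest: str, new_state: FileState) -> None:
        """Move a file to new_state, rejecting transitions the table does not allow."""
        current = self._states[(scope, digest)]
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"illegal transition: {current.value} -> {new_state.value}")
        self._states[(scope, digest)] = new_state


if __name__ == "__main__":
    registry = FileRegistry()
    digest, is_new = registry.register("chat-1", b"hello")
    print(is_new, registry.register("chat-1", b"hello")[1])
    registry.transition("chat-1", digest, FileState.PROCESSING)
    registry.transition("chat-1", digest, FileState.PROCESSED)
    print(registry.state("chat-1", digest).value)
```

Hashing the raw bytes means a renamed copy of the same file is still recognized as a duplicate, and the explicit transition table makes it hard for a worker to move a file into an inconsistent state.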
These are the next intelligence layers on the roadmap:
- Hybrid search — BM25 + vector fusion for better recall
- Context compression — smart pruning of retrieved chunks
- Long-term memory — persistent user/chat knowledge
- Conversation summarization — memory-efficient history
- Agent execution layer — tool usage + multi-step reasoning
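For the hybrid search item above, one common way to fuse a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The roadmap does not say which fusion method Cortexa will use, so treat this as one possible approach; the function name and the default `k = 60` (a conventional RRF constant) are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores the sum of 1 / (k + rank) over every list it
    appears in, with rank starting at 1. Higher fused scores rank first.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, highest first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # keyword ranking (illustrative ids)
    vector_hits = ["b", "c", "d"]  # semantic ranking (illustrative ids)
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only needs rank positions, not raw scores, so it avoids having to normalize BM25 scores against cosine similarities.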
Phase 1 → RAG Engine (in progress)
Phase 2 → Intelligence Layer (hybrid search, memory, reranking)
Phase 3 → Agent System (tools, reasoning, workflows)
Phase 4 → AI OS (full knowledge operating system)
Every piece of knowledge is strictly scoped to its owner:
user_id → chat_id → file_id → vector chunks
There is no cross-user leakage and no cross-chat contamination. Each knowledge space is fully isolated at the vector level.
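To make the scoping concrete, here is a small sketch of how a retrieval call might be restricted to one user and one chat. The filter dictionary follows the general shape of Qdrant's JSON payload filters (`must` / `key` / `match` / `value`); storing `user_id` and `chat_id` as payload fields, and the helper names, are assumptions for illustration. The in-memory `matches` function only exists to show the effect without a running database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    user_id: str
    chat_id: str
    file_id: str
    text: str


def scope_filter(user_id: str, chat_id: str) -> dict:
    """Build a Qdrant-style payload filter; every condition under 'must' has to hold."""
    return {
        "must": [
            {"key": "user_id", "match": {"value": user_id}},
            {"key": "chat_id", "match": {"value": chat_id}},
        ]
    }


def matches(chunk: Chunk, flt: dict) -> bool:
    """Tiny in-memory evaluator for the filter above (illustration only)."""
    return all(
        getattr(chunk, cond["key"]) == cond["match"]["value"]
        for cond in flt["must"]
    )


def visible_chunks(chunks: list[Chunk], user_id: str, chat_id: str) -> list[Chunk]:
    """Return only the chunks that belong to the given user and chat."""
    flt = scope_filter(user_id, chat_id)
    return [chunk for chunk in chunks if matches(chunk, flt)]


if __name__ == "__main__":
    store = [
        Chunk("u1", "c1", "f1", "alpha"),
        Chunk("u1", "c2", "f2", "beta"),
        Chunk("u2", "c1", "f3", "gamma"),
    ]
    print([c.text for c in visible_chunks(store, "u1", "c1")])
```

Applying the filter on every search, rather than filtering results afterwards, is what keeps chunks from other users or chats out of the candidate set in the first place.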
Cortexa is an open, evolving project. Contributions are welcome across:
- AI / RAG — chunking strategies, retrieval tuning, reranking, hybrid search
- Backend — NestJS modules, queue optimizations, gRPC services
- Python Brain — embedding pipeline, memory systems, agent tooling
- Documentation — architecture diagrams, guides, API docs
If you're interested in contributing, open an issue to discuss what you'd like to work on. The project is still being actively built, so coordination matters.
Full setup guide coming soon. The system requires Node.js (for the NestJS API), Python 3, PostgreSQL, Redis, and Qdrant, running locally or via Docker.
# Clone the repo
git clone https://github.com/your-username/cortexa.git
cd cortexa
# Set up environment variables
cp .env.example .env
# Run make to build and start all projects along with their dependencies
make
# Start services (Docker Compose coming soon)

MIT License © 2026 Cortexa