Upload documents, store them securely in AWS S3, process them asynchronously with a Lambda-compatible worker, and explore extracted text and metadata through a real-time React dashboard.
- Overview
- Architecture
- Features
- Tech Stack
- Project Structure
- Local Setup
- Running Tests
- AWS Deployment
- Environment Variables
- Screenshots
- Resume Bullet Points
- Future Improvements
DocIntel is a production-style, full-stack cloud application that demonstrates end-to-end software engineering across the backend, frontend, cloud infrastructure, and DevOps layers.
Users register, log in, and upload documents (PDF, DOCX, TXT, or images). Files are stored in AWS S3, a processing job is dispatched asynchronously, and the system extracts text, counts words, detects language, and generates a plain-text summary. The React dashboard polls for status in real time and renders the results when ready.
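The metadata step described above can be pictured with a small sketch. This is an illustrative stand-in, not the project's actual processor: the function names `word_count`, `naive_summary`, and `analyze` are hypothetical, the summary is a simple "first sentences" heuristic, and language detection is left out because it would need a third-party library.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def naive_summary(text: str, max_sentences: int = 2) -> str:
    """Return the first few sentences as a plain-text summary (illustrative heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [part.strip() for part in parts if part.strip()]
    return " ".join(sentences[:max_sentences])


def analyze(text: str) -> dict:
    """Bundle the metadata fields that are stored alongside the extracted text."""
    return {"word_count": word_count(text), "summary": naive_summary(text)}


sample = (
    "DocIntel stores uploads in S3. A worker extracts the text. "
    "Results appear on the dashboard."
)
result = analyze(sample)
print(result)
```

A real summary step would likely be smarter than taking the first two sentences; the point here is only the shape of the result record.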
Key engineering decisions:
- The document processor runs as a background thread locally (USE_LOCAL_WORKER=true) and as an AWS Lambda function in production, with the same logic in both places.
- LocalStack mocks AWS S3 in the local Docker Compose stack, so no AWS account is needed to develop or test.
- SQLite in-memory is used for the test suite; PostgreSQL is used in development and production via Docker Compose.
- Infrastructure is fully defined in Terraform, covering S3, IAM roles, Lambda, and CloudWatch.
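The first decision in the list above (one processor, two execution modes) might look roughly like the sketch below. It assumes the switch is read from the `USE_LOCAL_WORKER` environment variable; the names `process_document`, `use_local_worker`, and `dispatch`, the default of `false`, and the injected `invoke_lambda` callable are all hypothetical. In production that callable would typically wrap a boto3 Lambda invocation.

```python
import os
import threading
from typing import Callable


def process_document(document_id: int) -> None:
    """Hypothetical placeholder for the shared processing logic."""


def use_local_worker() -> bool:
    """Read the USE_LOCAL_WORKER flag; the default of 'false' is an assumption."""
    value = os.getenv("USE_LOCAL_WORKER", "false")
    return value.strip().lower() in {"1", "true", "yes"}


def dispatch(document_id: int, invoke_lambda: Callable[[int], None]) -> str:
    """Run the processor in a background thread locally, or hand off to Lambda."""
    if use_local_worker():
        worker = threading.Thread(
            target=process_document, args=(document_id,), daemon=True
        )
        worker.start()
        return "thread"
    invoke_lambda(document_id)
    return "lambda"
```

Injecting the Lambda call keeps the dispatch logic testable without AWS credentials, which matches the goal of developing without an AWS account.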
graph TD
Browser["🖥️ Browser\nReact + TypeScript"]
API["⚙️ FastAPI\nREST API + JWT Auth"]
DB[("🗄️ PostgreSQL\nUsers · Documents")]
S3[("☁️ AWS S3\nDocument Storage")]
Lambda["λ Lambda Worker\nAsync Processor"]
Browser -->|"HTTPS / JWT"| API
API -->|"SQLAlchemy ORM"| DB
API -->|"boto3 upload"| S3
API -->|"invoke (prod)\nor thread (dev)"| Lambda
Lambda -->|"boto3 download"| S3
Lambda -->|"PATCH status + text"| API
subgraph Local["Local Dev (Docker Compose)"]
direction TB
DB
LocalStack["🧪 LocalStack\nS3 mock on :4566"]
end
subgraph AWS["AWS Production"]
direction TB
S3
Lambda
CW["📋 CloudWatch Logs"]
Lambda --> CW
end
| Step | What happens |
|---|---|
| 1 | User registers or logs in → receives a JWT |
| 2 | User uploads a file → API validates size/type, streams to S3, creates a DB record (status=pending) |
| 3 | API triggers the processor (background thread in dev, Lambda invoke in production) |
| 4 | Processor downloads the file from S3, extracts text and metadata |
| 5 | DB record updated (status=completed) with word count, page count, language, summary, and extracted text |
| 6 | Frontend polls every 3 s and renders results when processing finishes |
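Step 6 in the table above is implemented in the React frontend; the same polling idea is sketched here in Python for illustration only. The function name `wait_for_document`, the `fetch_status` callable, the attempt limit, and the `"failed"` status are assumptions; the README itself only names `pending` and `completed`.

```python
import time
from typing import Callable

# "completed" comes from the workflow table; "failed" is an assumed terminal status.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_document(
    fetch_status: Callable[[], str],
    interval_seconds: float = 3.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the document reaches a terminal status or attempts run out."""
    status = "pending"
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next poll
    return status
```

Capping the number of attempts avoids polling forever if a job never finishes.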
| Feature | Detail |
|---|---|
| JWT Authentication | Register + login with bcrypt-hashed passwords; stateless Bearer tokens |
| Document Upload | Drag-and-drop modal with upload progress bar; 50 MB limit |
| Supported Formats | PDF, DOCX, TXT, PNG, JPG/JPEG |
| Async Processing | Background thread (dev) or AWS Lambda (prod); status polling from the UI |
| Text Extraction | Full extracted text via PyPDF, python-docx, and Pillow |
| Metadata Analysis | Word count, page count, language detection, plain-text summary |
| Real-Time Dashboard | Stats cards, document table, live status badge updates |
| Document Detail | Full extracted text, summary, metadata; presigned S3 download URL |
| Delete Documents | Removes DB record and S3 object atomically |
| AWS Infrastructure | S3 (encrypted, versioned), IAM least-privilege roles, Lambda, CloudWatch |
| Local AWS Mock | LocalStack emulates S3 — no AWS account needed for local dev |
| Seed Data | make seed loads four sample documents (various statuses) |
| CI Pipeline | GitHub Actions: ruff lint, mypy type check, 25 pytest tests, frontend build, Trivy security scan |
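The upload rules in the table above (supported formats and the 50 MB limit) could be enforced with a check along these lines. This is a minimal sketch, not the project's validator: `validate_upload` and `UploadRejected` are hypothetical names, the limit is interpreted as 50 MiB, and a real check might also inspect the MIME type rather than only the extension.

```python
from pathlib import Path

# Interpreting "50 MB" as 50 MiB is an assumption.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".png", ".jpg", ".jpeg"}


class UploadRejected(ValueError):
    """Raised when a file fails validation (hypothetical error type)."""


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the normalized extension, or raise UploadRejected."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"unsupported file type: {extension or 'none'}")
    if size_bytes <= 0:
        raise UploadRejected("file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadRejected("file exceeds the 50 MB limit")
    return extension
```

Rejecting the file before it is streamed to S3 keeps invalid uploads from ever creating a storage object or a database record.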
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript 5, Vite, Tailwind CSS, Axios, React Router v6 |
| Backend | FastAPI 0.115, Python 3.11/3.12, SQLAlchemy 2.0, Alembic, Pydantic v2 |
| Auth | JWT via python-jose, bcrypt via passlib |
| Database | PostgreSQL 16 (prod/dev) · SQLite in-memory (tests) |
| Cloud Storage | AWS S3 · LocalStack (local dev) |
| Document Processing | PyPDF, python-docx, Pillow |
| Async Worker | AWS Lambda (Python 3.12) · background thread (local mode) |
| Infrastructure | Terraform ≥ 1.6 |
| Containers | Docker, Docker Compose |
| CI/CD | GitHub Actions |
| Testing | pytest 8, pytest-cov, httpx TestClient, SQLite in-memory fixtures |
| Linting / Types | ruff, mypy, ESLint, TypeScript strict mode |
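To show what the stateless Bearer tokens in the Auth row carry, here is a standard-library sketch of HS256 signing and verification. The project itself uses python-jose; this hand-rolled version is a swapped-in illustration of the token format only and is not meant for production use. The function names and the 30-minute lifetime are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(message: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), message.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def create_token(subject: str, secret: str, ttl_seconds: int = 1800,
                 now: Optional[float] = None) -> str:
    """Build header.payload.signature with 'sub' and 'exp' claims."""
    issued = int(time.time() if now is None else now)
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"sub": subject, "exp": issued + ttl_seconds}).encode())
    return f"{header}.{payload}.{_sign(f'{header}.{payload}', secret)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Optional[str]:
    """Return the subject if the signature is valid and the token is unexpired."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}", secret)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return None
    claims = json.loads(_b64url_decode(payload))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        return None
    return claims.get("sub")
```

Because the server only needs the secret to verify a token, no session state has to be stored, which is what "stateless" refers to.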
aws-document-intelligence-platform/
│
├── .github/
│ └── workflows/ci.yml # Lint → type check → test → build → security scan
│
├── frontend/ # React + TypeScript + Vite
│ ├── src/
│ │ ├── components/ # Reusable UI (StatusBadge, Navbar)
│ │ ├── pages/ # Auth pages, Dashboard, Document detail
│ │ ├── services/api.ts # Axios client with JWT interceptor
│ │ ├── hooks/useAuth.ts # Auth state and helpers
│ │ ├── types/index.ts # Shared TypeScript interfaces
│ │ └── utils/format.ts # formatBytes, formatDate, status helpers
│ ├── Dockerfile # Multi-stage: dev server + Nginx production image
│ └── package.json
│
├── backend/ # FastAPI application
│ ├── app/
│ │ ├── api/v1/endpoints/ # auth.py · documents.py
│ │ ├── core/ # config.py · database.py · security.py
│ │ ├── models/ # SQLAlchemy ORM: User · Document
│ │ ├── schemas/ # Pydantic request/response models
│ │ └── services/ # s3_service · document_processor · auth_service
│ ├── tests/
│ │ ├── unit/ # Security and processor unit tests
│ │ └── integration/ # Auth and document API tests (25 total)
│ ├── alembic/ # Database migration scripts
│ ├── Dockerfile
│ └── requirements*.txt
│
├── worker/ # AWS Lambda document processor
│ ├── lambda_function.py # Lambda handler (S3 trigger + direct invoke)
│ └── processor.py # Core logic; no FastAPI/SQLAlchemy dependency
│
├── terraform/ # AWS infrastructure as code
│ ├── main.tf # Provider + backend config
│ ├── variables.tf / outputs.tf
│ ├── s3.tf # Bucket: versioning, encryption, lifecycle, S3 trigger
│ ├── iam.tf # Lambda execution role + backend policy (least-privilege)
│ └── lambda.tf # Function, S3 event trigger, CloudWatch log group
│
├── scripts/
│ ├── seed_data.py # Load sample documents into local DB
│ └── localstack-init.sh # Bootstrap S3 bucket inside LocalStack
│
├── docker-compose.yml # PostgreSQL + Backend + Frontend + LocalStack
├── Makefile # Developer commands (make dev, test, lint, seed…)
├── .env.example # All environment variables with descriptions
└── README.md
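The tree above notes that `worker/lambda_function.py` handles both an S3 trigger and a direct invoke. A handler supporting both might distinguish them as in the sketch below. The S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) follows the standard S3 event format; the direct-invoke payload shape `{"bucket": ..., "key": ...}` and the return value are assumptions, and the actual processing is omitted.

```python
from urllib.parse import unquote_plus


def extract_targets(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 notification or a direct-invoke payload."""
    if "Records" in event:
        targets = []
        for record in event["Records"]:
            s3 = record["s3"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3["object"]["key"])
            targets.append((s3["bucket"]["name"], key))
        return targets
    # Direct invoke: this payload shape is an assumption for illustration.
    return [(event["bucket"], event["key"])]


def lambda_handler(event: dict, context: object) -> dict:
    """Entry point; a real worker would download and process each object here."""
    targets = extract_targets(event)
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in targets]}
```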
Install the following:
| Tool | Purpose |
|---|---|
| Docker Desktop | Runs the full local stack |
| Node.js 20+ | Frontend development and builds |
| Python 3.11 or 3.12 | Backend scripts and tests |
| Make | Convenience commands |
| Git | Version control |
git clone https://github.com/Kylefan123/aws-document-intelligence-platform.git
cd aws-document-intelligence-platform
cp .env.example .env
The default values in .env.example are enough for local Docker development.
Open Docker Desktop and wait until the engine is running.
make dev
This starts the full Docker Compose environment:
| Service | Local URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend API | http://localhost:8000 |
| API Docs | http://localhost:8000/docs |
| LocalStack | http://localhost:4566 |
| PostgreSQL | localhost:5432 |
Keep this terminal open because it displays the live Docker logs.
Open a second terminal tab from the project root and run:
make migrate
You can either create an account manually in the UI or load sample users and documents.
From the project root:
cd backend
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
cd ..
backend/.venv/bin/python scripts/seed_data.py
Demo account:
Email: alice@example.com
Password: password123
Second demo account:
Email: bob@example.com
Password: password123
Frontend:
http://localhost:5173
Backend Swagger docs:
http://localhost:8000/docs
FastAPI automatically generates interactive Swagger documentation at:
http://localhost:8000/docs
The API includes endpoints for:
| Area | Endpoints |
|---|---|
| Auth | Register, login, current user |
| Documents | Upload, list, stats, detail, delete, download URL |
| Health | Health check |
Example API groups:
POST /api/v1/auth/register
POST /api/v1/auth/login
GET /api/v1/auth/me
POST /api/v1/documents/upload
GET /api/v1/documents/
GET /api/v1/documents/stats
GET /api/v1/documents/{document_id}
DELETE /api/v1/documents/{document_id}
GET /api/v1/documents/{document_id}/download-url
GET /health
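Calling the protected endpoints listed above requires the JWT in an `Authorization: Bearer` header. The sketch below only builds the requests with the standard library and does not send them; `build_request` and the placeholder token are hypothetical, and the response shapes are not shown because the README does not specify them.

```python
import urllib.request
from typing import Optional

API_BASE_URL = "http://localhost:8000"  # local backend URL from the setup section


def build_request(path: str, token: Optional[str] = None,
                  method: str = "GET") -> urllib.request.Request:
    """Create a request for the API, attaching the Bearer token when given."""
    request = urllib.request.Request(f"{API_BASE_URL}{path}", method=method)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


health = build_request("/health")
listing = build_request("/api/v1/documents/", token="<jwt-from-login>")
# urllib.request.urlopen(listing) would send the request once the stack is running.
```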
Backend tests use an in-memory SQLite database and mocked services, so they do not require an AWS account or any running AWS services.
cd backend
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
pytest tests/ -v
Run tests with coverage:
pytest tests/ -v --cov=app --cov-report=term-missing
Backend:
cd backend
ruff check app/ tests/
mypy app/ --ignore-missing-imports
Frontend:
cd frontend
npm install
npm run build
npm run lint
Start the full stack:
make dev
Start services in detached mode:
make up
Stop services:
make down
View running services:
docker compose ps
View backend logs:
docker compose logs backend --tail=100
Clean containers and volumes:
make clean-docker
This repository includes Terraform definitions for an AWS deployment architecture.
Terraform covers:
- S3 bucket
- Bucket encryption
- Bucket versioning
- IAM roles and policies
- Lambda worker
- CloudWatch logs
- S3-to-Lambda event configuration
Important: this repository is AWS-ready, but it should not be assumed to be deployed to AWS unless a live deployment URL is listed here.
Do not run Terraform deployment commands unless you are ready to create real AWS resources and review possible costs.
Package the Lambda worker:
make lambda-package
Preview infrastructure:
make terraform-init
make terraform-plan
Apply infrastructure only after reviewing the plan:
make terraform-apply
Destroy AWS resources when finished:
make terraform-destroy
| Area | Local Development | AWS Deployment Architecture |
|---|---|---|
| Frontend | Vite dev server in Docker | Static hosting or container deployment |
| Backend | FastAPI container | API service deployment |
| Database | PostgreSQL Docker container | Managed PostgreSQL or equivalent |
| Object Storage | LocalStack S3 mock | AWS S3 |
| Worker | Local background processor | AWS Lambda-compatible worker |
| Infrastructure | Docker Compose | Terraform |
| Logs | Docker logs | CloudWatch logs |
The project uses .env.example as a template.
Important variables:
| Variable | Purpose |
|---|---|
| APP_ENV | Development or production environment |
| APP_SECRET_KEY | Application secret key |
| DATABASE_URL | SQLAlchemy database URL |
| POSTGRES_USER | PostgreSQL username |
| POSTGRES_PASSWORD | PostgreSQL password |
| POSTGRES_DB | PostgreSQL database name |
| JWT_SECRET_KEY | Secret used for JWT signing |
| AWS_ACCESS_KEY_ID | AWS access key or LocalStack placeholder |
| AWS_SECRET_ACCESS_KEY | AWS secret key or LocalStack placeholder |
| AWS_DEFAULT_REGION | AWS region |
| AWS_S3_BUCKET_NAME | S3 bucket name |
| LAMBDA_FUNCTION_NAME | Lambda worker name |
| USE_LOCAL_WORKER | Enables local background processing |
| VITE_API_BASE_URL | Frontend API base URL |
For local development, the default .env.example values are enough to run the app with Docker Compose.
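As an illustration of how a backend might read a few of these variables at startup, here is a standard-library sketch. The real project may well use Pydantic settings in `core/config.py`; the `Settings` dataclass, the `load_settings` helper, the choice of which variables are required, and the defaults are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Treating these three as required is an assumption for the sketch.
REQUIRED = ("DATABASE_URL", "JWT_SECRET_KEY", "AWS_S3_BUCKET_NAME")


@dataclass(frozen=True)
class Settings:
    app_env: str
    database_url: str
    jwt_secret_key: str
    s3_bucket_name: str
    use_local_worker: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    return Settings(
        app_env=env.get("APP_ENV", "development"),  # default is an assumption
        database_url=env["DATABASE_URL"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        s3_bucket_name=env["AWS_S3_BUCKET_NAME"],
        use_local_worker=env.get("USE_LOCAL_WORKER", "false").strip().lower() == "true",
    )
```

Failing at startup when a required variable is absent tends to be easier to debug than a late error on the first S3 or database call.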
Add screenshots to the screenshots/ folder using the filenames below.
For public screenshots, use demo accounts and fake documents only. Do not include real resumes, private emails, school documents, transcripts, or personal information.
- Built a full-stack document intelligence platform using React, TypeScript, FastAPI, PostgreSQL, Docker, and S3-compatible object storage to support document upload, asynchronous processing, metadata extraction, and dashboard visualization.
- Designed a cloud-ready architecture with LocalStack-based S3 emulation, Lambda-compatible document processing, Terraform infrastructure definitions, and GitHub Actions CI/CD.
- Implemented JWT authentication, document status tracking, file validation, extracted text rendering, and automated backend testing across unit and integration suites.
- Amazon Textract — higher-accuracy OCR for scanned PDFs and complex layouts
- Amazon Bedrock / OpenAI — LLM-powered summarisation and document Q&A
- Full-text search — PostgreSQL tsvector or pgvector for semantic search
- SQS queue — decouple API from Lambda for resilience under load
- Email notifications — SNS or SES alerts when processing completes
- Batch upload — upload and process multiple files in one request
- AWS App Runner / ECS Fargate — managed container deployment for the backend
- Admin panel — user management, system-wide document stats
- Document sharing — generate shareable links with expiry
- Webhook support — push processing results to external systems
MIT © Kyle Theodore — see LICENSE for details.







