🧬 AI Genomics Lab

AI-powered bioinformatics research platform for genomic analysis and disease detection.

Tech Stack

Python, FastAPI, Next.js, React, Neo4j, PostgreSQL, Docker, Tailwind CSS

LangGraph, DeepSeek, MIMO, Cytoscape.js, IGV.js, BWA, SAMtools, bcftools, MinIO

📋 Description

AI Genomics Lab is a local-first platform for clinical genomic analysis that combines bioinformatics pipelines, a biomedical knowledge graph, and AI-assisted interpretation. The system analyzes patient DNA samples, detects genomic variants, links them to known diseases through a graph database, and generates research-grade clinical reports using LLM agents.

Research use only — not for clinical diagnostics.

🧬 AI Genomics Research Platform

Bioinformatics system powered by AI to detect genetic diseases from a patient's DNA using:

  • LLMs
  • Graph Database
  • Deep learning models for sequences
  • Scientific agents
  • Bioinformatics pipelines

🎯 Project Status

Status: Phase 6 COMPLETED — Phase 7 IN PROGRESS

| Phase | Description | Status |
| --- | --- | --- |
| Phase 1 | Docker Infrastructure | ✅ |
| Phase 2 | Bioinformatics Pipeline | ✅ |
| Phase 3 | Data Layer (PostgreSQL + Neo4j + MinIO) | ✅ |
| Phase 4 | API Layer (50+ endpoints) | ✅ |
| Phase 5 | AI Integration (LLM + Agents) | ✅ |
| Phase 6 | Frontend (Next.js + IGV + Cytoscape) | ✅ |
| Phase 7 | Clinical Validation (GIAB benchmarks) | 🔄 |
| Phase 8 | Knowledge Ingestion (ClinVar, gnomAD) | 📋 |
| Phase 9 | Production Hardening | 📋 |
| Phase 10 | Advanced AI | 📋 |

🚀 Features

  • Secure Authentication: JWT-based authentication with Argon2 password hashing, role-based access control (admin, analyst, researcher, viewer)
  • Genome Indexing Pipeline: Nextflow-based genome indexing pipeline creating .fai, .gzi, and .sti indexes for alignment
  • Reference Genome Management: Web interface to manage genome references with URL download and MinIO synchronization
  • Bioinformatics Pipeline: FASTQ → BAM/CRAM → VCF with strobealign, SAMtools, bcftools, and GATK
  • Knowledge Graph: Neo4j with Gene, Mutation, Disease, Protein, Drug, and Paper nodes
  • LLM Integration: DeepSeek / MIMO API for mutation explanation and report generation
  • AI Agents: Multi-agent system (VariantAgent, GraphAgent, LiteratureAgent, ReportAgent)
  • Modern UI: Next.js dashboard with multiple sections (Alignment, Storage, Analysis, Samples, Reference Genomes)
  • Settings Management: Comprehensive platform configuration with permission-based access control
  • MinIO Storage Integration: Object storage for genome files with sync capabilities between local and cloud storage
  • Genome Sync Service: Automated synchronization of genome files between local storage and MinIO buckets
  • Real-time Job Monitoring: Live streaming of Nextflow pipeline logs with stage tracking and progress updates

🖼️ Screenshots

Dashboard

Dashboard Pipeline

🏗️ Architecture

graph TB
    subgraph Input["📥 Data Input"]
        FASTQ[FASTQ Files]
        FASTA[FASTA Files]
        BAM[BAM Files]
        GENOME_URL[Genome URLs]
    end

    subgraph Indexing["🧬 Genome Indexing Pipeline"]
        NXF[Nextflow Runner]
        DOWNLOAD[Download Genome]
        FAI[Create .fai Index]
        GZI[Create .gzi Index]
        STI[Create .sti Index]
        UPLOAD[Upload to MinIO]
    end

    subgraph Pipeline["🧬 Bioinformatics Pipeline"]
        QC[Quality Control]
        ALIGN[Alignment strobealign]
        SORT[Sorting SAMtools]
        VAR[Variant Calling bcftools]
        VCF[VCF Output]
    end

    subgraph Storage["💾 Storage"]
        MINIO[MinIO Object Store]
        POSTGRES[PostgreSQL]
        NEO4J[Neo4j Graph DB]
    end

    subgraph AI["🤖 AI Layer"]
        VA[Variant Agent]
        GA[Graph Agent]
        LA[Literature Agent]
        RA[Report Agent]
        LLM[LLM OpenRouter]
    end

    subgraph API["⚡ API Layer"]
        FASTAPI[FastAPI Backend]
    end

    subgraph UI["🎨 User Interface"]
        NEXT[Next.js Frontend]
        ALIGN_UI[Align Genome]
        STORAGE_UI[Storage Manager]
        REF_UI[Reference Genomes]
        ANALYSIS_UI[Analysis Dashboard]
        GRAPH[Cytoscape.js Graph]
        BROWSER[IGV Genome Browser]
    end

    GENOME_URL --> NXF
    NXF --> DOWNLOAD
    DOWNLOAD --> FAI
    FAI --> GZI
    GZI --> STI
    STI --> UPLOAD
    UPLOAD --> MINIO

    FASTQ --> QC
    FASTA --> QC
    BAM --> QC
    QC --> ALIGN
    ALIGN --> SORT
    SORT --> VAR
    VAR --> VCF
    
    VCF --> NEO4J
    VCF --> POSTGRES
    FASTQ --> MINIO
    FASTA --> MINIO
    BAM --> MINIO
    MINIO --> ALIGN

    NEO4J --> GA
    GA --> LLM
    VCF --> VA
    VA --> LLM
    LLM --> RA
    
    FASTAPI --> POSTGRES
    FASTAPI --> NEO4J
    FASTAPI --> MINIO
    FASTAPI --> LLM
    FASTAPI --> NXF
    
    NEXT --> FASTAPI
    ALIGN_UI --> FASTAPI
    STORAGE_UI --> MINIO
    REF_UI --> FASTAPI
    ANALYSIS_UI --> FASTAPI
    GRAPH --> NEO4J

🔄 Data Flow

sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Nextflow
    participant MinIO
    participant Pipeline
    participant Neo4j
    participant LLM

    Note over User,LLM: Genome Indexing Workflow
    User->>Frontend: Add Genome Reference
    Frontend->>API: POST /api/settings/genome-references
    API->>Frontend: Reference saved
    
    User->>Frontend: Index Genome
    Frontend->>API: POST /genome/index
    API->>Nextflow: Execute Nextflow pipeline
    Nextflow->>Nextflow: Download genome
    Nextflow->>Nextflow: Create indexes (.fai, .gzi, .sti)
    Nextflow->>MinIO: Upload indexes
    Nextflow->>API: Streaming logs
    API->>Frontend: Real-time updates
    Nextflow->>API: Job completion
    API->>Frontend: Indexing complete
    
    Note over User,LLM: Storage Management
    User->>Frontend: Sync Genomes
    Frontend->>API: POST /storage/sync/genomes
    API->>MinIO: List genomes
    MinIO->>API: Genome list
    API->>Frontend: Sync status
    
    Note over User,LLM: Analysis Pipeline
    User->>Frontend: Upload Genome File
    Frontend->>API: POST /analysis/upload
    API->>MinIO: Store file
    MinIO->>API: File stored
    
    User->>Frontend: Run Analysis
    Frontend->>API: POST /analysis/run
    API->>Pipeline: Execute pipeline
    Pipeline->>API: VCF results
    
    API->>Neo4j: Store variants
    User->>Frontend: View Graph
    Frontend->>API: GET /graph/genes/{gene}
    API->>Neo4j: Query graph
    Neo4j->>Frontend: Graph data
    
    User->>Frontend: AI Analysis
    Frontend->>API: POST /agents/analyze
    API->>Neo4j: Get context
    API->>LLM: Explain mutation
    LLM->>Frontend: Analysis result
    
    User->>Frontend: Generate Report
    Frontend->>API: POST /agents/report
    API->>LLM: Generate report
    LLM->>Frontend: Scientific report

📁 Project Structure

AI-Genomics-Lab/
├── api/                          # FastAPI backend
│   ├── main.py                   # App entry point + top-level endpoints
│   ├── v1/                       # Versioned API routes
│   │   ├── patients.py           # Patient CRUD
│   │   ├── cases.py              # Clinical case management
│   │   ├── samples.py            # Sample management + FASTQ upload
│   │   ├── variants.py           # Variant querying with filters
│   │   ├── reports.py            # Clinical report CRUD
│   │   ├── pipeline_runs.py      # Pipeline execution management
│   │   ├── hospitals.py          # Hospital registry
│   │   └── clinical_catalogs.py  # Cancer types, stages, histology
│   ├── dependencies.py           # Auth dependencies
│   ├── requirements.txt          # Python dependencies
│   └── Dockerfile                # API container
├── services/                     # Core business logic
│   ├── auth_service.py           # JWT + Argon2 + RBAC + audit
│   ├── database_service.py       # PostgreSQL schema + CRUD (28 tables)
│   ├── minio_service.py          # MinIO object storage client
│   ├── neo4j_service.py          # Neo4j graph operations
│   ├── llm_client.py             # LLM client (DeepSeek/MIMO/OpenRouter) + cache
│   ├── bio_pipeline_client.py    # VCF parsing + pipeline integration
│   ├── nextflow_runner.py        # Nextflow execution via Docker + SSE
│   ├── cache_service.py          # LLM response caching
│   └── genome_sync_service.py    # Genome file synchronization
├── agents/                       # AI agent system
│   └── __init__.py               # VariantAgent, GraphAgent, LiteratureAgent, ReportAgent
├── core/                         # Shared FastAPI dependencies (RBAC)
│   └── deps.py
├── bio-pipeline/                 # Bioinformatics pipeline
│   ├── Dockerfile                # Ubuntu 22.04 + strobealign + SAMtools + bcftools + GATK + Nextflow
│   ├── genome_index_correct.nf   # Nextflow: genome download + index creation
│   ├── pipeline_1_genome_prep.nf # Nextflow: genome preparation
│   ├── scripts/
│   │   ├── pipeline.sh           # Main analysis: FASTQ → CRAM → VCF
│   │   ├── index_genome.sh       # Reference genome indexing
│   │   └── vcf_dashboard.sh      # VCF statistics generator
│   └── archive/                  # Historical pipeline variants
├── graph/                        # Neo4j knowledge graph
│   └── schema.cypher             # Schema + seed data (genes, mutations, diseases, drugs, pathways)
├── frontend/                     # Next.js 16 frontend
│   ├── src/
│   │   ├── app/                  # Pages (dashboard, login, settings, cases)
│   │   ├── components/           # React components
│   │   │   ├── sections/         # Page sections (AlignGenome, Storage, Analysis, etc.)
│   │   │   ├── cases/            # Clinical case wizard (10 steps)
│   │   │   ├── ui/               # Design system (Button, Card, Input, Select, etc.)
│   │   │   ├── GraphView.tsx     # Cytoscape.js knowledge graph
│   │   │   ├── VariantTable.tsx  # Variant table with filters
│   │   │   └── GenomeBrowser.tsx # IGV.js genome browser
│   │   ├── stores/               # Zustand state management
│   │   ├── hooks/                # Custom React hooks
│   │   ├── types/                # TypeScript type definitions
│   │   └── design-system/        # Tokens, variants, utilities
│   ├── package.json
│   └── Dockerfile
├── docker/                       # Infrastructure
│   ├── docker-compose.yml        # Multi-service orchestration (7 services)
│   └── dev.sh                    # Development helper script
├── scripts/                      # Utility scripts
│   └── init_database.py          # Database initialization
├── datasets/                     # Data directory (gitignored)
│   ├── fastq/                    # Patient FASTQ files
│   ├── bam/                      # Aligned BAM/CRAM files
│   ├── vcf/                      # Variant call files
│   ├── reference_genome/         # Reference genome (hg38)
│   ├── annotations/              # ClinVar annotation files
│   └── logs/                     # Pipeline execution logs
├── PLAN.md                       # Master project plan
├── ARCHITECTURE.md               # Technical architecture details
├── CONTRIBUTING.md               # Contribution guidelines
├── README.md                     # This file
└── LICENSE                       # MIT License

🛠️ Tech Stack

| Category | Technology |
| --- | --- |
| Backend | FastAPI (Python 3.11+) |
| Database | PostgreSQL 15, Neo4j 5.14 |
| Storage | MinIO (S3-compatible) |
| AI/LLM | DeepSeek, MIMO (multi-provider) |
| Frontend | Next.js 16, React 18, Tailwind CSS |
| Visualization | Cytoscape.js, IGV.js, Recharts |
| Pipeline Orchestration | Nextflow DSL2 |
| Bioinformatics | strobealign, SAMtools, bcftools, GATK |
| Authentication | JWT + Argon2 + RBAC |
| State Management | Zustand |

🌐 Services and Ports

| Service | Port | Description |
| --- | --- | --- |
| Frontend | 3000 | Next.js UI |
| API | 8000 | FastAPI backend |
| Neo4j Browser | 7474 | Graph database UI |
| Neo4j Bolt | 7687 | Graph database protocol |
| PostgreSQL | 5432 | Relational database |
| MinIO API | 9000 | Object storage |
| MinIO Console | 9001 | Storage management UI |
| pgAdmin | 5050 | Database admin UI |

📊 Data in Neo4j

Loaded Nodes

| Type | Count | Examples |
| --- | --- | --- |
| Genes | 6 | BRCA1, BRCA2, TP53, EGFR, KRAS, PIK3CA |
| Mutations | 6 | c.68_69delAG, c.5266dupC, R273H, L858R, G12D, E545K |
| Diseases | 5 | Hereditary Breast Cancer, Ovarian Cancer, Li-Fraumeni, NSCLC, Colorectal |
| Drugs | 3 | Olaparib, Osimertinib, Sotorasib |
| Pathways | 4 | Cell Cycle, Apoptosis, RAS-MAPK, PI3K-AKT |

Relationships

(Gene)-[:HAS_MUTATION]->(Mutation)
(Mutation)-[:CAUSES]->(Disease)
(Gene)-[:INTERACTS_WITH]->(Gene)
(Drug)-[:TARGETS]->(Gene)
(Gene)-[:PARTICIPATES_IN]->(Pathway)

🎨 Frontend Components

Dashboard Sections

AlignGenomeSection

Genome alignment and indexing interface:

  • Reference genome selection with indexed status badges
  • Read length configuration for alignment
  • Real-time Nextflow pipeline execution with live log streaming
  • Stage tracking (downloading, indexing, uploading)
  • Cancel indexing and delete index functionality

StorageSection

MinIO storage management:

  • List genomes available in MinIO buckets
  • Sync genomes between local storage and MinIO
  • Download genomes from MinIO to local storage
  • Visual status indicators for sync progress

AnalysisSection

Analysis dashboard with pipeline controls:

  • File upload for FASTQ/BAM/VCF files
  • Pipeline execution controls
  • Analysis status monitoring

ReferenceGenomesSection

Genome reference management:

  • Add/edit/delete genome references with URL, species, build
  • Test genome URL connectivity
  • Manage active/inactive references

SamplesSection

Sample management interface:

  • List and manage genomic samples
  • Sample metadata editing

Visualization Components

GraphView

Interactive knowledge graph visualization using Cytoscape.js:

  • Nodes: Genes (blue), Mutations (red), Diseases (green)
  • Relationships: HAS_MUTATION, CAUSES, INTERACTS_WITH
  • Interactive: click to select, zoom, pan

VariantTable

Variant table with:

  • Search by gene or position
  • Filters by type (SNP, Indel, Structural)
  • Pathogenicity classification (pathogenic, likely_pathogenic, uncertain, likely_benign, benign)
  • Data export

GenomeBrowser

IGV.js integration:

  • Chromosomal locus navigation
  • Quick navigation: BRCA1, TP53, EGFR, KRAS
  • hg38 support

📡 API Endpoints

Health

  • GET / - API information
  • GET /health - Health status

Authentication

  • POST /api/auth/login - User login with JWT token generation
  • POST /api/auth/logout - User logout and session cleanup
  • GET /api/auth/me - Get current user information
  • POST /api/auth/refresh - Refresh access token

Settings (Authenticated)

  • GET /api/settings/genome-references - Get genome references (admin only)
  • POST /api/settings/genome-references - Create genome reference (admin only)
  • PUT /api/settings/genome-references/{ref_id} - Update genome reference (admin only)
  • DELETE /api/settings/genome-references/{ref_id} - Delete genome reference (admin only)
  • POST /api/settings/genome-references/{ref_id}/test - Test genome reference URL (admin only)
  • GET /api/settings/pipeline - Get pipeline settings (admin only)
  • PUT /api/settings/pipeline/{key} - Update pipeline setting (admin only)
  • GET /api/settings/ai-providers - Get AI provider configurations
  • POST /api/settings/ai-providers/test - Test AI provider connection (admin only)
  • GET /api/settings/ui-preferences - Get user UI preferences
  • PUT /api/settings/ui-preferences - Update user UI preferences
  • GET /api/settings/audit-logs - View audit logs (admin only)
  • GET /api/settings/system-health - Get system health status

Genome (Authenticated)

  • GET /genome/indexed - Get indexing status for all genomes
  • GET /genome/status/{genome_id} - Get indexing status for a specific genome
  • POST /genome/index - Start genome indexing using Nextflow pipeline
  • DELETE /genome/index/{genome_id} - Delete genome index files
  • GET /genome/jobs - Get all genome indexing jobs
  • GET /genome/job/{job_id} - Get genome indexing job status

Storage (Authenticated)

  • GET /storage/genomes - List genomes from MinIO storage
  • POST /storage/sync/genomes - Sync local genomes to MinIO
  • GET /storage/genomes/{genome_name}/status - Get sync status for a genome
  • POST /storage/genomes/{genome_name}/download - Download genome from MinIO to local storage
  • GET /storage/test - Test storage connectivity

Analysis

  • POST /analysis/upload - Upload genome file
  • POST /analysis/run - Run pipeline
  • GET /analysis/status - Pipeline status

Graph

  • GET /graph/genes/{gene} - Gene information
  • GET /graph/mutations/{mutation} - Mutation information
  • GET /graph/diseases/{disease} - Disease information
  • GET /graph/search - Search graph
  • GET /graph/statistics - Graph statistics

Agents

  • POST /agents/analyze - Analyze variant
  • POST /agents/report - Generate report
  • POST /agents/complete-analysis - Complete analysis

LLM

  • POST /llm/explain - Explain mutation
  • POST /llm/generate - Text generation

🚦 Getting Started

Prerequisites

  • Docker & Docker Compose
  • Python 3.11+
  • Node.js 20+

Installation

  1. Clone the repository:
git clone https://github.com/rendergraf/AI-Genomics-Lab.git
cd AI-Genomics-Lab
  2. Configure environment:
cp .env.example .env
# Edit .env with your API keys
  3. Start services:
cd docker
docker-compose up -d
  4. Wait for services to be ready (about 30 seconds)

  5. Verify services are running:

docker ps
curl http://localhost:8000/health
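
Rather than waiting a fixed 30 seconds, a script can poll the health endpoint until it answers. This is a minimal sketch using only the standard library. The helper names `wait_until_ready` and `api_is_healthy` are hypothetical, and the check assumes the `/health` response carries a `"status": "healthy"` field as in the Quick Test output.

```python
import json
import time
import urllib.request
from typing import Callable


def wait_until_ready(check: Callable[[], bool], timeout: float = 60.0,
                     interval: float = 2.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except OSError:
            pass  # Connection refused or HTTP error: service not ready yet.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def api_is_healthy(url: str = "http://localhost:8000/health") -> bool:
    """Return True when the health endpoint reports status "healthy"."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp).get("status") == "healthy"
```

Calling `wait_until_ready(api_is_healthy)` after `docker-compose up -d` returns `True` once the API responds, or `False` if the timeout expires first.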

Accessing Services

| Service | URL | Credentials |
| --- | --- | --- |
| Frontend | http://localhost:3000 | admin@company.com / admin123 |
| API Docs (Swagger) | http://localhost:8000/docs | Authentication required for protected endpoints |
| Neo4j Browser | http://localhost:7474 | neo4j / genomics |
| MinIO Console | http://localhost:9001 | genomics / genomics |
| PostgreSQL | localhost:5432 | genomics / genomics (database: genomics) |

Quick Test

# Check API health
curl http://localhost:8000/health
# Response: {"status":"healthy","api":"ok","database":"ok","graph":"ok","storage":"ok"}

# Test authentication
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@company.com","password":"admin123","remember_me":false}'
# Response: {"access_token":"eyJhbGciOiJ...","refresh_token":"...","token_type":"bearer","expires_in":1800}

# Test authenticated endpoint (using the token from above)
TOKEN="your_access_token_here"
curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/api/auth/me
# Response: {"id":1,"email":"admin@company.com","name":"Administrator","is_active":true,"roles":["admin"]}

# Check pipeline status
curl http://localhost:8000/analysis/status
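
The same login flow can be scripted. The sketch below covers only the client-side bookkeeping: building the login body and turning the login response into an `Authorization` header. The helper names are hypothetical, and the field names are taken from the example request and response above.

```python
import json


def login_payload(email: str, password: str, remember_me: bool = False) -> bytes:
    """Encode the JSON body sent to POST /api/auth/login."""
    body = {"email": email, "password": password, "remember_me": remember_me}
    return json.dumps(body).encode("utf-8")


def bearer_headers(login_response: dict) -> dict:
    """Build request headers from a login response containing an access token."""
    token_type = login_response.get("token_type", "bearer")
    if token_type.lower() != "bearer":
        raise ValueError(f"unsupported token type: {token_type}")
    return {"Authorization": f"Bearer {login_response['access_token']}"}


# Placeholder response shaped like the example output; the token is not real.
sample_response = {"access_token": "your_access_token_here",
                   "token_type": "bearer", "expires_in": 1800}
headers = bearer_headers(sample_response)
print(headers)
```

The resulting `headers` dictionary can be passed to any HTTP client, for example `requests.get(url, headers=headers)`.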

Stopping Services

cd docker
docker-compose down

Pipeline Data

Place your genome files in the appropriate directories:

# FASTQ files (input)
mkdir -p datasets/fastq
# Place .fastq or .fastq.gz files here

# Reference genome (supports .fa or .fa.gz)
mkdir -p datasets/reference_genome
# Place Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz here

# Output directories (auto-created)
datasets/bam      # Aligned BAM files
datasets/vcf      # Variant call files
datasets/logs    # Pipeline logs
datasets/annotations  # Annotation files (e.g., clinvar.vcf)

Development

API

cd api
pip install -r requirements.txt
uvicorn main:app --reload

Frontend

cd frontend
npm install
npm run dev

🧬 Bioinformatics Pipeline

Pipeline Overview

The bioinformatics pipeline processes FASTQ files through the following steps:

FASTQ → strobealign → SAM (streamed) → samtools sort → BAM → CRAM → bcftools mpileup → BCF → VCF → filtered VCF → annotated VCF

Tools Used

| Tool | Purpose |
| --- | --- |
| strobealign | Sequence alignment (5-8x faster than BWA-MEM) |
| SAMtools | SAM/BAM/CRAM processing and indexing |
| bcftools | Variant calling (mpileup) and filtering |
| GATK | Genome Analysis Toolkit (optional advanced callers) |
| htslib | Low-level HTS file I/O |
| vmtouch | RAM page cache preloading for reference genomes |

Pipeline Features

  • Streaming: Uses pipes to avoid writing intermediate SAM files (saves disk space)
  • Parallel processing: Uses 4 threads for strobealign and samtools
  • Compressed reference: Supports .fa.gz - automatically decompresses on first run
  • Smart indexing: Only reindexes if indices don't exist
  • Detailed logging: Each step logs to /datasets/logs/{sample}_{tool}.log
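
The streaming step can be pictured as the aligner writing SAM to standard output while `samtools sort` reads it from standard input. The sketch below illustrates that pattern and is not a copy of scripts/pipeline.sh: it assumes paired-end reads, the function names are hypothetical, and the real script may use different options.

```python
import shlex
import subprocess


def alignment_commands(ref: str, r1: str, r2: str, out_bam: str,
                       threads: int = 4) -> tuple[list[str], list[str]]:
    """Return (aligner, sorter) argument lists meant to be joined by a pipe."""
    # strobealign writes SAM to stdout; -t sets the thread count.
    align = ["strobealign", "-t", str(threads), ref, r1, r2]
    # samtools sort reads SAM from stdin ("-") and writes a sorted BAM.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return align, sort


def run_streamed(align: list[str], sort: list[str]) -> None:
    """Run both tools connected by a pipe, so no SAM file touches the disk."""
    with subprocess.Popen(align, stdout=subprocess.PIPE) as producer:
        with subprocess.Popen(sort, stdin=producer.stdout) as consumer:
            producer.stdout.close()  # Aligner gets SIGPIPE if sort exits early.
            consumer.wait()
        producer.wait()
    if producer.returncode or consumer.returncode:
        raise RuntimeError("alignment pipeline failed")


align, sort = alignment_commands("ref.fa", "r1.fastq.gz", "r2.fastq.gz", "sample.bam")
print(" | ".join(shlex.join(cmd) for cmd in (align, sort)))
```

`run_streamed(align, sort)` requires both tools on `PATH`. The final `print` only shows the equivalent shell pipeline.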

Reference Genome

The pipeline supports both compressed and uncompressed reference genomes:

# Place in datasets/reference_genome/
Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz  # Recommended (3GB vs 60GB)
# or
Homo_sapiens.GRCh38.dna_sm.toplevel.fa

On first run, the compressed file will be decompressed automatically.

Running Pipeline

# Via API
curl -X POST http://localhost:8000/analysis/run -H "Content-Type: application/json" \
  -d '{"sample_id": "sample_001"}'

# Check status
curl http://localhost:8000/analysis/status

Pipeline Environment Variables

REFERENCE_GENOME_GZ=/datasets/reference_genome/Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz
REFERENCE_GENOME=/datasets/reference_genome/Homo_sapiens.GRCh38.dna_sm.toplevel.fa
INPUT_DIR=/datasets/fastq
OUTPUT_DIR=/datasets/bam
VCF_OUTPUT_DIR=/datasets/vcf
LOGS_DIR=/datasets/logs
ANNOTATION_DIR=/datasets/annotations
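
A service that calls the pipeline could read these variables once and fall back to the values listed above when they are unset. The sketch below is hypothetical (`PipelinePaths` and `load_paths` are not names from the repository). The log file helper follows the `{sample}_{tool}.log` naming described under Pipeline Features.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Defaults mirror the values listed in this section.
DEFAULTS = {
    "REFERENCE_GENOME": "/datasets/reference_genome/Homo_sapiens.GRCh38.dna_sm.toplevel.fa",
    "INPUT_DIR": "/datasets/fastq",
    "OUTPUT_DIR": "/datasets/bam",
    "VCF_OUTPUT_DIR": "/datasets/vcf",
    "LOGS_DIR": "/datasets/logs",
}


@dataclass(frozen=True)
class PipelinePaths:
    reference_genome: Path
    input_dir: Path
    output_dir: Path
    vcf_output_dir: Path
    logs_dir: Path

    def log_file(self, sample: str, tool: str) -> Path:
        """Path of the per-step log, e.g. <logs_dir>/sample_001_samtools.log."""
        return self.logs_dir / f"{sample}_{tool}.log"


def load_paths(env: Optional[Mapping[str, str]] = None) -> PipelinePaths:
    """Read pipeline paths from the environment, using defaults when unset."""
    source = os.environ if env is None else env
    values = {key: Path(source.get(key, default)) for key, default in DEFAULTS.items()}
    return PipelinePaths(
        reference_genome=values["REFERENCE_GENOME"],
        input_dir=values["INPUT_DIR"],
        output_dir=values["OUTPUT_DIR"],
        vcf_output_dir=values["VCF_OUTPUT_DIR"],
        logs_dir=values["LOGS_DIR"],
    )


paths = load_paths({})
print(paths.log_file("sample_001", "samtools"))
```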

🧬 Genome Indexing Pipeline

Overview

The genome indexing pipeline uses Nextflow to download reference genomes and create necessary indexes for alignment (.fai, .gzi, .sti). The pipeline runs in a Docker container and uploads results to MinIO for persistent storage.

Nextflow Pipeline

  • Input: Genome ID and optional URL
  • Processes: Download, FASTA index (.fai), BGZIP index (.gzi), Strobealign index (.sti)
  • Output: Index files uploaded to MinIO bucket
  • Real-time monitoring: Live log streaming via Server-Sent Events (SSE)
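
Server-Sent Events arrive as a text stream in which each event is a run of `field: value` lines ended by a blank line. The sketch below parses only the `data` field, which is enough to follow a log stream. The payload format (`stage=...`) is an assumption for illustration, since the README does not document what the API actually sends.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in a text/event-stream."""
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line dispatches the event collected so far.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical stream; the stage names come from the stage tracking list.
stream = [
    "data: stage=downloading\n",
    "\n",
    ": keep-alive\n",
    "data: stage=indexing\n",
    "\n",
]
for payload in iter_sse_data(stream):
    print(payload)
```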

Index Types

  • .fai: FASTA index for random access to sequences
  • .gzi: BGZIP index for compressed FASTA files
  • .sti: Strobealign index for fast read alignment

Usage via API

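# Start genome indexing (authenticated; TOKEN is the access token returned by /api/auth/login)
curl -X POST http://localhost:8000/genome/index \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "genome_id=hg38&read_length=150"

# Check indexing status
curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/genome/indexed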

# Stream logs for a job
# (Implemented via SSE in frontend)

📈 API Usage

Example: Variant Analysis

import requests

# Analyze variant
response = requests.post(
    "http://localhost:8000/agents/analyze",
    json={"variant_id": "R273H"}
)
print(response.json())

# Generate report
response = requests.post(
    "http://localhost:8000/agents/report",
    json={
        "sample_id": "sample_001",
        "variants": ["BRCA1:c.68_69delAG", "TP53:R273H"]
    }
)
print(response.json())

🤖 Agent System

VariantAgent

Analyzes specific variants by querying the knowledge graph and generating clinical interpretations.

GraphAgent

Performs queries to Neo4j to retrieve information about genes, mutations, and diseases.

LiteratureAgent

Retrieves and analyzes relevant scientific literature for detected variants.

ReportAgent

Generates complete scientific reports including executive summary, methodology, variant analysis, and clinical interpretation.

AnalysisOrchestrator

Orchestrator that coordinates all agents for complete analysis.

📝 Environment Variables Configuration

# Database
DATABASE_URL=postgresql://genomics:genomics@postgres:5432/genomics

# Neo4j
NEO4J_URI=bolt://neo4j:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=genomics

# MinIO
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=genomics
MINIO_SECRET_KEY=genomics

# LLM (DeepSeek / MIMO / OpenRouter)
LLM_PROVIDER=deepseek
DEEPSEEK_API_KEY=your_deepseek_api_key_here
MIMO_API_KEY=your_mimo_api_key_here

# Pipeline (optional)
REFERENCE_GENOME=/datasets/reference_genome/Homo_sapiens.GRCh38.dna_sm.toplevel.fa
REFERENCE_GENOME_GZ=/datasets/reference_genome/Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz

🔒 Security

  • Authentication: JWT with Argon2 password hashing (GPU/ASIC-resistant)
  • Authorization: Role-based access control (admin, analyst, researcher, viewer)
  • Audit Logging: All authentication events and configuration changes logged
  • Patient Data Protection: Pseudonymized external IDs, no PII in LLM calls
  • API Security: CORS configuration, SQL injection prevention via parameterized queries
  • Configuration: Environment variables for secrets, no hardcoded credentials in code
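
A role check over the four roles can be as small as a set intersection. The sketch below is an assumption about how such a guard might look, not the logic in services/auth_service.py or core/deps.py. The `user` dictionary follows the shape of the `/api/auth/me` response, and `require_roles` is a hypothetical name.

```python
ROLES = frozenset({"admin", "analyst", "researcher", "viewer"})


def require_roles(user: dict, allowed: set) -> None:
    """Raise PermissionError unless an active user holds one of the allowed roles."""
    unknown = set(allowed) - ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    if not user.get("is_active", False):
        raise PermissionError("inactive account")
    if not set(allowed).intersection(user.get("roles", [])):
        raise PermissionError("insufficient role")


# Shaped like the /api/auth/me example response.
me = {"id": 1, "email": "admin@company.com", "is_active": True, "roles": ["admin"]}
require_roles(me, {"admin"})  # Passes silently for an admin.
```

In a FastAPI application a check like this would usually live in a dependency, so that each protected route declares the roles it accepts.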

Note: This is a research platform. It is NOT certified for clinical diagnostics (FDA/CE-IVD). All AI-generated reports must be reviewed by qualified personnel before any clinical decision.

🧪 Testing

Critical modules include tests:

  • Bioinformatics pipeline
  • Variant parser
  • Graph ingestion

🤝 Contributions

Contributions are welcome!

📄 License

MIT License - See LICENSE for details.


Author: Xavier Araque
Email: xavieraraque@gmail.com
GitHub: https://github.com/rendergraf/AI-Genomics-Lab
Version: 0.2
Location: Spain
Date: June 2026


Generated by AI Genomics Lab
