RetinalGPT: Advancing Clinical Ophthalmology Through Instruction-Tuned Vision–Language Intelligence

🔔 News

[2026.04.14] RetinalGPT was submitted to Ophthalmology Science and is currently under review.

👀 Overview

RetinalGPT is a retinal vision-language assistant for clinically grounded ophthalmic image understanding and conversation.

This repository includes:

- the released RetinalGPT inference scripts
- the retinal instruction/alignment data construction pipeline
- dataset-specific retinal description builders
- a minimal sample for adapting the pipeline to custom data

📁 Repository Structure

RetinalGPT/
├── Instruction/
│   ├── Desc/                 # Dataset-specific description builders
│   ├── configs/              # Pipeline and batch jobs
│   ├── sample/               # Minimal custom-data example
│   ├── batch_runner.py       # Batch request packaging / unpacking
│   ├── pipeline_runner.py    # Instruction / alignment generation
│   └── convert2json.py       # Output conversion helpers
├── examples/
│   └── inference/            # Example inference inputs
├── figures/
├── llava/
├── LICENSE
├── requirements.txt
├── run_retinalGPT.py
├── run_retinalGPT_simple.py
└── README.md

✨ Highlights

- Retinal Multimodal Assistant: RetinalGPT supports clinically grounded retinal image understanding and conversation using a vision-language modeling framework.
- Instruction-Tuned Ophthalmology Intelligence: The project focuses on instruction-following retinal dialogue for clinical-style reasoning and response generation.
- Data Construction Pipeline: The repository includes a structured pipeline for building retinal instruction and alignment data from heterogeneous metadata sources.
- Custom-Data Adaptation: A minimal sample is included for extending the pipeline to new retinal datasets.

⚙️ Installation

conda create -n retinalgpt python=3.10 -y
conda activate retinalgpt
pip install --upgrade pip
pip install -r requirements.txt

CUDA is required for the provided inference scripts.
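
Before running the scripts, it can help to confirm that a GPU is actually visible. The check below is a minimal sketch, assuming PyTorch is among the packages installed from requirements.txt (not confirmed in this README); `cuda_status` is a hypothetical helper name, not part of the repository.

```python
import importlib.util


def cuda_status() -> str:
    """Return a short message describing whether CUDA is usable from PyTorch."""
    # Assumption: the inference scripts rely on PyTorch.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."

    import torch

    if torch.cuda.is_available():
        return f"CUDA is available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but no CUDA device was detected."


if __name__ == "__main__":
    print(cuda_status())
```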

🚀 Quick Start

Single-image inference

python3 run_retinalGPT_simple.py \
  --model-name ASU-GSL/RetinalGPT \
  --image-file /path/to/retinal_image.png \
  --question "Please describe this retinal image in detail."

Batch inference

python3 run_retinalGPT.py \
  --model-name ASU-GSL/RetinalGPT \
  --image-folder /path/to/images \
  --question-file /path/to/questions.jsonl \
  --answers-file /path/to/predictions.jsonl

Example input: examples/inference/questions.json
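
The schema of the question file is not described in this README, so consult the example input for the authoritative format. As a rough sketch only, the snippet below builds a JSON Lines file under the assumption that the script follows the LLaVA-style evaluation convention of one object per line with `question_id`, `image`, and `text` fields. The field names and the helpers `build_questions` and `write_jsonl` are assumptions for illustration.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_questions(image_folder, question):
    """Create one question record per image file found in image_folder."""
    folder = Path(image_folder)
    images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    # Assumed field names (LLaVA-style): question_id, image, text.
    return [
        {"question_id": i, "image": p.name, "text": question}
        for i, p in enumerate(images)
    ]


def write_jsonl(records, output_path):
    """Write records as JSON Lines: one JSON object per line."""
    with open(output_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

For example, `write_jsonl(build_questions("/path/to/images", "Please describe this retinal image in detail."), "/path/to/questions.jsonl")` would produce a file that can be passed to `--question-file`, provided the assumed schema matches what the script expects.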

Instruction/alignment pipeline

cd Instruction
python3 pipeline_runner.py UK_instruction_direct

Batch packaging pipeline

cd Instruction
python3 batch_runner.py APTOS

Custom-data sample

cd Instruction
python3 sample/generate_instruction_conversations.py \
  --metadata-csv sample/metadata_template.csv \
  --image-dir /path/to/your/images \
  --output-jsonl sample/generated_instruction_conversations.jsonl

More details: Instruction/sample/README.md

🧠 Pipeline Overview

The overall workflow is:

1. Collect retinal images and structured metadata.
2. Convert metadata into hidden textual descriptions.
3. Combine descriptions with prompts to generate instruction and alignment data.
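
The three steps above can be illustrated with a small, self-contained sketch. It is not the repository's implementation: the metadata columns (`image`, `diagnosis`, `laterality`), the description wording, the prompt template, and all function names are hypothetical. The sketch stops at building the generation request; whatever consumes the prompt is outside its scope.

```python
import csv
import io

# Hypothetical metadata; real datasets will have different columns.
METADATA_CSV = """image,diagnosis,laterality
img_001.png,moderate non-proliferative diabetic retinopathy,left
img_002.png,no apparent retinopathy,right
"""

# Hypothetical prompt; the description is "hidden" in the sense that it is
# used only to build training data and is not shown to the end user.
PROMPT_TEMPLATE = (
    "You are given a description of a retinal image that the end user cannot see.\n"
    "Description: {description}\n"
    "Write a question-and-answer conversation about the image "
    "based only on this description."
)


def build_description(row):
    """Step 2: turn one structured metadata row into a textual description."""
    return f"Retinal image of the {row['laterality']} eye. Finding: {row['diagnosis']}."


def build_request(row):
    """Step 3: combine the description with a prompt into one generation request."""
    description = build_description(row)
    return {
        "image": row["image"],
        "prompt": PROMPT_TEMPLATE.format(description=description),
    }


def build_requests(csv_text):
    """Step 1 (read collected metadata), then steps 2 and 3 for every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [build_request(row) for row in reader]


if __name__ == "__main__":
    for request in build_requests(METADATA_CSV):
        print(request["image"])
        print(request["prompt"])
```

Keeping the description builder separate from the prompt step mirrors the split between dataset-specific description builders and the shared generation pipeline, so a new dataset would only need its own `build_description`.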

🙏 Acknowledgement

We thank the LLaVA and LLaVA-Med projects for their open-source vision-language modeling framework.

📖 Citation

@article{zhu2025retinalgpt,
  title={{RetinalGPT}: A retinal clinical preference conversational assistant powered by large vision-language models},
  author={Zhu, Wenhui and Li, Xin and Chen, Xiwen and Qiu, Peijie and Vasa, Vamsi Krishna and Dong, Xuanzhao and Chen, Yanxi and Lepore, Natasha and Dumitrascu, Oana and Su, Yi and others},
  journal={arXiv preprint arXiv:2503.03987},
  year={2025}
}
