- [2026.04.14] RetinalGPT was submitted to Ophthalmology Science and is currently under review.
RetinalGPT is a retinal vision-language assistant for clinically grounded ophthalmic image understanding and conversation.
This repository includes:
- the released RetinalGPT inference scripts
- the retinal instruction/alignment data construction pipeline
- dataset-specific retinal description builders
- a minimal sample for adapting the pipeline to custom data
RetinalGPT/
├── Instruction/
│ ├── Desc/ # Dataset-specific description builders
│ ├── configs/ # Pipeline and batch jobs
│ ├── sample/ # Minimal custom-data example
│ ├── batch_runner.py # Batch request packaging / unpacking
│ ├── pipeline_runner.py # Instruction / alignment generation
│ └── convert2json.py # Output conversion helpers
├── figures/
├── llava/
├── run_retinalGPT.py
├── run_retinalGPT_simple.py
└── README.md
- Retinal Multimodal Assistant: RetinalGPT supports clinically grounded retinal image understanding and conversation with a vision-language modeling framework.
- Instruction-Tuned Ophthalmology Intelligence: The project focuses on instruction-following retinal dialogue for clinical-style reasoning and response generation.
- Data Construction Pipeline: The repository includes a structured pipeline for building retinal instruction and alignment data from heterogeneous metadata sources.
- Custom-Data Adaptation: A minimal sample is included for extending the pipeline to new retinal datasets.
conda create -n retinalgpt python=3.10 -y
conda activate retinalgpt
pip install --upgrade pip
pip install -r requirements.txt
CUDA is required for the provided inference scripts.
python3 run_retinalGPT_simple.py \
--model-name ASU-GSL/RetinalGPT \
--image-file /path/to/retinal_image.png \
--question "Please describe this retinal image in detail."

python3 run_retinalGPT.py \
--model-name ASU-GSL/RetinalGPT \
--image-folder /path/to/images \
--question-file /path/to/questions.jsonl \
--answers-file /path/to/predictions.jsonl
Example input: examples/inference/questions.json
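A question file for batch inference can be assembled with a few lines of Python. The sketch below assumes that `run_retinalGPT.py` follows the LLaVA evaluation convention of one JSON object per line with `question_id`, `image`, and `text` fields. That schema is an assumption, so compare it with the example input before relying on it. `build_question_file` is a hypothetical helper and not part of this repository.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_question_file(image_folder, question, out_path):
    """Write one JSONL record per image found in image_folder.

    The field names (question_id, image, text) are assumed from the
    LLaVA evaluation scripts and are not confirmed for RetinalGPT.
    """
    # Collect image files in a stable, sorted order.
    images = sorted(
        p for p in Path(image_folder).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    with open(out_path, "w", encoding="utf-8") as f:
        for idx, path in enumerate(images):
            record = {"question_id": idx, "image": path.name, "text": question}
            f.write(json.dumps(record) + "\n")
    return len(images)
```

The resulting file can then be passed as `--question-file`, with the same folder given as `--image-folder`.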
cd Instruction
python3 pipeline_runner.py UK_instruction_direct

cd Instruction
python3 batch_runner.py APTOS

cd Instruction
python3 sample/generate_instruction_conversations.py \
--metadata-csv sample/metadata_template.csv \
--image-dir /path/to/your/images \
--output-jsonl sample/generated_instruction_conversations.jsonl
More details: Instruction/sample/README.md
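Before running the sample on custom data, it can help to confirm that every image listed in the metadata CSV is actually present in the image directory. The following is a minimal sketch: `find_missing_images` is a hypothetical helper, and the column name `image` is an assumption, so replace it with the header actually used in `sample/metadata_template.csv`.

```python
import csv
from pathlib import Path


def find_missing_images(metadata_csv, image_dir, image_column="image"):
    """Return file names listed in the CSV that are absent from image_dir.

    image_column is a hypothetical default; use the real column header.
    """
    image_dir = Path(image_dir)
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if image_column not in (reader.fieldnames or []):
            raise KeyError(f"column {image_column!r} not found in {metadata_csv}")
        return [
            row[image_column]
            for row in reader
            if not (image_dir / row[image_column]).is_file()
        ]
```

An empty list means every referenced image was found.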
The overall workflow is:
- Collect retinal images and structured metadata.
- Convert metadata into hidden textual descriptions.
- Combine descriptions with prompts to generate instruction and alignment data.
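The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions and not the code in `pipeline_runner.py`: the metadata keys, the example values, the function names, and the request layout are all hypothetical, and treating the description as context that accompanies the prompt is one reading of "hidden".

```python
import json


def describe(meta):
    """Step 2: turn one structured metadata row into a textual description.

    The keys used here (modality, laterality, diagnosis) are hypothetical.
    """
    parts = [f"{meta['modality']} image of the {meta['laterality']} eye."]
    if meta.get("diagnosis"):
        parts.append(f"Recorded diagnosis: {meta['diagnosis']}.")
    return " ".join(parts)


def build_generation_request(meta, task_prompt):
    """Step 3: pair the description with a task prompt.

    The description is attached as context for the generator; the image
    is referenced only by file name.
    """
    return {
        "image": meta["image"],
        "prompt": f"{task_prompt}\n\nContext:\n{describe(meta)}",
    }


# Step 1: one collected image with its metadata (illustrative values only).
row = {
    "image": "example_0001.png",
    "modality": "Color fundus",
    "laterality": "left",
    "diagnosis": "moderate non-proliferative diabetic retinopathy",
}
request = build_generation_request(
    row, "Write a question-answer conversation about this retinal image."
)
print(json.dumps(request, indent=2))
```

Keeping the description builder separate from the prompt step mirrors the repository layout, where dataset-specific builders live under `Instruction/Desc/`.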
We thank the LLaVA and LLaVA-Med projects for their open-source vision-language modeling framework.
@article{zhu2025retinalgpt,
title={{RetinalGPT}: A retinal clinical preference conversational assistant powered by large vision-language models},
author={Zhu, Wenhui and Li, Xin and Chen, Xiwen and Qiu, Peijie and Vasa, Vamsi Krishna and Dong, Xuanzhao and Chen, Yanxi and Lepore, Natasha and Dumitrascu, Oana and Su, Yi and others},
journal={arXiv preprint arXiv:2503.03987},
year={2025}
}