Official repository for "Interactive Person Retrieval via Multi-Turn Multimodal Conversation", accepted to ICML 2026.
Code and dataset are coming soon.
We are currently preparing the camera-ready version and organizing the release of the dataset, training code, evaluation scripts, and model weights.
This project studies multimodal interactive person retrieval, where users refine retrieval results through multi-turn conversations with visual feedback on candidate images. We build MInterPEDES, a multimodal conversational dataset, and propose MNEMO, which encodes each dialogue turn as an atomic multimodal unit and aggregates dialogue memory to capture fine-grained cross-turn dependencies.
- Release the MInterPEDES dataset
- Release training and evaluation code
- Release model weights
We will update this checklist as each component becomes available.
If you find this project useful for your research, please consider citing our paper:
@inproceedings{bai2026interactive,
title={Interactive person retrieval via multi-turn multimodal conversation},
author={Bai, Yang and Wang, Tingfeng and Yang, Bin and Cao, Min and Wang, Jinqiao and Ye, Mang},
booktitle={Forty-third International Conference on Machine Learning},
year={2026}
}

This code is distributed under the MIT License.