RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models

Official PyTorch implementation and pretrained models for paper: "RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models."

Updates

[2024/01/31] Accepted to ICRA 2024!
[2024/02/15] Updated fine-tuned checkpoints for the identification and defect detection tasks.

Setup

Download pre-trained Checkpoints

BEiT3-base: #layer=12; hidden=768; FFN factor=4x; #head=12; patch=16x16; #parameters: 222M
BEiT3-large: #layer=24; hidden=1024; FFN factor=4x; #head=16; patch=16x16; #parameters: 674M
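As a quick sanity check on the specs above: with patch=16x16 and the 224x224 input size used in the training commands in this README, each image is split into a 14x14 grid of patches. The helper below is only an illustrative sketch; `num_patches` is a hypothetical name and is not part of this repository.

```python
def num_patches(input_size: int, patch_size: int = 16) -> int:
    """Number of non-overlapping square patches in a square image."""
    if input_size % patch_size != 0:
        raise ValueError("input_size must be a multiple of patch_size")
    per_side = input_size // patch_size
    return per_side * per_side


# 224 / 16 = 14 patches per side, so 14 * 14 = 196 patch tokens per image.
print(num_patches(224))
```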

Download Text Tokenizer

beit3.spm is the SentencePiece model used to tokenize text.

from transformers import XLMRobertaTokenizer
tokenizer = XLMRobertaTokenizer("/your_beit3_model_path/beit3.spm")
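A minimal sketch of how the tokenizer might be loaded and used, assuming beit3.spm has already been downloaded. The helper names `load_beit3_tokenizer` and `encode`, the `max_len` default, and the choice to wrap the ids in BOS/EOS tokens are illustrative assumptions rather than this repository's code.

```python
from pathlib import Path


def load_beit3_tokenizer(spm_path):
    """Load the tokenizer, failing early with a clear error if the file is missing."""
    spm_file = Path(spm_path)
    if not spm_file.is_file():
        raise FileNotFoundError(f"SentencePiece model not found: {spm_file}")
    # Imported lazily so the path check works even before transformers is installed.
    from transformers import XLMRobertaTokenizer
    return XLMRobertaTokenizer(str(spm_file))


def encode(tokenizer, text, max_len=64):
    """Convert text to token ids wrapped in BOS/EOS (max_len is an assumed limit)."""
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)[: max_len - 2]
    return [tokenizer.bos_token_id] + ids + [tokenizer.eos_token_id]


# Example usage (the path is a placeholder):
# tokenizer = load_beit3_tokenizer("/your_beit3_model_path/beit3.spm")
# print(encode(tokenizer, "a cardboard box on a conveyor belt"))
```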

Set up the environment

alias=`whoami | cut -d'.' -f2`; docker run -it --rm --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel bash

Install the required packages:

pip install -r requirements.txt

Download our preprocessed JSON files

For the Armbench identification task

Download json_files.zip and put its contents into the Armbench dataset directory.

Additional JSON files for the 3to1 task: ID_json_3t1.zip
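The archives can be unpacked by hand; the sketch below shows one way to do it from Python with the standard library. The function name `extract_json_files` and the example paths are hypothetical, and the sketch assumes the JSON files should sit wherever the archive's internal layout places them under the dataset directory.

```python
import zipfile
from pathlib import Path


def extract_json_files(zip_path, dataset_dir):
    """Extract an archive into dataset_dir and return the JSON files it contained."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dataset_dir)
        names = archive.namelist()
    # Report only the .json members so the result is easy to inspect.
    return sorted(dataset_dir / name for name in names if name.endswith(".json"))


# Example usage (paths are placeholders):
# extract_json_files("json_files.zip", "path/to/your/dataset")
# extract_json_files("ID_json_3t1.zip", "path/to/your/dataset")
```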

(Optional) Download our fine-tuned checkpoints

For the Armbench identification task

RoboLLM Base whole gallery

RoboLLM Base within basket

For the Armbench defect detection task

RoboLLM Base

RoboLLM Large

Object Identification

python armbench/ID.py --model 'beit3_base_patch16_224' --input_size 224 --task 'armbenchpick1to1' --batch_size 128 \
 --layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
 --data_path 'path/to/your/dataset' --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05  \
 --save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'

model specifies the name of the model used in this experiment.
log_dir is the directory where the output logs are stored.
task selects the setting: armbenchpick1to1 uses only pre-pick images; armbench3t1 uses both pre-pick and post-pick images.
data_path is the directory that contains the dataset.
finetune specifies the path to the pre-trained BEiT-3 weights.
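The command also passes --layer_decay 0.65. As a rough illustration of what layer-wise learning-rate decay does, the sketch below assumes the convention common in BEiT-style fine-tuning code, where index 0 is the embeddings, indices 1 to 12 are the transformer blocks, and the last index is the task head. That mapping and the helper name `layer_lr_scales` are assumptions; this README does not document them.

```python
from typing import List


def layer_lr_scales(num_layers: int, layer_decay: float) -> List[float]:
    """Per-layer LR multipliers: deeper layers get scales closer to 1.0."""
    return [layer_decay ** (num_layers + 1 - i) for i in range(num_layers + 2)]


base_lr = 2e-4  # value of --lr in the command above
scales = layer_lr_scales(num_layers=12, layer_decay=0.65)

# The head (last entry) trains at the full base LR; earlier layers train more slowly.
for index, scale in enumerate(scales):
    print(f"layer index {index:2d}: lr = {base_lr * scale:.3e}")
```

With a decay below 1, layers near the input change little during fine-tuning, which helps preserve the pre-trained features.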

Defect Detection

python armbench/defection_id.py --model 'beit3_base_patch16_224' --input_size 224 --task 'defection1by1' --batch_size 128 \
 --layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
 --data_path 'path/to/your/dataset'  --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05  \
 --save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'

model specifies the name of the model used in this experiment.
log_dir is the directory where the output logs are stored.
task selects the defect detection setting (defection1by1 in the command above).
data_path is the directory that contains the dataset.
finetune specifies the path to the pre-trained BEiT-3 weights.
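Both training commands use --lr 2e-4, --epochs 30 and --warmup_epochs 3. The sketch below shows how such settings typically shape the learning rate, assuming a linear warmup from zero followed by cosine decay, computed per epoch for simplicity. The schedule shape, the `min_lr` default, and the function name `lr_at_epoch` are assumptions for illustration, not a description of the training scripts.

```python
import math


def lr_at_epoch(epoch, base_lr=2e-4, epochs=30, warmup_epochs=3, min_lr=0.0):
    """Learning rate at a (possibly fractional) epoch under warmup plus cosine decay."""
    if epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


for epoch in (0, 1, 3, 15, 30):
    print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.3e}")
```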

Citation

If you find this repository useful, please consider citing our work:

@misc{long2023robollm,
      title={RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models}, 
      author={Zijun Long and George Killick and Richard McCreadie and Gerardo Aragon Camarasa},
      year={2023},
      eprint={2310.10221},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

Acknowledgement

This repository is built on the BEiT, BEiTv2, BEiTv3, CLIP, open_clip, Oscar, DeiT, and Dino repositories and the timm library.

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree.

Microsoft Open Source Code of Conduct
