Official PyTorch implementation and pretrained models for the paper "RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models."
- [2024/01/31] Accepted to ICRA 2024!
- [2024/02/15] Updated fine-tuned checkpoints for the identification and defect detection tasks.
- BEiT3-base: #layer=12; hidden=768; FFN factor=4x; #head=12; patch=16x16; #parameters: 222M
- BEiT3-large: #layer=24; hidden=1024; FFN factor=4x; #head=16; patch=16x16; #parameters: 674M
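The sizes listed above imply a few derived quantities that are handy when checking tensor shapes. The sketch below only does that arithmetic; the class `Beit3Config` and its method names are hypothetical and are not part of this repository, and the input size of 224 is taken from the `--input_size 224` flag used in the fine-tuning commands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beit3Config:
    """Hypothetical container for the hyperparameters listed in this README."""
    layers: int
    hidden: int
    ffn_factor: int
    heads: int
    patch: int

    def head_dim(self) -> int:
        # Per-head width: hidden size split evenly across attention heads.
        return self.hidden // self.heads

    def ffn_dim(self) -> int:
        # Inner width of the feed-forward block (hidden * FFN factor).
        return self.hidden * self.ffn_factor

    def num_patches(self, input_size: int) -> int:
        # Number of non-overlapping square patches for a square input image.
        per_side = input_size // self.patch
        return per_side * per_side


BASE = Beit3Config(layers=12, hidden=768, ffn_factor=4, heads=12, patch=16)
LARGE = Beit3Config(layers=24, hidden=1024, ffn_factor=4, heads=16, patch=16)

if __name__ == "__main__":
    for name, cfg in (("base", BASE), ("large", LARGE)):
        print(name, cfg.head_dim(), cfg.ffn_dim(), cfg.num_patches(224))
```

Both variants end up with the same per-head width and the same patch grid at 224x224; only depth and hidden size differ.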
beit3.spm is the sentencepiece model used for tokenizing texts.
from transformers import XLMRobertaTokenizer
tokenizer = XLMRobertaTokenizer("/your_beit3_model_path/beit3.spm")
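Once the tokenizer is loaded, text usually has to be turned into a fixed-length id sequence before it reaches the model. The following is a minimal sketch of one common way to do that (BOS and EOS wrapping, truncation, right padding, and a padding mask); it is an assumption about typical preprocessing, not necessarily what the scripts in this repository do. The helper name `encode_text` and the default `max_len=64` are hypothetical. It relies only on `tokenize`, `convert_tokens_to_ids`, and the `bos_token_id`, `eos_token_id`, and `pad_token_id` attributes that Hugging Face tokenizers expose.

```python
from typing import List, Tuple


def encode_text(tokenizer, text: str, max_len: int = 64) -> Tuple[List[int], List[int]]:
    """Return (token_ids, padding_mask), each of length max_len.

    padding_mask uses 0 for real tokens and 1 for padding positions.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for the BOS and EOS tokens")

    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)

    # Truncate so that BOS + tokens + EOS fits into max_len.
    ids = ids[: max_len - 2]
    ids = [tokenizer.bos_token_id] + ids + [tokenizer.eos_token_id]

    # Right-pad to max_len and record which positions are padding.
    n_pad = max_len - len(ids)
    padding_mask = [0] * len(ids) + [1] * n_pad
    ids = ids + [tokenizer.pad_token_id] * n_pad
    return ids, padding_mask


# Example usage with the tokenizer created above (requires beit3.spm on disk):
# ids, mask = encode_text(tokenizer, "a brown cardboard box", max_len=64)
```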
alias=`whoami | cut -d'.' -f2`; docker run -it --rm --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} pytorch/pytorch:1.8.1-cuda11.1-cudnn8-devel bash
Install the required packages:
pip install -r requirements.txt
Download json_files.zip and put its contents into the armbench dataset dir.
Additional JSON files for the 3to1 task: ID_json_3t1.zip
python armbench/ID.py --model 'beit3_base_patch16_224' --input_size 224 --task 'armbenchpick1to1' --batch_size 128 \
--layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
--data_path 'path/to/your/dataset' --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05 \
--save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'

- model: specifies the name of the model used in this experiment.
- log_dir: the directory that stores the output log.
- task: selects the task variant; armbenchpick1to1 uses only pre-pick images, and armbench3t1 uses both pre-pick and post-pick images.
- data_path: the directory that stores the datasets.
- finetune: specifies the path to the pretrained weights of the BEiT-3 model.
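The `--layer_decay 0.65` and `--lr 2e-4` flags in the command above point to layer-wise learning-rate decay, where earlier layers train with smaller learning rates than later ones. The sketch below shows the scaling under the convention used in BEiT-style fine-tuning scripts (one scale for the embeddings, one per transformer layer, one for the head); whether `armbench/ID.py` follows exactly this convention is an assumption, and the function names are hypothetical.

```python
from typing import List


def layer_lr_scales(num_layers: int, layer_decay: float) -> List[float]:
    """Scale factors for layer ids 0..num_layers+1.

    Id 0 is the embedding stage, ids 1..num_layers are the transformer
    layers, and id num_layers+1 is the task head (scale 1.0).
    """
    return [layer_decay ** (num_layers + 1 - i) for i in range(num_layers + 2)]


def layer_lrs(base_lr: float, num_layers: int, layer_decay: float) -> List[float]:
    """Effective learning rate per layer id."""
    return [base_lr * s for s in layer_lr_scales(num_layers, layer_decay)]


# Values from the command above, with 12 layers for BEiT3-base.
lrs = layer_lrs(base_lr=2e-4, num_layers=12, layer_decay=0.65)

if __name__ == "__main__":
    for layer_id, lr in enumerate(lrs):
        print(f"layer id {layer_id:2d}: lr = {lr:.3e}")
```

With these settings the head trains at the full base learning rate, while each step toward the input multiplies the rate by 0.65.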
python armbench/defection_id.py --model 'beit3_base_patch16_224' --input_size 224 --task 'defection1by1' --batch_size 128 \
--layer_decay 0.65 --lr 2e-4 --epochs 30 --warmup_epochs 3 --drop_path 0.2 --sentencepiece_model 'beit3.spm' \
--data_path 'path/to/your/dataset' --output_dir 'your_output_path/' --log_dir '/your_log_path/' --weight_decay 0.05 \
--save_ckpt_freq 1 --finetune 'path/to/ckpt/beit3_base_patch16_224.pth'

- model: specifies the name of the model used in this experiment.
- log_dir: the directory that stores the output log.
- task: specifies the task, here defection1by1 for defect detection.
- data_path: the directory that stores the datasets.
- finetune: specifies the path to the pretrained weights of the BEiT-3 model.
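The fine-tuning command passes `--epochs 30`, `--warmup_epochs 3`, and `--lr 2e-4`. A common schedule for such flags is linear warmup followed by cosine decay; the sketch below illustrates that shape at epoch granularity. This is an assumption about the schedule, not a description of what `armbench/defection_id.py` actually implements, and the function name `lr_at_epoch` and the `min_lr` parameter are hypothetical.

```python
import math


def lr_at_epoch(epoch: float, base_lr: float = 2e-4, epochs: int = 30,
                warmup_epochs: int = 3, min_lr: float = 0.0) -> float:
    """Learning rate after `epoch` epochs (fractional epochs allowed)."""
    if epochs <= warmup_epochs:
        raise ValueError("epochs must be larger than warmup_epochs")

    # Linear warmup from 0 up to base_lr.
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs

    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = min((epoch - warmup_epochs) / (epochs - warmup_epochs), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 1, 3, 10, 20, 30):
        print(f"epoch {e:2d}: lr = {lr_at_epoch(e):.3e}")
```

Under this assumed schedule the learning rate peaks at the end of warmup and falls to `min_lr` by the final epoch.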
If you find this repository useful, please consider citing our work:
@misc{long2023robollm,
title={RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models},
author={Zijun Long and George Killick and Richard McCreadie and Gerardo Aragon Camarasa},
year={2023},
eprint={2310.10221},
archivePrefix={arXiv},
primaryClass={cs.RO}
}
This repository is built using the BEiT, BEiTv2, BEiTv3, CLIP, open_clip, Oscar, DeiT, and Dino repositories and the timm library.
This project is licensed under the license found in the LICENSE file in the root directory of this source tree.