CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection
This is an official release of the paper CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection.
Hojun Choi, Youngsun Lim, Jaeyo Shin, Hyunjung Shim
⛽⛽⛽ Contact: eric970412@gmail.com
- [✅] [2024.12.31] 👨💻 The official code has been released!
- [✅] [2024.10.16] 📄 Our paper is now available! You can find the paper at https://arxiv.org/abs/2510.14792.
This project is based on MMDetection 3.x.
It requires the following OpenMMLab packages:
- MMEngine >= 0.6.0
- MMCV >= 2.0.0rc4
- MMDetection >= 3.0.0rc6
- lvisapi
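Once the packages are installed, the minimum versions listed above can be checked from Python. The following is a minimal sketch using only the standard library. The distribution names `mmengine`, `mmcv`, `mmdet`, and `lvis` are assumptions about how pip registers these packages, and `parse_version` and `check_requirements` are illustrative helpers that are not part of this repository. The parser only handles simple versions such as `2.0.0rc4` and ignores anything beyond the first three release segments.

```python
import re
from importlib import metadata

# Assumed pip distribution names mapped to the minimum versions listed above.
# lvisapi has no stated minimum, so only its presence is checked.
MINIMUMS = {
    "mmengine": "0.6.0",
    "mmcv": "2.0.0rc4",
    "mmdet": "3.0.0rc6",
    "lvis": None,
}

# Pre-release tags sort before the final release (rank 3).
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}


def parse_version(text):
    """Turn a version such as '2.0.0rc4' into a comparable tuple of ints."""
    match = re.match(r"v?(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    release = (release + (0, 0, 0))[:3]  # pad or truncate to three segments
    if match.group(2):
        pre = (_PRE_RANK[match.group(2)], int(match.group(3)))
    else:
        pre = (3, 0)
    return release + pre


def check_requirements(minimums):
    """Return a dict mapping each distribution name to a status string."""
    report = {}
    for dist, minimum in minimums.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "missing"
            continue
        if minimum is not None and parse_version(installed) < parse_version(minimum):
            report[dist] = f"too old ({installed} < {minimum})"
        else:
            report[dist] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements(MINIMUMS).items():
        print(f"{name}: {status}")
```

Any package reported as missing or too old can be reinstalled with the corresponding install command.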
pip install openmim mmengine
mim install "mmcv>=2.0.0rc4"
pip install git+https://github.com/lvis-dataset/lvis-api.git
mim install "mmdet>=3.0.0rc6"
pip install ftfy regex
This project is released under the NTU S-Lab License 1.0.
We use CLIP's ViT-B-16 model for the implementation of our method.
Install CLIP:
pip install git+https://github.com/openai/CLIP.git
Then run the following to save the model weights:
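import os
import clip
import torch
# Create the output directory first; torch.save does not create it.
os.makedirs('checkpoints', exist_ok=True)
model, _ = clip.load("ViT-B/16")
torch.save(model.state_dict(), 'checkpoints/clip_vitb16.pth')
The pseudo-label generation code is available at pseudo-label. Alternatively, download instances_train2017_pseudo_v0_new.json from Hugging Face.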
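After downloading the pseudo-label file, it can help to confirm that it parses and to see how many annotations it contains. The sketch below assumes the file follows the standard COCO instances layout (top-level `images`, `annotations`, and `categories` lists), which the file name suggests but this README does not state. `summarize_coco_annotations` is an illustrative helper, not part of this repository.

```python
import json
import sys
from collections import Counter


def summarize_coco_annotations(path):
    """Return basic counts for a COCO-style instances JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Map category ids to names so the per-class counts are readable.
    id_to_name = {c["id"]: c["name"] for c in data.get("categories", [])}

    # Count annotations per class; ids without a category entry are kept
    # under their numeric id rendered as a string.
    per_class = Counter(
        id_to_name.get(a["category_id"], str(a["category_id"]))
        for a in data.get("annotations", [])
    )
    return {
        "images": len(data.get("images", [])),
        "annotations": len(data.get("annotations", [])),
        "categories": len(id_to_name),
        "per_class": dict(per_class),
    }


# Usage: python summarize.py /path/to/instances_train2017_pseudo_v0_new.json
if __name__ == "__main__" and len(sys.argv) > 1:
    summary = summarize_coco_annotations(sys.argv[1])
    for key in ("images", "annotations", "categories"):
        print(f"{key}: {summary[key]}")
```

A per-class breakdown is available under the `per_class` key of the returned dictionary.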
Training and testing on OV-COCO are currently supported.
@misc{choi2025cotplvisualchainofthoughtreasoning,
  title={CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection},
  author={Hojun Choi and Youngsun Lim and Jaeyo Shin and Hyunjung Shim},
  year={2025},
  eprint={2510.14792},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.14792},
}