OADP
This repository is the official implementation of "Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection".
Create a conda environment and activate it.
conda create -n oadp python=3.10
conda activate oadp
Install PyTorch following the official documentation.
For example,
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
Install MMDetection following the official instructions.
For example,
pip install openmim
mim install mmcv_full==1.7.0
pip install mmdet==2.25.2
Install other dependencies.
pip install todd_ai==0.3.0
pip install git+https://github.com/LutingWang/CLIP.git
pip install git+https://github.com/lvis-dataset/lvis-api.git@lvis_challenge_2021
pip install nni scikit-learn==1.1.3
Download the MS-COCO dataset to data/coco.
OADP/data/coco
├── annotations
│   ├── instances_train2017.json
│   └── instances_val2017.json
├── train2017
│   └── ...
└── val2017
    └── ...
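Before moving on, it can help to confirm that the dataset landed where the later steps expect it. The sketch below checks the layout shown above. `COCO_LAYOUT` and `missing_paths` are hypothetical names written for this README and are not part of the `oadp` package.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the OADP root.
COCO_LAYOUT = (
    "data/coco/annotations/instances_train2017.json",
    "data/coco/annotations/instances_val2017.json",
    "data/coco/train2017",
    "data/coco/val2017",
)


def missing_paths(root, expected=COCO_LAYOUT):
    """Return the expected paths that do not exist under ``root``."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]
```

Calling `missing_paths(".")` from the repository root returns an empty list when the layout is complete. The same helper can be pointed at any of the other directory trees in this README by passing a different `expected` tuple.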
Download the LVIS v1.0 dataset to data/lvis_v1.
OADP/data/lvis_v1
├── annotations
│   ├── lvis_v1_train.json
│   └── lvis_v1_val.json
├── train2017 -> ../coco/train2017
│   └── ...
└── val2017 -> ../coco/val2017
    └── ...
python -m oadp.build_annotations
The following files will be generated
OADP/data
├── coco
│   └── annotations
│       ├── instances_train2017.48.json
│       ├── instances_train2017.65.json
│       ├── instances_val2017.48.json
│       ├── instances_val2017.65.json
│       └── instances_val2017.65.min.json
└── lvis_v1
    └── annotations
        ├── lvis_v1_train.1203.json
        ├── lvis_v1_train.866.json
        ├── lvis_v1_val.1203.json
        └── lvis_v1_val.866.json
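The numeric suffixes in the generated file names appear to be category counts (48 and 65 for COCO, 866 and 1203 for LVIS). Under that assumption, each file is the original annotation file restricted to a subset of categories. The sketch below shows what such a restriction could look like on a COCO-style dictionary. It is an illustration only, not the implementation of `oadp.build_annotations`, and `filter_categories` is a hypothetical name.

```python
def filter_categories(coco, keep_names):
    """Return a copy of a COCO-style dict restricted to ``keep_names``."""
    keep_names = set(keep_names)
    categories = [c for c in coco["categories"] if c["name"] in keep_names]
    keep_ids = {c["id"] for c in categories}
    annotations = [
        a for a in coco["annotations"] if a["category_id"] in keep_ids
    ]
    # Images are kept as-is; whether images left without annotations should
    # be dropped is a design choice this sketch leaves open.
    return {**coco, "categories": categories, "annotations": annotations}


# Toy example with two categories, keeping only one of them.
toy = {
    "images": [{"id": 1}],
    "categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "umbrella"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
    ],
}
subset = filter_categories(toy, ["person"])
```

The input dictionary is left unmodified, so several subsets can be derived from a single loaded annotation file.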
Download the CLIP model.
python -c "import clip; clip.load_default()"
Download the ResNet50 model.
mkdir pretrained
python -c "import torchvision; _ = torchvision.models.ResNet50_Weights.IMAGENET1K_V1.get_state_dict(True)"
ln -s ~/.cache/torch/hub/checkpoints/ pretrained/torchvision
Download and rename soco_star_mask_rcnn_r50_fpn_400e.pth from Baidu Netdisk or Google Drive.
Download the DetPro prompt from Baidu Netdisk.
Organize the pretrained models as follows
OADP/pretrained
├── clip
│   └── ViT-B-32.pt
├── detpro
│   └── iou_neg5_ens.pth
├── torchvision
│   └── resnet50-0676ba61.pth
└── soco
    └── soco_star_mask_rcnn_r50_fpn_400e.pth
Generate the ViLD prompts.
python -m oadp.prompts.vild
Download ml_coco.pth from Baidu Netdisk.
Generate the DetPro prompts.
python -m oadp.prompts.detpro
Organize the prompts as follows
OADP/data/prompts
├── vild.pth
└── ml_coco.pth
Download the proposals from Baidu Netdisk.
Organize the proposals as follows
OADP/data
├── coco
│   └── proposals
│       ├── rpn_r101_fpn_coco_train.pkl
│       ├── rpn_r101_fpn_coco_val.pkl
│       ├── oln_r50_fpn_coco_train.pkl
│       └── oln_r50_fpn_coco_val.pkl
└── lvis_v1
    └── proposals
        ├── oln_r50_fpn_lvis_train.pkl
        └── oln_r50_fpn_lvis_val.pkl
Most commands listed in this section support the DRY_RUN mode.
When the DRY_RUN environment variable is set to True, the command that follows skips its time-consuming parts.
This mode is intended for quick integrity checks.
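As a rough illustration of the idea, the sketch below reads a boolean flag from the environment and uses it to cut a long loop short. How the repository itself parses DRY_RUN is not shown in this README, so the parsing rule (a case-insensitive match on `true`) and the names `dry_run_enabled` and `run` are assumptions made for the example.

```python
import os


def dry_run_enabled(environ=os.environ):
    """Interpret the DRY_RUN environment variable as a boolean flag."""
    # Assumption: only a case-insensitive "true" switches the mode on.
    return environ.get("DRY_RUN", "").strip().lower() == "true"


def run(items, process, dry_run=None):
    """Process every item, or only the first one in dry-run mode."""
    if dry_run is None:
        dry_run = dry_run_enabled()
    if dry_run:
        items = items[:1]  # skip the time-consuming bulk of the work
    return [process(item) for item in items]
```

A dry run still goes through configuration, data loading, and one unit of work, which is usually enough to surface path and dependency problems early.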
Most commands run on both CPU and GPU servers.
For CPU, use the python command.
For GPU, use the torchrun command.
Do not use python on GPU servers, since the command will attempt to initialize distributed training.
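The rule above can be written down as a small helper that picks the launcher from the number of GPUs. This is a sketch for illustration only. `launcher` and `build_command` are hypothetical names, while the `--nproc_per_node` and `-m` flags are the ones used by the commands in this README.

```python
import shlex


def launcher(gpus):
    """Return the command prefix for a machine with ``gpus`` GPUs."""
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    if gpus == 0:
        return ["python"]  # CPU server
    return ["torchrun", f"--nproc_per_node={gpus}"]  # GPU server


def build_command(gpus, module, *args):
    """Assemble the full argument list for running ``module``."""
    return [*launcher(gpus), "-m", module, *args]


# Example: the globals feature extraction command on a CPU server.
command = build_command(
    0, "oadp.oake.globals", "oake/globals", "configs/oake/globals.py"
)
printable = shlex.join(command)  # a shell-ready string, e.g. for logging
```

Keeping the command as a list until the last moment avoids quoting problems when it is passed to `subprocess.run`.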
For all commands listed in this section, [...] means optional parts and (...|...) means choices.
For example,
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS})
expands to one of the following four commands
DRY_RUN=True torchrun --nproc_per_node=${GPUS} # GPU under the DRY_RUN mode
DRY_RUN=True python # CPU under the DRY_RUN mode
torchrun --nproc_per_node=${GPUS} # GPU
python # CPU
The following scripts extract features with CLIP, which can be very time-consuming. All of the scripts therefore resume automatically by skipping feature files that already exist. Existing feature files are sometimes broken, however. In such cases, set the auto_fix option to check the integrity of each feature file.
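The resume behaviour described above can be sketched as follows. This is not the repository's code: the file format (pickle), the `.pkl` extension, and the names `is_intact` and `extract` are assumptions chosen to keep the example self-contained. Only the `auto_fix` name comes from this README.

```python
import pickle
from pathlib import Path


def is_intact(path):
    """Return True if ``path`` can be loaded back completely."""
    try:
        with open(path, "rb") as f:
            pickle.load(f)
    except Exception:  # truncated or corrupted file
        return False
    return True


def extract(names, compute, out_dir, auto_fix=False):
    """Save one feature file per name and return the names computed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    computed = []
    for name in names:
        path = out_dir / f"{name}.pkl"
        if path.exists():
            # Without auto_fix, any existing file is trusted and skipped.
            if not auto_fix or is_intact(path):
                continue
        with open(path, "wb") as f:
            pickle.dump(compute(name), f)
        computed.append(name)
    return computed
```

Checking integrity means reading every existing file back, so it is worth enabling only after an interrupted or failed run.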
Extract globals and blocks features, which can be used for both coco and lvis
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.globals oake/globals configs/oake/globals.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.blocks oake/blocks configs/oake/blocks.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
Extract objects features for coco
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.objects oake/objects configs/oake/objects_coco.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
Extract objects features for lvis
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.oake.objects oake/objects configs/oake/objects_lvis.py [--override .train.dataloader.dataset.auto_fix:True .val.dataloader.dataset.auto_fix:True]
Feature extraction can be very time-consuming, so we provide archives of the extracted features on Baidu Netdisk. The archives were created with the following commands
cd data/coco/oake/
tar -zcf globals.tar.gz globals
tar -zcf blocks.tar.gz blocks
tar -zcf objects.tar.gz objects/val2017
cd objects/train2017
ls > objects
split -d -3000 - objects. < objects
for i in objects.[0-9][0-9]; do
zip -q -9 "$i.zip" -@ < "$i"
mv "$i.zip" ../..
done
rm objects*
The final directory for OAKE should look like
OADP/data
├── coco
│   └── oake
│       ├── blocks
│       │   ├── train2017
│       │   └── val2017
│       ├── globals
│       │   ├── train2017
│       │   └── val2017
│       └── objects
│           ├── train2017
│           └── val2017
└── lvis_v1
    └── oake
        ├── blocks -> ../coco/oake/blocks
        ├── globals -> ../coco/oake/globals
        └── objects
            ├── train2017
            └── val2017
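If the COCO features are downloaded rather than extracted locally, the archives have to be unpacked into this layout. The sketch below mirrors the archiving commands shown earlier: the `.tar.gz` files were created from inside the oake directory, and the `objects.NN.zip` files from inside `objects/train2017`. It assumes all archives have been placed in `data/coco/oake` and come from a trusted source. `unpack_oake` is a hypothetical helper, not part of the `oadp` package.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_oake(oake_dir):
    """Unpack OAKE archives found in ``oake_dir`` (e.g. data/coco/oake)."""
    oake_dir = Path(oake_dir)

    # globals.tar.gz, blocks.tar.gz and objects.tar.gz contain paths
    # relative to the oake directory, so they extract in place.
    for archive in sorted(oake_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(oake_dir)

    # objects.NN.zip members carry no directory prefix, so they are
    # extracted directly into objects/train2017.
    train_dir = oake_dir / "objects" / "train2017"
    train_dir.mkdir(parents=True, exist_ok=True)
    for archive in sorted(oake_dir.glob("objects.*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(train_dir)
```

The archives are left in place after extraction, so they can be removed manually once the layout has been verified.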
To conduct training for coco
[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train vild_ov_coco configs/dp/vild_ov_coco.py [--override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json]
[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py [--override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json]
To conduct training for lvis
[DRY_RUN=True] [TRAIN_WITH_VAL_DATASET=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.train oadp_ov_lvis configs/dp/oadp_ov_lvis.py
To test a specific checkpoint
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_coco.py work_dirs/oadp_ov_coco/iter_32000.pth
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_lvis.py work_dirs/oadp_ov_lvis/epoch_24.pth
For the instance segmentation performance on LVIS, use the metrics argument
[DRY_RUN=True] (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_lvis.py work_dirs/oadp_ov_lvis/epoch_24.pth --metrics bbox segm
NNI is supported but unnecessary.
DUMP=work_dirs/dump (python|torchrun --nproc_per_node=${GPUS}) -m oadp.dp.test configs/dp/oadp_ov_coco.py work_dirs/oadp_ov_coco/iter_32000.pth
DUMP=work_dirs/dump python tools/nni_dp_test.py
The checkpoints for OADP are available on Baidu Netdisk.
| mAPN50 | Config | Checkpoint |
|---|---|---|
| | oadp_ov_coco.py | work_dirs/oadp_ov_coco/iter_32000.pth |

| OD APr | IS APr | Config | Checkpoint |
|---|---|---|---|
| | | oadp_ov_lvis.py | work_dirs/oadp_ov_lvis/epoch_24.pth |
| | | oadp_ov_lvis_lsj.py | Coming soon |