ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions
Code for the paper "ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions" (ACL 2024).
conda create -n contextblip python=3.9
conda activate contextblip
pip install -r requirements.txt
Datasets: COCO, VG, and ImageCoDe.
Download the following files:
- annotations.zip
- bert-base-uncased.zip
- model_base.pth (the pre-trained BLIP base checkpoint)
- Unzip bert-base-uncased.zip and annotations.zip in ./
- Set the train_file field in the pretraining configuration file ./configs/pretrain.yaml to a list containing the paths of coco.json and vg.json
- Store the COCO and VG images in the ./pretrain_data/vl_pair folder (a pre-flight check covering the config and this layout is sketched after the run commands below)

- Run the code:
unzip bert-base-uncased.zip
unzip annotations.zip
bash run.sh
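Before launching run.sh, a quick pre-flight check can catch path mistakes. This is a minimal sketch, assuming pyyaml is installed and the layout described above; the .jpg extension is an assumption about the image files:

```python
# Pre-flight check (sketch): verify the config and image folder before pretraining.
from pathlib import Path
import yaml

with open('./configs/pretrain.yaml') as f:
    cfg = yaml.safe_load(f)
print('train_file:', cfg['train_file'])  # should list the coco.json and vg.json paths

img_root = Path('./pretrain_data/vl_pair')
assert img_root.is_dir(), f'{img_root} is missing'
print(sum(1 for _ in img_root.rglob('*.jpg')), 'jpg files under', img_root)
```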
Download the ImageCoDe dataset:
- images: image-sets.zip from BennoKrojer/ImageCoDe (huggingface.co)
- annotations: train_data.json, valid_data.json, and test_data_unlabeled.json from imagecode/data in McGill-NLP/imagecode (github.com)
mkdir dataset
mv image-sets.zip dataset/
mv train_data.json dataset/
mv valid_data.json dataset/
mv test_data_unlabeled.json dataset/
cd dataset
unzip image-sets.zip
Check that the image path is ./dataset/image-sets and the annotation path is ./dataset (a quick sanity check is sketched below).
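A minimal sanity check, assuming the layout above; the json files are only loaded and counted, their schema is not assumed:

```python
# Sanity check (sketch): confirm the ImageCoDe files unpacked where finetune.py expects them.
import json
from pathlib import Path

data_dir = Path('./dataset')
assert (data_dir / 'image-sets').is_dir(), 'image-sets.zip not unzipped?'
for name in ['train_data.json', 'valid_data.json', 'test_data_unlabeled.json']:
    with open(data_dir / name) as f:
        print(name, '->', len(json.load(f)), 'entries')
```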
Run the code:
nohup python -u finetune.py --finetuned_checkpoint_path {pretrained model path} > finetune.log 2>&1 &  # start fine-tuning
python zero-shot_new.py --finetuned_checkpoint_path {pretrained model path}  # zero-shot evaluation
python analysis/analysis_finetune.py --finetuned_checkpoint_path {finetuned model path}  # evaluate the finetuned model
You need to replace the finetuned model path in line 58.
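For reference, BLIP-style .pth checkpoints are usually loaded as below. This is a hedged sketch, not the repository's loading code: the example path is hypothetical, and whether the weights sit at the top level or under a 'model' key is an assumption based on common BLIP checkpoints:

```python
# Sketch: inspect a checkpoint passed via --finetuned_checkpoint_path.
import torch

ckpt = torch.load('output/finetune/checkpoint_best.pth', map_location='cpu')  # hypothetical path
state_dict = ckpt.get('model', ckpt)  # some BLIP checkpoints nest weights under 'model'
print(len(state_dict), 'tensors in checkpoint')
```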
python evaluate_vlm_contextblip.py
You need to replace the API key in line 58.
python sample.py  # sample the subsets; the random seed can be changed in the file (a sketch of seeded sampling follows this block)
# the data path needs to be modified in the file
# GPT-4 API (you need a GPT-4-Vision API key)
python GPT4v.py
# ContextBLIP
python analysis/gpt4_comparison.py
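sample.py's internals are not shown here; as a minimal sketch of what seeded subset sampling typically looks like (the data path and the subset size of 100 are placeholders, and the actual script may differ):

```python
# Sketch of seeded subset sampling (sample.py's actual logic may differ).
import json
import random

random.seed(42)  # the seed mentioned above; change it for a different subset

with open('./dataset/valid_data.json') as f:  # placeholder data path
    data = json.load(f)

keys = sorted(data)  # fix iteration order so the seed is reproducible
subset = random.sample(keys, k=min(100, len(keys)))  # placeholder subset size
print(len(subset), 'examples sampled')
```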
In the ablation experiments, the image mask rate is adjusted via a command-line argument:
# set img_mask_rate to the desired image mask ratio before launching
nohup python -u -m torch.distributed.run --nproc_per_node 4 main.py --mask_rate ${img_mask_rate} --output_dir output/Pretrain/${img_mask_rate} > pretrain.log 2>&1 &
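For intuition about what --mask_rate controls, below is a hedged sketch of random patch masking at a given rate; the repository's actual masking implementation may differ:

```python
# Illustration of random patch masking at a given mask rate (not the repo's exact code).
import torch

def random_patch_mask(num_patches: int, mask_rate: float) -> torch.Tensor:
    """Boolean mask with roughly `mask_rate` of the patches set to True (masked)."""
    num_masked = int(num_patches * mask_rate)
    perm = torch.randperm(num_patches)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask

print(random_patch_mask(196, 0.3).sum().item())  # 58 of 196 ViT patches masked
```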