This repository is the implementation for CaFlow.
Here are the instructions for obtaining the features and videos for the Rhythmic Gymnastics, Fis-V and LOGO datasets used in our experiments:
For VST features:
- The VST features and label files of the Rhythmic Gymnastics and Fis-V datasets can be downloaded from the GDLT repository.
- The VST features and label files of the LOng-form GrOup (LOGO) dataset can be downloaded from the UIL-AQA repository.
For Rhythmic Gymnastics videos:
- Download the videos from the ACTION-NET repository.
For Fis-V videos:
- Download the videos from the MS_LSTM repository.
Please use the above public repositories to obtain the features and videos needed to reproduce our results. Let us know if you need any clarification or have trouble accessing the data.
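Once the data is downloaded, a quick layout check can catch path mistakes before training. The sketch below assumes the Rhythmic Gymnastics data is placed under `../Datasets/GDLT_data`, as in the training command in this README; the helper name `find_missing_entries` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# Entries expected under the dataset root, taken from the paths used in the
# training command of this README (assumed layout, adjust to your setup).
EXPECTED_ENTRIES = (
    "swintx_avg_fps25_clip32",  # directory holding the VST features
    "train.txt",                # training label file
    "test.txt",                 # test label file
)


def find_missing_entries(data_root: str) -> List[str]:
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = find_missing_entries("../Datasets/GDLT_data")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected entries found.")
```

If anything is reported missing, check that the downloaded archive was extracted into the directory passed to `--video-path` and the two label-path arguments.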
To get started, you will need to first clone this project and then install the required dependencies.
- GPU: RTX4080S
- CUDA: 12.4
- Python: 3.8+
- PyTorch: 2.4.1+cu124
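To compare a local setup against the environment listed above, a small check like the following may help. It is only a sketch: the function name `describe_environment` is hypothetical, and it reports `None` for the PyTorch fields when PyTorch is not installed instead of failing.

```python
import importlib.util
import platform


def describe_environment() -> dict:
    """Collect the Python, PyTorch, and CUDA details relevant to this README."""
    info = {
        "python": platform.python_version(),
        "torch": None,            # stays None if PyTorch is not installed
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a usable GPU is visible
    }
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Other PyTorch or CUDA versions may work as well; the list above only records the configuration that was used.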
Install the required packages:
pip install -r requirements.txt
This will install all the required packages listed in the requirements.txt file.
Use the following command to train the model:
python main.py --video-path ../Datasets/GDLT_data/swintx_avg_fps25_clip32 --train-label-path ../Datasets/GDLT_data/train.txt \
--test-label-path ../Datasets/GDLT_data/test.txt --model-name caflow --action-type Ribbon --lr 1e-2 --epoch 100 \
--n_encoder 1 --n_decoder 2 --n_query 4 --alpha 1 --margin 1 --lr-decay cos --decay-rate 1e-2 --dropout 0.3 --loss_align 1 \
--activate-type 2 --n_head 1 --hidden_dim 256 --flow_hidden_dim 256 --ccr-weight 0.02 --ccr-margin 1.0 --ccr-temperature 1.0 \
--ccr-entropy-weight 0.0 --ccr-cardinality-weight 0.0 --ccr-target-ratio 0.5 --ccr-align 1 --ccr-softplus 1 \
--use-snl 1 --snl-stages 1
Use the following command to test the model:
python main.py --video-path ../Datasets/GDLT_data/swintx_avg_fps25_clip32 --train-label-path ../Datasets/GDLT_data/train.txt \
--test-label-path ../Datasets/GDLT_data/test.txt --model-name caflow --action-type Ribbon --lr 1e-2 --epoch 100 \
--n_encoder 1 --n_decoder 2 --n_query 4 --alpha 1 --margin 1 --lr-decay cos --decay-rate 1e-2 --dropout 0.3 --loss_align 1 \
--activate-type 2 --n_head 1 --hidden_dim 256 --flow_hidden_dim 256 --ccr-weight 0.02 --ccr-margin 1.0 --ccr-temperature 1.0 \
--ccr-entropy-weight 0.0 --ccr-cardinality-weight 0.0 --ccr-target-ratio 0.5 --ccr-align 1 --ccr-softplus 1 \
--use-snl 1 --snl-stages 1 --test --ckpt {your model saving path}/best.pkl
We provide a detailed example to reproduce our results on the Ball (RG) dataset. The corresponding bash script, train_vst_rg_ball.sh, uses a two-stage training approach.
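The training and test commands for Ribbon above share every flag; the test run only appends `--test` and `--ckpt`. One way to keep the two in sync is to build both from a single argument list, as in the sketch below. The helper `build_command` and the checkpoint path `runs/best.pkl` are hypothetical and used for illustration only; the flags themselves are copied from the commands in this README.

```python
import shlex
from typing import List, Optional

# Flags shared by the training and test commands, copied from this README.
SHARED_ARGS = shlex.split("""
--video-path ../Datasets/GDLT_data/swintx_avg_fps25_clip32
--train-label-path ../Datasets/GDLT_data/train.txt
--test-label-path ../Datasets/GDLT_data/test.txt
--model-name caflow --action-type Ribbon --lr 1e-2 --epoch 100
--n_encoder 1 --n_decoder 2 --n_query 4 --alpha 1 --margin 1
--lr-decay cos --decay-rate 1e-2 --dropout 0.3 --loss_align 1
--activate-type 2 --n_head 1 --hidden_dim 256 --flow_hidden_dim 256
--ccr-weight 0.02 --ccr-margin 1.0 --ccr-temperature 1.0
--ccr-entropy-weight 0.0 --ccr-cardinality-weight 0.0
--ccr-target-ratio 0.5 --ccr-align 1 --ccr-softplus 1
--use-snl 1 --snl-stages 1
""")


def build_command(ckpt: Optional[str] = None) -> List[str]:
    """Return the training command, or the test command if ckpt is given."""
    cmd = ["python", "main.py", *SHARED_ARGS]
    if ckpt is not None:
        # Testing reuses the training flags and adds the checkpoint to load.
        cmd += ["--test", "--ckpt", ckpt]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command()))                 # training command
    print(shlex.join(build_command("runs/best.pkl")))  # test command (hypothetical path)
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues with the checkpoint path.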