![License][] ![Python][] ![PyTorch][]
DLMA (Direct Large Language Model Alignment) is a novel approach for aligning large language models through self-rewarding contrastive prompt distillation. This repository contains the source code and instructions for setting up the environment, performing supervised fine-tuning (SFT), generating preference data, and training DLMA models.
To set up the environment, use the following command:
```bash
conda env create --file conda-recipe.yaml
```
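Optionally, verify that the environment exposes a CUDA-enabled PyTorch build (this assumes `conda-recipe.yaml` installs PyTorch; adjust the check if your setup differs):

```python
# Optional sanity check: confirm the conda environment provides PyTorch with CUDA.
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("visible GPUs:", torch.cuda.device_count())
```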
We recommend using the `safe-rlhf` library for supervised fine-tuning (SFT). You can also use other libraries such as LLaMA-Factory.

- Clone the `safe-rlhf` repository:
  ```bash
  git clone git@github.com:PKU-Alignment/safe-rlhf.git
  cd safe-rlhf
  ```
- Train the SFT model:
  ```bash
  bash scripts/sft.sh \
      --model_name_or_path /root/models/meta-llama/Llama-2-7b-hf \
      --output_dir output/llama2-sft
  ```
- Split and resave the model:
  ```bash
  cd ..
  python split_and_resave_model.py \
      --input_model_path safe-rlhf/output/llama2-sft \
      --output_model_path models/sft
  ```
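Optionally, confirm that the resaved checkpoint loads as a regular model. The snippet below is a minimal sketch that assumes `split_and_resave_model.py` writes a standard Hugging Face `transformers` directory to `models/sft`:

```python
# Quick check that the resaved SFT checkpoint is a loadable Hugging Face model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("models/sft")
model = AutoModelForCausalLM.from_pretrained("models/sft")
n_params = sum(p.numel() for p in model.parameters())
print(model.config.model_type, f"{n_params / 1e9:.1f}B parameters")
```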
Generate preference data using the following scripts:

```bash
prompt_type_pos="harmless_positive_prompt"
prompt_type_neg="harmless_negative_prompt"
model_output_path="models/sft"
data_path="PKU-Alignment/PKU-SafeRLHF"
gpu_number=8
batch_size=4

bash scripts/generate_data_scripts.sh \
    $prompt_type_pos \
    $prompt_type_neg \
    $model_output_path \
    $data_path \
    $gpu_number \
    $batch_size
```
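For intuition, this step builds on the paper's contrastive self-rewarding idea: the SFT model scores its own responses by how much more likely they are under the positive prompt than under the negative one, and the gap between two responses acts as a preference signal. The snippet below is a simplified sketch of that scoring idea only; `response_logprob` and `contrastive_reward` are illustrative helpers, and the actual prompting, batching, and ranking are handled by `scripts/generate_data_scripts.sh`.

```python
# Simplified sketch of the contrastive self-rewarding score: how much more likely a
# response is under the positive (harmless) prompt than under the negative prompt.
# Illustration only -- not the repository's generation pipeline.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
tokenizer = AutoTokenizer.from_pretrained("models/sft")
model = AutoModelForCausalLM.from_pretrained("models/sft", torch_dtype=dtype).to(device).eval()

@torch.no_grad()
def response_logprob(prompt: str, response: str) -> float:
    """Sum of token log-probabilities of `response` conditioned on `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids.to(device)
    logits = model(full_ids).logits[:, :-1]                  # position t predicts token t+1
    token_lp = torch.log_softmax(logits.float(), dim=-1)
    token_lp = token_lp.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()         # keep only the response tokens

def contrastive_reward(pos_prompt: str, neg_prompt: str, question: str, response: str) -> float:
    # Positive values mean the model itself rates the response as more aligned
    # with the positive prompt than with the negative prompt.
    return (response_logprob(pos_prompt + question, response)
            - response_logprob(neg_prompt + question, response))
```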
Combine the generated parts into a single dataset:

```bash
python merge_files.py \
    --data_name "$data_path" \
    --model_name "$model_output_path" \
    --batch_size "$batch_size" \
    --prompt_type_pos "$prompt_type_pos" \
    --prompt_type_neg "$prompt_type_neg" \
    --split_number "$gpu_number" \
    --output_dir "generated_data/llama2-pku-safety"
```
- Train the DLMA model (a sketch of what the margin options control follows this list):

  ```bash
  bash scripts/run_dlma.sh \
      -model llama7b \
      -model_path models/sft \
      -datasets \[llama2-pku-safety\] \
      -exp_name dpo_llama2_weight_margin_02_40 \
      -margin_loss_weight 0.2 \
      -min_range -40 \
      -max_range 40
  ```
- Split and resave the model:
  ```bash
  python split_and_resave_model.py \
      --base_model_path safe-rlhf/output/llama2-sft \
      --input_model_path /root/DLMA/.cache/root/dpo_llama2_weight_margin_02_40/LATEST/policy.pt \
      --output_model_path models/dlma
  ```

The trained DLMA model will be available in the `models/dlma` directory.
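For reference, the `-margin_loss_weight`, `-min_range`, and `-max_range` options passed to `scripts/run_dlma.sh` above control how a precomputed reward margin is weighted and clipped. The snippet below is a minimal sketch of a DPO-style objective with such a clipped margin term, written under that assumption; it is an illustration rather than the repository's exact loss, so consult the training code for the real formulation.

```python
# Illustrative DPO-style loss with a clipped self-reward margin (assumed reading of the
# -margin_loss_weight / -min_range / -max_range options; not the exact implementation).
import torch
import torch.nn.functional as F

def dlma_style_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    self_reward_margin,                 # contrastive self-reward gap per pair
                    beta=0.1, margin_loss_weight=0.2,
                    min_range=-40.0, max_range=40.0):
    # Standard DPO term: implicit reward gap of the policy relative to the reference model.
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = beta * (pi_logratios - ref_logratios)
    # Clip and weight the self-reward margin, so pairs with a larger contrastive gap
    # must be separated by a larger implicit reward difference.
    margin = margin_loss_weight * torch.clamp(self_reward_margin, min_range, max_range)
    return -F.logsigmoid(logits - margin).mean()
```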
This project is licensed under the Apache License 2.0.
```bibtex
@inproceedings{liu-etal-2024-direct,
    title = "Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation",
    author = "Liu, Aiwei and
      Bai, Haoping and
      Lu, Zhiyun and
      Kong, Xiang and
      Wang, Xiaoming and
      Shan, Jiulong and
      Cao, Meng and
      Wen, Lijie",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.523",
    pages = "9688--9712",
}
```