This is a PyTorch implementation of a VAE (variational autoencoder) for multilingual text generation.
A detailed report of our work can be found in the file "rapport_EA_NLP.pdf".
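For orientation, below is a minimal, self-contained sketch of the kind of sequence VAE this repo trains: an LSTM encoder maps a sentence to a Gaussian latent code, and an LSTM decoder reconstructs the sentence from that code. Class names, dimensions, and layer choices here are illustrative assumptions, not the actual model classes used by the training scripts.

# Illustrative sketch of a sentence VAE; the real models are defined in the training scripts.
import torch
import torch.nn as nn

class SentenceVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, z_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.to_mu = nn.Linear(hid_dim, z_dim)
        self.to_logvar = nn.Linear(hid_dim, z_dim)
        self.z_to_hid = nn.Linear(z_dim, hid_dim)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, x):
        # x: (batch, seq_len) token ids
        e = self.emb(x)
        _, (h, _) = self.encoder(e)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        # Reparameterization trick: sample z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h0 = torch.tanh(self.z_to_hid(z)).unsqueeze(0)
        dec_out, _ = self.decoder(e, (h0, torch.zeros_like(h0)))
        logits = self.out(dec_out)
        # KL divergence between q(z|x) and the standard normal prior, per example
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return logits, kl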
- Python >= 3.6
- PyTorch >= 1.0
- editdistance (pip install editdistance)
Three datasets are provided in the folder "datasets".
The "tatoeba_data" dataset can be used to train a VAE with the method described here: https://github.com/bohanli/vae-pretraining-encoder
The other two are already preprocessed for use with our method.
Another dataset can also be used, after preprocessing it with the file test.py.
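The exact preprocessing lives in test.py. As a purely hypothetical sketch (the dataset names ending in "spm" suggest SentencePiece subword tokenization, but the file paths, vocabulary size, and sentencepiece calls below are assumptions, not necessarily what test.py does):

# Hypothetical preprocessing sketch; see test.py for the actual pipeline.
import sentencepiece as spm

# Train a subword model on the raw corpus (file names and vocab size are placeholders).
spm.SentencePieceTrainer.train(
    input="datasets/my_corpus/train.txt",
    model_prefix="my_corpus_spm",
    vocab_size=8000,
)

# Encode each sentence into subword pieces, one sentence per line.
sp = spm.SentencePieceProcessor(model_file="my_corpus_spm.model")
with open("datasets/my_corpus/train.txt") as src, \
     open("datasets/my_corpus/train.spm.txt", "w") as dst:
    for line in src:
        dst.write(" ".join(sp.encode(line.strip(), out_type=str)) + "\n")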
Train an AE first
python text_beta2.py \
--dataset tatoeba2spm \
--beta 0 \
--lr 0.5
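With --beta 0 the KL term of the objective is weighted by zero, so this first stage trains the model as a plain autoencoder. A sketch of the beta-weighted loss this corresponds to (the function name and signature are illustrative; the real training loop is in text_beta2.py):

# Sketch of the beta-weighted VAE objective; with beta = 0 it reduces to an AE loss.
import torch
import torch.nn.functional as F

def beta_vae_loss(logits, targets, kl, beta, pad_id=0):
    # Token-level cross-entropy reconstruction loss, ignoring padding, averaged per sentence.
    rec = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,
        reduction="sum",
    ) / targets.size(0)
    # beta weights the KL regularizer; beta = 0 removes it entirely.
    return rec + beta * kl.mean()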
Train the VAE with our method
ae_exp_dir=exp_tatoeba2spm_beta/tatoeba2spm_lr0.5_beta0.0_drop0.5_
python text_anneal_fb2.py \
--dataset tatoeba2spm \
--load_path ${ae_exp_dir}/model.pt \
--reset_dec \
--kl_start 0 \
--warm_up 10 \
--target_kl 8 \
--fb 2 \
--lr 0.5
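Here --kl_start and --warm_up anneal the KL weight from 0 to 1 over the first epochs, while --target_kl and --fb keep the penalized KL from falling below a target, which discourages posterior collapse. A rough sketch of these two mechanisms, assuming a linear annealing schedule and one common free-bits variant (the exact variant used is selected by --fb inside text_anneal_fb2.py):

# Sketch of KL annealing plus a free-bits style floor on the KL term.
import torch

def annealed_kl_weight(step, steps_per_epoch, kl_start=0.0, warm_up=10):
    # Linearly increase the KL weight from kl_start to 1 over `warm_up` epochs.
    return min(1.0, kl_start + step / (warm_up * steps_per_epoch))

def free_bits_kl(kl_per_dim, target_kl=8.0):
    # Clamp the total KL at the target so the penalty never pushes it below that floor.
    # kl_per_dim has shape (batch, z_dim).
    kl = kl_per_dim.sum(dim=-1)
    return torch.clamp(kl, min=target_kl).mean()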
Create homotopies
vae_exp_dir=exp_tatoeba2spm_load/tatoeba2spm_warm10_kls0.0_fbdim_tr8.0
python homotopie.py \
--dataset tatoeba2spm \
--load_path ${vae_exp_dir}/model.pt \
--fb 2 \
--lr 0.5
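A homotopy interpolates linearly between the latent codes of two sentences and decodes each intermediate point, so the generated sentences morph from one input to the other. A minimal sketch, assuming hypothetical encode/decode helpers (see homotopie.py for the actual implementation):

# Sketch of a latent-space homotopy between two encoded sentences.
import torch

def homotopy(model, z_start, z_end, steps=8):
    sentences = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z_start + t * z_end      # interpolate in latent space
        sentences.append(model.decode_greedy(z))  # hypothetical decoding helper
    return sentences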
Logs, models, and samples will be saved in the folder exp.
A large portion of this repo is borrowed from https://github.com/bohanli/vae-pretraining-encoder