Variational Autoencoder with Arbitrary Conditioning

Variational Autoencoder with Arbitrary Conditioning (VAEAC) is a neural probabilistic model, based on the variational autoencoder, that can be conditioned on an arbitrary subset of observed features and then sample the remaining features.

For more detail, see the following paper:
Oleg Ivanov, Michael Figurnov, Dmitry Vetrov. Variational Autoencoder with Arbitrary Conditioning, ICLR 2019 (https://openreview.net/forum?id=SyxtJh0qYm).

This PyTorch code implements the model and reproduces the results from the paper.

Setup

Install the prerequisites from requirements.txt. This code was tested on Linux (it should also work on Windows) with Python 3.6.4 and PyTorch 1.0.

Missing Feature Multiple Imputation

To impute missing features with VAEAC, use impute.py.

impute.py works with real-valued and categorical features. It takes a tab-separated values (TSV) file as input; NaNs in the input file mark the missing features.

The output file is also a TSV file, in which each object appears num_imputations times, with the NaNs replaced by different imputations. These copies are consecutive in the output file. For example, if num_imputations is 2, the output file is structured as follows:

object1_imputation1
object1_imputation2
object2_imputation1
object2_imputation2
object3_imputation1
...

By default num_imputations is 5.
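Because the imputed copies of each object are consecutive, the output rows can be regrouped per original object. A minimal sketch (the group_imputations helper is our own, not part of the repository):

```python
def group_imputations(rows, num_imputations=5):
    """Group consecutive imputed copies back to their source object.

    rows: rows of the output TSV, ordered object1_imputation1,
    object1_imputation2, ..., object2_imputation1, ...
    Returns one inner list per original object.
    """
    return [rows[i:i + num_imputations]
            for i in range(0, len(rows), num_imputations)]

# Using the layout from the example above, with num_imputations = 2:
rows = ["o1_i1", "o1_i2", "o2_i1", "o2_i2", "o3_i1", "o3_i2"]
groups = group_imputations(rows, num_imputations=2)
# groups[0] == ["o1_i1", "o1_i2"]  (both imputations of the first object)
```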

One-hot max size is the number of distinct values of a categorical feature. The values are assumed to be integers from 0 to K - 1, where K is the one-hot max size. For a real-valued feature, the one-hot max size should be 0 or 1.

For example, for a dataset with a binary feature, three real-valued features, and a categorical feature with 10 classes, the correct --one_hot_max_sizes arguments are 2 1 1 1 10.
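The rule above can be sketched in a few lines. This helper (one_hot_max_sizes, our own name, not a script in the repository) derives the sizes from the data, assuming categorical values are integers 0..K-1:

```python
def one_hot_max_sizes(columns, categorical):
    """Derive a --one_hot_max_sizes value per column.

    columns: dict mapping column name -> list of values
    categorical: set of column names to treat as categorical
    Categorical columns get K = max value + 1; real-valued columns get 1.
    """
    sizes = []
    for name, values in columns.items():
        if name in categorical:
            sizes.append(max(int(v) for v in values) + 1)
        else:
            sizes.append(1)
    return sizes

# The dataset from the example above: one binary feature,
# three real-valued features, one 10-class categorical feature.
cols = {
    "binary": [0, 1, 0],
    "x1": [0.3, 1.2, -0.5],
    "x2": [2.0, 0.1, 0.9],
    "x3": [5.5, 3.3, 1.1],
    "label": [0, 9, 4],
}
sizes = one_hot_max_sizes(cols, categorical={"binary", "label"})
# sizes == [2, 1, 1, 1, 10]
```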

Validation ratio is the fraction of objects that will be used for validation and best-model selection.
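A validation split of this kind can be sketched as follows (an illustrative helper, not the repository's actual splitting code):

```python
import random

def split_validation(objects, validation_ratio, seed=0):
    """Hold out a validation_ratio fraction of objects at random
    for validation and best-model selection."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    indices = list(range(len(objects)))
    rng.shuffle(indices)
    n_val = int(len(objects) * validation_ratio)
    val = [objects[i] for i in indices[:n_val]]
    train = [objects[i] for i in indices[n_val:]]
    return train, val

train, val = split_validation(list(range(100)), validation_ratio=0.1)
# 10 objects go to validation, 90 to training
```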

Commands for scripts

Auto-imputations

python auto_imputation_script.py --field {} (makes the given fields entirely missing; valid values: bg, 0, 1, ..., 17)

Example:

python auto_imputation_script.py --field bg 0 1 2 

The results can be found in imputations_vis/.

Auto-evaluate

python auto_evaluate_script.py

Commands for training and evaluating features:

Train

python vis_train.py --input_file data/train_test_split/forModel_new_groundtruth.tsv --epochs 150 --validation_ratio 0.1 --one_hot_max_sizes 1 1 1 11 1 13 0 96 7 1 1 1 1 1 2 77 7 1 1 1 1 1 4 39 7 1 1 1 1 1 6 25 7 1 1 1 1 1 8 41 7 1 1 1 1 1 10 40 7 1 1 1 1 1 12 33 7 1 1 1 1 1 14 21 7 1 1 1 1 1 16 31 7 1 1 1 1 1 18 29 7 1 1 1 1 1 20 29 7 1 1 1 1 1 22 33 7 1 1 1 1 1 24 43 7 1 1 1 1 1 26 43 7 1 1 1 1 1 28 53 7 1 1 1 1 1 30 37 7 1 1 1 1 1 32 43 7 1 1 1 1 1 34 47 7 1 1 1 1 1

Prepare data (add masks to the color fields)

python vis_prepare_data.py --input_name forModel_new --seed 100 --prob 0.5
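The masking step above drops values at random with the given probability, reproducibly under the given seed. A minimal sketch of that idea (illustrative only; the actual logic in vis_prepare_data.py may differ):

```python
import math
import random

def mask_values(values, prob, seed):
    """Replace each value with NaN with probability `prob`,
    deterministically for a fixed `seed`."""
    rng = random.Random(seed)
    return [float("nan") if rng.random() < prob else v for v in values]

masked = mask_values([1.0, 2.0, 3.0, 4.0], prob=0.5, seed=100)
n_missing = sum(1 for v in masked if math.isnan(v))
# the same seed always produces the same mask
```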

Impute

python vis_impute.py --input_file data/train_test_split/forModel_new_train.tsv --output_file data/imputations/for_evaluate.tsv --one_hot_max_sizes 1 1 1 11 1 13 0 96 7 1 1 1 1 1 2 77 7 1 1 1 1 1 4 39 7 1 1 1 1 1 6 25 7 1 1 1 1 1 8 41 7 1 1 1 1 1 10 40 7 1 1 1 1 1 12 33 7 1 1 1 1 1 14 21 7 1 1 1 1 1 16 31 7 1 1 1 1 1 18 29 7 1 1 1 1 1 20 29 7 1 1 1 1 1 22 33 7 1 1 1 1 1 24 43 7 1 1 1 1 1 26 43 7 1 1 1 1 1 28 53 7 1 1 1 1 1 30 37 7 1 1 1 1 1 32 43 7 1 1 1 1 1 34 47 7 1 1 1 1 1

Evaluate

python vis_evaluate_results.py --groundtruth data/train_test_split/forModel_new_groundtruth.tsv --input_file data/train_test_split/forModel_new_train.tsv --imputed_file data/imputations/for_evaluate.tsv --one_hot_max_sizes 1 1 1 11 1 13 0 96 7 1 1 1 1 1 2 77 7 1 1 1 1 1 4 39 7 1 1 1 1 1 6 25 7 1 1 1 1 1 8 41 7 1 1 1 1 1 10 40 7 1 1 1 1 1 12 33 7 1 1 1 1 1 14 21 7 1 1 1 1 1 16 31 7 1 1 1 1 1 18 29 7 1 1 1 1 1 20 29 7 1 1 1 1 1 22 33 7 1 1 1 1 1 24 43 7 1 1 1 1 1 26 43 7 1 1 1 1 1 28 53 7 1 1 1 1 1 30 37 7 1 1 1 1 1 32 43 7 1 1 1 1 1 34 47 7 1 1 1 1 1
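Evaluation compares the imputed values against the ground truth. Typical per-feature metrics are RMSE for real-valued features and accuracy for categorical ones; a hedged sketch (vis_evaluate_results.py may compute other metrics):

```python
import math

def rmse(groundtruth, imputed):
    """Root-mean-square error over imputed real-valued entries."""
    se = [(g - p) ** 2 for g, p in zip(groundtruth, imputed)]
    return math.sqrt(sum(se) / len(se))

def accuracy(groundtruth, imputed):
    """Fraction of correctly imputed categorical entries."""
    hits = sum(1 for g, p in zip(groundtruth, imputed) if g == p)
    return hits / len(groundtruth)

print(rmse([1.0, 2.0], [1.0, 4.0]))          # → 1.4142135623730951 (sqrt(2))
print(accuracy([0, 1, 2, 1], [0, 1, 0, 1]))  # → 0.75
```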

Citation

If you find this code useful in your research, please consider citing the paper:

@inproceedings{
    ivanov2018variational,
    title={Variational Autoencoder with Arbitrary Conditioning},
    author={Oleg Ivanov and Michael Figurnov and Dmitry Vetrov},
    booktitle={International Conference on Learning Representations},
    year={2019},
    url={https://openreview.net/forum?id=SyxtJh0qYm},
}
