Variational Autoencoder with Arbitrary Conditioning (VAEAC) is a neural probabilistic model based on the variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features.
For more details, see the paper:
Oleg Ivanov, Michael Figurnov, Dmitry Vetrov.
Variational Autoencoder with Arbitrary Conditioning, ICLR 2019,
https://openreview.net/forum?id=SyxtJh0qYm
This PyTorch code implements the model and reproduces the results from the paper.
Install prerequisites from requirements.txt.
This code was tested on Linux (but it should work on Windows as well)
with Python 3.6.4 and PyTorch 1.0.
To impute missing features with VAEAC, use impute.py.
impute.py works with both real-valued and categorical features.
It takes a tab-separated values (TSV) file as input.
NaNs in the input file indicate missing features.
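Concretely, such an input file can be produced with pandas as follows (a minimal sketch; the table here is made up, with three objects, two features, and one missing value per object):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical 3-object, 2-feature input table; np.nan marks the
# features that impute.py should fill in.
df = pd.DataFrame([[0.5, np.nan],
                   [np.nan, 1.2],
                   [2.0, np.nan]])

buf = io.StringIO()
# Write tab-separated values; missing entries are serialized as "NaN",
# which is how the input file indicates missing features.
df.to_csv(buf, sep='\t', index=False, header=False, na_rep='NaN')
print(buf.getvalue())
```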
The output file is also a TSV file, in which each input object appears
num_imputations times, with its NaNs replaced by different imputations.
These copies of each object are consecutive in the output file.
For example, if num_imputations is 2,
the output file is structured as follows:
object1_imputation1
object1_imputation2
object2_imputation1
object2_imputation2
object3_imputation1
...
By default, num_imputations is 5.
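Given this layout, the output rows can be regrouped into per-object imputation sets; a minimal sketch (this helper is not part of the repository):

```python
import numpy as np

def group_imputations(rows, num_imputations=5):
    """Group consecutive rows of the output file into per-object
    imputation sets (illustrative helper, not part of the repo).

    Returns an array of shape (num_objects, num_imputations, num_features).
    """
    rows = np.asarray(rows, dtype=float)
    assert len(rows) % num_imputations == 0, \
        "row count must be a multiple of num_imputations"
    return rows.reshape(-1, num_imputations, rows.shape[1])

# Two objects with num_imputations=2 and three features each, as in
# the layout above: all imputations of one object are consecutive.
out = group_imputations([[0, 1, 2],
                         [0, 1, 3],
                         [5, 6, 7],
                         [5, 6, 8]], num_imputations=2)
print(out.shape)  # (2, 2, 3)
```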
One-hot max size is the number of different values of a categorical feature. The values are assumed to be integers from 0 to K - 1, where K is the one-hot max size. For a real-valued feature, the one-hot max size is assumed to be 0 or 1.
For example, for a dataset with a binary feature, three real-valued features,
and a categorical feature with 10 classes, the correct --one_hot_max_sizes
arguments are 2 1 1 1 10.
Validation ratio is the fraction of objects that will be used for validation and best-model selection.
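For instance, a validation ratio of 0.1 holds out 10% of the objects; a minimal sketch of such a split (the repository's actual split logic may differ):

```python
import random

def train_val_split(objects, validation_ratio=0.1, seed=0):
    """Hold out a validation_ratio fraction of objects for validation
    and best-model selection. Illustrative sketch only."""
    rng = random.Random(seed)
    indices = list(range(len(objects)))
    rng.shuffle(indices)  # shuffle so the held-out set is random
    n_val = int(round(validation_ratio * len(objects)))
    val = [objects[i] for i in indices[:n_val]]
    train = [objects[i] for i in indices[n_val:]]
    return train, val

train, val = train_val_split(list(range(100)), validation_ratio=0.1)
print(len(train), len(val))  # 90 10
```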
python auto_imputation_script.py --field {}  (makes the whole specified field missing data; valid field values: bg, 0, 1, ..., 17)
Example:
python auto_imputation_script.py --field bg 0 1 2
The results can be found in imputations_vis/
python auto_evaluate_script.py
python vis_train.py --input_file data/train_test_split/forModel_new_groundtruth.tsv --epochs 150 --validation_ratio 0.1 --one_hot_max_sizes 1 1 1 11 1 13 0 96 7 1 1 1 1 1 2 77 7 1 1 1 1 1 4 39 7 1 1 1 1 1 6 25 7 1 1 1 1 1 8 41 7 1 1 1 1 1 10 40 7 1 1 1 1 1 12 33 7 1 1 1 1 1 14 21 7 1 1 1 1 1 16 31 7 1 1 1 1 1 18 29 7 1 1 1 1 1 20 29 7 1 1 1 1 1 22 33 7 1 1 1 1 1 24 43 7 1 1 1 1 1 26 43 7 1 1 1 1 1 28 53 7 1 1 1 1 1 30 37 7 1 1 1 1 1 32 43 7 1 1 1 1 1 34 47 7 1 1 1 1 1
python vis_prepare_data.py --input_name forModel_new --seed 100 --prob 0.5
python vis_impute.py --input_file data/train_test_split/forModel_new_train.tsv --output_file data/imputations/for_evaluate.tsv --one_hot_max_sizes 1 1 1 11 1 13 0 96 7 1 1 1 1 1 2 77 7 1 1 1 1 1 4 39 7 1 1 1 1 1 6 25 7 1 1 1 1 1 8 41 7 1 1 1 1 1 10 40 7 1 1 1 1 1 12 33 7 1 1 1 1 1 14 21 7 1 1 1 1 1 16 31 7 1 1 1 1 1 18 29 7 1 1 1 1 1 20 29 7 1 1 1 1 1 22 33 7 1 1 1 1 1 24 43 7 1 1 1 1 1 26 43 7 1 1 1 1 1 28 53 7 1 1 1 1 1 30 37 7 1 1 1 1 1 32 43 7 1 1 1 1 1 34 47 7 1 1 1 1 1
python vis_evaluate_results.py --groundtruth data/train_test_split/forModel_new_groundtruth.tsv --input_file data/train_test_split/forModel_new_train.tsv --imputed_file data/imputations/for_evaluate.tsv --one_hot_max_sizes 1 1 1 11 1 13 0 96 7 1 1 1 1 1 2 77 7 1 1 1 1 1 4 39 7 1 1 1 1 1 6 25 7 1 1 1 1 1 8 41 7 1 1 1 1 1 10 40 7 1 1 1 1 1 12 33 7 1 1 1 1 1 14 21 7 1 1 1 1 1 16 31 7 1 1 1 1 1 18 29 7 1 1 1 1 1 20 29 7 1 1 1 1 1 22 33 7 1 1 1 1 1 24 43 7 1 1 1 1 1 26 43 7 1 1 1 1 1 28 53 7 1 1 1 1 1 30 37 7 1 1 1 1 1 32 43 7 1 1 1 1 1 34 47 7 1 1 1 1 1
If you find this code useful in your research, please consider citing the paper:
@inproceedings{ivanov2018variational,
  title={Variational Autoencoder with Arbitrary Conditioning},
  author={Oleg Ivanov and Michael Figurnov and Dmitry Vetrov},
  booktitle={International Conference on Learning Representations},
  year={2019},
  url={https://openreview.net/forum?id=SyxtJh0qYm},
}