For computational feasibility, this repository contains an implementation of Masked Autoencoders (MAE) applied to the CIFAR-10 dataset instead of the larger datasets used in the original paper.
To reproduce the experiments, use the following scripts:
recunstruct_images.py
This script visualizes the model's reconstruction of masked images from the test set. You can select specific samples by changing the start_index variable.
classify_images.py
This script classifies test-set images with a classifier fine-tuned from the encoder of the pretrained Masked Autoencoder. You can select specific samples by changing the start_index variable.
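Both scripts select test-set samples the same way, by indexing from start_index. A minimal sketch of what that selection could look like (only start_index comes from the actual scripts; the other names are illustrative stand-ins):

```python
# Minimal sketch: selecting specific CIFAR-10 test samples via start_index.
# Everything except start_index is a hypothetical stand-in for the
# repository's actual loading code.
import torchvision
import torchvision.transforms as T

test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=T.ToTensor()
)

start_index = 0   # change this value to work with different samples
num_samples = 8   # hypothetical number of images processed at once
images = [test_set[i][0] for i in range(start_index, start_index + num_samples)]
```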
src/train_reconstruction_mae.py
This script pretrains the full MAE model following the pre-training setting described in the original paper.
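For orientation, here is a minimal sketch of the pre-training objective described in the paper: a large fraction of patches (75% by default here) is randomly masked, and the loss is the mean squared error computed only on the masked patches. All names and tensor shapes below are illustrative, not the repository's actual code:

```python
# Sketch of MAE-style random masking and the masked-patch MSE loss.
# Shapes assume CIFAR-10 (32x32) split into 4x4 patches: an 8x8 grid
# of 64 patches, each with 4*4*3 = 48 values.
import torch

def random_masking(patches, mask_ratio=0.75):
    """Keep a random subset of patches; return kept patches and the mask."""
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                  # per-patch random scores
    ids_shuffle = noise.argsort(dim=1)        # random permutation of patches
    ids_keep = ids_shuffle[:, :num_keep]
    kept = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)                   # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0)
    return kept, mask

patches = torch.randn(4, 64, 48)      # dummy batch of patchified images
kept, mask = random_masking(patches)  # encoder would see only `kept`
recon = torch.randn_like(patches)     # stand-in for the decoder's output
per_patch = ((recon - patches) ** 2).mean(dim=-1)   # per-patch MSE
loss = (per_patch * mask).sum() / mask.sum()        # masked patches only
```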
src/train_mae_classifier.py
This script fine-tunes a classifier on top of the pretrained encoder (end-to-end fine-tuning, as described in the original paper).
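Conceptually, this replaces the MAE decoder with a classification head and updates all weights, encoder included. A rough sketch under assumed names (MAEClassifier, load_pretrained_encoder, and embed_dim are illustrative, not the repository's API):

```python
# Sketch of end-to-end fine-tuning: a linear head on top of a pretrained
# encoder, with gradients flowing through the whole model.
import torch
import torch.nn as nn

class MAEClassifier(nn.Module):
    def __init__(self, encoder, embed_dim, num_classes=10):
        super().__init__()
        self.encoder = encoder                  # pretrained MAE encoder
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        tokens = self.encoder(x)                # (B, N, D) patch tokens
        return self.head(tokens.mean(dim=1))    # pool tokens, then classify

# Hypothetical usage: every parameter, encoder included, is optimized.
# encoder = load_pretrained_encoder(...)       # hypothetical loader
# model = MAEClassifier(encoder, embed_dim=192)
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```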
The config.yaml file can be edited to run the training scripts with different settings.
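The actual keys are defined in config.yaml itself; as a sketch, the scripts presumably read it along these lines (the key names in the comments are hypothetical):

```python
# Minimal sketch of loading config.yaml (requires PyYAML).
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical entries the file might hold, e.g.:
#   mask_ratio: 0.75
#   epochs: 200
#   batch_size: 256
mask_ratio = cfg.get("mask_ratio", 0.75)
```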
In the provided scripts:
- The default model used to reconstruct images is the one trained with 75% masking.
- The default model used to classify images is the one obtained by fine-tuning the encoder pretrained with 75% masking.
It is possible to change the model path in the scripts (e.g. from mae-75-masking to mae-25-masking) to use any of the weights released in the src/data/weights folder.
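For example, the switch could look like this inside either script (the variable name is an assumption; only the folder and the mae-75-masking / mae-25-masking names come from this repository):

```python
# Hypothetical variable name; the checkpoint names and the
# src/data/weights folder are from this repository.
weights_path = "src/data/weights/mae-75-masking"    # default weights
# weights_path = "src/data/weights/mae-25-masking"  # 25% masking variant
```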
Original Paper: Masked Autoencoders Are Scalable Vision Learners

