Keras implementation of Mixture of Softmaxes (MoS). This layer is a type of ensemble method described in *Breaking the Softmax Bottleneck: A High-Rank RNN Language Model* (Yang et al., 2017).
I have linked a few blog posts below that do this layer more justice than I can.
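To give a rough idea of what the layer computes, here is a minimal sketch of an MoS output head in tf.keras. It is illustrative rather than a copy of this repo's implementation (the class and weight names are my own), and for brevity it projects each component's logits directly instead of going through the paper's tanh latent projections:

```python
import tensorflow as tf
from tensorflow import keras


class MixtureOfSoftmaxes(keras.layers.Layer):
    """Sketch of an MoS head: p(y|h) = sum_k pi_k(h) * softmax(W_k h + b_k)."""

    def __init__(self, num_classes, num_mixtures=3, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes
        self.num_mixtures = num_mixtures

    def build(self, input_shape):
        dim = int(input_shape[-1])
        # One logit projection per mixture component.
        self.kernel = self.add_weight(
            name="kernel",
            shape=(dim, self.num_mixtures * self.num_classes),
            initializer="glorot_uniform",
        )
        self.bias = self.add_weight(
            name="bias",
            shape=(self.num_mixtures * self.num_classes,),
            initializer="zeros",
        )
        # Projection producing the mixture (prior) weights pi.
        self.prior_kernel = self.add_weight(
            name="prior_kernel",
            shape=(dim, self.num_mixtures),
            initializer="glorot_uniform",
        )

    def call(self, inputs):
        # Mixture weights: (batch, num_mixtures), summing to 1 per example.
        priors = tf.nn.softmax(tf.matmul(inputs, self.prior_kernel), axis=-1)
        # Component logits reshaped to (batch, num_mixtures, num_classes).
        logits = tf.matmul(inputs, self.kernel) + self.bias
        logits = tf.reshape(logits, (-1, self.num_mixtures, self.num_classes))
        # Softmax each component separately, then mix with the prior weights.
        softmaxes = tf.nn.softmax(logits, axis=-1)
        return tf.reduce_sum(softmaxes * priors[:, :, None], axis=1)
```

Because the output is a weighted mixture of several softmaxes rather than a single one, the log-probability matrix it can express is no longer constrained to low rank, which is the bottleneck the paper identifies.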
I plan to test this layer with a few different architectures and datasets.
On MNIST I have compared a mixture of 3 softmaxes against a plain softmax; the mixture improves accuracy by around 1%. See the MNIST notebook.
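For a feel of how such a comparison is wired up, here is a hypothetical MNIST model using the layer sketched above; the architecture and hyperparameters are illustrative, not necessarily what the notebook uses:

```python
# Hypothetical MNIST setup; swapping the MoS head for a plain
# Dense(10, activation="softmax") gives the baseline model.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    MixtureOfSoftmaxes(num_classes=10, num_mixtures=3),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # the layer outputs probabilities
    metrics=["accuracy"],
)
```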
The plan is to play around with CIFAR-10 and CIFAR-100 next, and then move on to some actual language models.
Some useful references: