Keras implementation of Mixture of Softmaxes (MoS). This layer is a type of ensemble method described in *Breaking the Softmax Bottleneck: A High-Rank RNN Language Model* (Yang et al., 2017).
I have linked a few blog posts below that do this layer more justice than I can.
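To give a rough idea of what the layer computes, here is a minimal sketch of an MoS output head in tf.keras. It is illustrative rather than a copy of this repo's implementation (the class and weight names are my own), and for brevity it projects each component's logits directly instead of going through the paper's tanh latent projections:

```python
import tensorflow as tf
from tensorflow import keras


class MixtureOfSoftmaxes(keras.layers.Layer):
    """Sketch of an MoS head: p(y|h) = sum_k pi_k(h) * softmax(W_k h + b_k)."""

    def __init__(self, num_classes, num_mixtures=3, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes
        self.num_mixtures = num_mixtures

    def build(self, input_shape):
        dim = int(input_shape[-1])
        # One logit projection per mixture component.
        self.kernel = self.add_weight(
            name="kernel",
            shape=(dim, self.num_mixtures * self.num_classes),
            initializer="glorot_uniform",
        )
        self.bias = self.add_weight(
            name="bias",
            shape=(self.num_mixtures * self.num_classes,),
            initializer="zeros",
        )
        # Projection producing the mixture (prior) weights pi.
        self.prior_kernel = self.add_weight(
            name="prior_kernel",
            shape=(dim, self.num_mixtures),
            initializer="glorot_uniform",
        )

    def call(self, inputs):
        # Mixture weights: (batch, num_mixtures), summing to 1 per example.
        priors = tf.nn.softmax(tf.matmul(inputs, self.prior_kernel), axis=-1)
        # Component logits reshaped to (batch, num_mixtures, num_classes).
        logits = tf.matmul(inputs, self.kernel) + self.bias
        logits = tf.reshape(logits, (-1, self.num_mixtures, self.num_classes))
        # Softmax each component separately, then mix with the prior weights.
        softmaxes = tf.nn.softmax(logits, axis=-1)
        return tf.reduce_sum(softmaxes * priors[:, :, None], axis=1)
```

Because the output is a weighted mixture of several softmaxes rather than a single one, the log-probability matrix it can express is no longer constrained to low rank, which is the bottleneck the paper identifies.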
I plan to test this layer with a few different architectures and datasets.
On MNIST I have compared a mixture of 3 softmaxes against a plain softmax; the mixture improves accuracy by around 1%. See the MNIST notebook.
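For a feel of how such a comparison is wired up, here is a hypothetical MNIST model using the layer sketched above; the architecture and hyperparameters are illustrative, not necessarily what the notebook uses:

```python
# Hypothetical MNIST setup; swapping the MoS head for a plain
# Dense(10, activation="softmax") gives the baseline model.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    MixtureOfSoftmaxes(num_classes=10, num_mixtures=3),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # the layer outputs probabilities
    metrics=["accuracy"],
)
```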
The plan is to play around with CIFAR-10 and CIFAR-100 next, and then move on to some actual language models.
Some useful references: