A step-by-step implementation of the Transformer architecture, based primarily on the paper "Attention Is All You Need" by Vaswani et al. (2017).
Paper: https://arxiv.org/abs/1706.03762
Related papers:
- LayerNorm: https://arxiv.org/abs/1607.06450
- ResNet (concept of residual connections): https://arxiv.org/abs/1512.03385