cpp-transformer

A C++ implementation of the Transformer with no special library dependencies, covering both training and inference.

This project replicates Chapter 11 (Transformers) of Dive into Deep Learning, building an English-French machine translation model in C++. It implements its own automatic differentiation framework and depends only on the C++ standard library, with the aim of helping users understand the underlying principles of Transformers.

Project Highlights

Principle-Oriented

We build the model from fundamental operations without relying on any deep learning framework, so each step of the Transformer's computation is visible in the code.
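As an illustration of what "fundamental operations" means here, the following is a minimal sketch of single-head scaled dot-product attention written with only the standard library. It is not taken from this repository: the `Matrix` alias and the `softmax` and `attention` functions are hypothetical names, and the project's own Tensor class and backend ops will look different.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major matrix as a vector of rows (hypothetical stand-in for a tensor type).
using Matrix = std::vector<std::vector<double>>;

// Numerically stable softmax over one row.
std::vector<double> softmax(const std::vector<double>& row) {
    assert(!row.empty());
    double max_val = row[0];
    for (double v : row) max_val = std::fmax(max_val, v);
    std::vector<double> out(row.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < row.size(); ++i) {
        out[i] = std::exp(row[i] - max_val);
        sum += out[i];
    }
    for (double& v : out) v /= sum;
    return out;
}

// Computes softmax(Q K^T / sqrt(d)) V for a single head, without masking.
Matrix attention(const Matrix& q, const Matrix& k, const Matrix& v) {
    assert(!q.empty() && !k.empty() && k.size() == v.size());
    const std::size_t d = q[0].size();
    assert(d > 0 && k[0].size() == d);
    const double scale = 1.0 / std::sqrt(static_cast<double>(d));
    Matrix out(q.size(), std::vector<double>(v[0].size(), 0.0));
    for (std::size_t i = 0; i < q.size(); ++i) {
        // Scaled dot product of query i with every key.
        std::vector<double> scores(k.size(), 0.0);
        for (std::size_t j = 0; j < k.size(); ++j) {
            for (std::size_t t = 0; t < d; ++t) scores[j] += q[i][t] * k[j][t];
            scores[j] *= scale;
        }
        // Weighted sum of the value rows.
        const std::vector<double> w = softmax(scores);
        for (std::size_t j = 0; j < k.size(); ++j)
            for (std::size_t c = 0; c < v[0].size(); ++c)
                out[i][c] += w[j] * v[j][c];
    }
    return out;
}
```

With an all-zero query, every key receives the same weight, so the output row is the plain average of the value rows.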

Automatic Differentiation

Our self-developed automatic differentiation framework simplifies gradient calculation and makes the backpropagation algorithm easier to follow.
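To make the idea concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars. It is an illustration only and does not reflect this project's actual framework, which operates on tensors; the `Tape` type and its `leaf`, `add`, `mul`, and `backward` members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A tape of scalar nodes. Each node stores its value, its gradient, and up to
// two parents together with the local derivative toward each parent.
struct Tape {
    struct Node {
        double value;
        double grad;
        int parent[2];
        double local[2];
    };
    std::vector<Node> nodes;

    // Input variable with no parents.
    int leaf(double v) {
        nodes.push_back({v, 0.0, {-1, -1}, {0.0, 0.0}});
        return static_cast<int>(nodes.size()) - 1;
    }
    // d(a+b)/da = 1, d(a+b)/db = 1
    int add(int a, int b) {
        nodes.push_back({nodes[a].value + nodes[b].value, 0.0, {a, b}, {1.0, 1.0}});
        return static_cast<int>(nodes.size()) - 1;
    }
    // d(a*b)/da = b, d(a*b)/db = a
    int mul(int a, int b) {
        nodes.push_back({nodes[a].value * nodes[b].value, 0.0, {a, b},
                         {nodes[b].value, nodes[a].value}});
        return static_cast<int>(nodes.size()) - 1;
    }
    // Nodes are appended in creation order, so parents always have smaller
    // indices and a reverse sweep visits them in a valid topological order.
    void backward(int out) {
        assert(out >= 0 && static_cast<std::size_t>(out) < nodes.size());
        for (Node& n : nodes) n.grad = 0.0;
        nodes[out].grad = 1.0;
        for (int i = out; i >= 0; --i)
            for (int p = 0; p < 2; ++p)
                if (nodes[i].parent[p] >= 0)
                    nodes[nodes[i].parent[p]].grad += nodes[i].local[p] * nodes[i].grad;
    }
};
```

For f = x * y + x with x = 3 and y = 4, calling `backward` on f accumulates df/dx = y + 1 and df/dy = x through the chain rule, which is the same bookkeeping a tensor-level framework performs per operation.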

Low Dependencies

The project depends only on the C++ standard library. Its performance may not match implementations built on optimized libraries, but every computational detail is visible. This lets users gain a thorough understanding of the backpropagation algorithm and the underlying principles of the Transformer architecture.

Update Log

V2 - [2025-05-29]

Redesigned Tensor Class
Redesigned Backend Ops Interface
Redesigned Computation Flow
- Pre-computed Tensor Dependency Logic and Batch Memory Allocation
- Compact Memory Layout
- Efficient zero_grad Implementation
Closer Implementation to Tensor Semantics in D2L Chapter 11
Enhanced Test Cases
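The V2 items on batch memory allocation, compact layout, and zero_grad can be illustrated with a small sketch. The idea assumed here is that every tensor declares its size up front, storage is allocated once as a single contiguous buffer, and clearing all gradients becomes one fill. The `GradArena` class and its methods are hypothetical and are not the project's actual interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical planner: slices are reserved first, storage is allocated once.
class GradArena {
public:
    // Reserve `count` floats and return the offset of the reserved slice.
    std::size_t reserve(std::size_t count) {
        assert(!allocated_);
        const std::size_t offset = total_;
        total_ += count;
        return offset;
    }
    // Allocate one contiguous, zero-initialized buffer covering every slice.
    void allocate() {
        buffer_.assign(total_, 0.0f);
        allocated_ = true;
    }
    // Pointer to the start of a previously reserved slice.
    float* at(std::size_t offset) {
        assert(allocated_ && offset < buffer_.size());
        return buffer_.data() + offset;
    }
    // All gradients are contiguous, so clearing them is a single fill.
    void zero_grad() { std::fill(buffer_.begin(), buffer_.end(), 0.0f); }
    std::size_t size() const { return buffer_.size(); }

private:
    std::size_t total_ = 0;
    bool allocated_ = false;
    std::vector<float> buffer_;
};
```

Reserving everything before allocating is what makes the total size known in advance, which is consistent with the "Allocating memory" lines printed by the lm program.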

V2.01 - [2025-06-08]

Added support for a simple language model.

V2.02 - [2025-06-14]

Added Metal support

Quick start

build

for gpu

./build_gpu.sh

The program compiled in this way supports both CPU and GPU. You can use the -g parameter to switch between them.

for cpu

./build_cpu.sh

If you don't have a CUDA environment, you can also try the CPU version. Note that this version is extremely slow and is only intended for comparing and verifying the correctness of the GPU version.

for mac gpu

Metal is now supported, so the GPU on a Mac can be used.

My MacBook hardware and software information

Chip : Apple M1
OS Version : 15.5 (24F74)

./build_mac_gpu.sh

for mac cpu

./build_mac_cpu.sh

Translation

training

The training set size (512 sentence pairs) matches the one used in the Transformer chapter (Chapter 11) of d2l.

$ time ./transformer -e 30
corpus : ./resources/fra_preprocessed_512.txt
epochs : 30
batch_size : 128
gpu : 1
learning rate : 0.001
checkpoint :
enc_vocab_size : 195
dec_vocab_size : 214
bos_id : 3
eos_id : 1
src_pad_id : 0
tgt_pad_id : 0
predicting : false
batch_size : 128
epoch 0 :  [512/512]loss : 4.62015
epoch 1 :  [512/512]loss : 3.39543
epoch 2 :  [512/512]loss : 2.96776
epoch 3 :  [512/512]loss : 2.45226
epoch 4 :  [512/512]loss : 2.20506
epoch 5 :  [512/512]loss : 1.94157
epoch 6 :  [512/512]loss : 1.76016
epoch 7 :  [512/512]loss : 1.58783
epoch 8 :  [512/512]loss : 1.46
epoch 9 :  [512/512]loss : 1.35267
epoch 10 :  [512/512]loss : 1.23456
epoch 11 :  [512/512]loss : 1.11818
epoch 12 :  [512/512]loss : 1.02721
epoch 13 :  [512/512]loss : 0.930991
epoch 14 :  [512/512]loss : 0.868043
epoch 15 :  [512/512]loss : 0.797028
epoch 16 :  [512/512]loss : 0.730525
epoch 17 :  [512/512]loss : 0.685426
epoch 18 :  [512/512]loss : 0.670126
epoch 19 :  [512/512]loss : 0.635286
epoch 20 :  [512/512]loss : 0.580065
epoch 21 :  [512/512]loss : 0.558903
epoch 22 :  [512/512]loss : 0.528207
epoch 23 :  [512/512]loss : 0.49648
epoch 24 :  [512/512]loss : 0.482626
epoch 25 :  [512/512]loss : 0.456417
epoch 26 :  [512/512]loss : 0.452462
epoch 27 :  [512/512]loss : 0.432102
epoch 28 :  [512/512]loss : 0.408004
epoch 29 :  [512/512]loss : 0.395327
checkpoint saved : ./checkpoints/checkpoint_20250603_111836_29.bin

real    0m44.835s
user    0m44.531s
sys     0m0.272s

inference

Perform translation inference using the checkpoint file generated earlier. The data will be read from the test.txt file.

$ ./transformer -e 0 -c ./checkpoints/checkpoint_20250603_111836_29.bin
corpus : ./resources/fra_preprocessed_512.txt
epochs : 0
batch_size : 128
gpu : 1
learning rate : 0.001
checkpoint : ./checkpoints/checkpoint_20250603_111836_29.bin
enc_vocab_size : 195
dec_vocab_size : 214
bos_id : 3
eos_id : 1
src_pad_id : 0
tgt_pad_id : 0
predicting : true
batch_size : 1
loading from checkpoint : ./checkpoints/checkpoint_20250603_111836_29.bin
loaded from checkpoint
serving mode
test file : ./test.txt
go . -> va .
i lost . -> j'ai perdu .
he's calm . -> il est mouillé .
i'm home . -> je suis chez moi .
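The inference log above shows sentences being decoded one at a time (batch_size : 1) with a configured eos_id. The decoding strategy the project uses is not stated here, so the sketch below assumes greedy decoding as the simplest illustration: at each step, the highest-scoring token is appended until the end-of-sequence token appears or a length limit is reached. `ScoreFn` and `greedyGenerate` are hypothetical names, and the model is abstracted as a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model interface: maps the tokens generated so far to one score
// per vocabulary entry for the next position.
using ScoreFn = std::function<std::vector<double>(const std::vector<int>&)>;

// Greedy decoding: repeatedly append the arg-max token, stopping at eos_id
// or after max_new_tokens steps.
std::vector<int> greedyGenerate(std::vector<int> tokens, const ScoreFn& score,
                                std::size_t max_new_tokens, int eos_id) {
    for (std::size_t step = 0; step < max_new_tokens; ++step) {
        const std::vector<double> scores = score(tokens);
        assert(!scores.empty());
        std::size_t best = 0;
        for (std::size_t i = 1; i < scores.size(); ++i)
            if (scores[i] > scores[best]) best = i;
        tokens.push_back(static_cast<int>(best));
        if (static_cast<int>(best) == eos_id) break;
    }
    return tokens;
}
```

In a translation setting the initial `tokens` would hold only the bos_id, and the callback would run the decoder against the encoded source sentence.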

Language Model

The language model is built with a two-layer decoder and trained on the first 256 tokens of timemachine_preprocessed.txt. During inference, it reads the starting text from test_lm.txt.

training

$ ./lm -e 10 -m 256
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 10
batch_size : 16
gpu : 1
learning rate : 0.001
checkpoint : 
max_words_cnt : 256
Allocating memory  
for tensors : 36609236 bytes, 
for c_tensors: 3194706328 bytes 
for grad_tensors: 1241779004 bytes
epoch 0 :  [224/256]loss : 5.54111
epoch 1 :  [224/256]loss : 1.36544
epoch 2 :  [224/256]loss : 0.178868
epoch 3 :  [224/256]loss : 0.0472531
epoch 4 :  [224/256]loss : 0.0245251
epoch 5 :  [224/256]loss : 0.0195127
epoch 6 :  [224/256]loss : 0.0174135
epoch 7 :  [224/256]loss : 0.0162055
epoch 8 :  [224/256]loss : 0.0154597
epoch 9 :  [224/256]loss : 0.0147902
checkpoint saved : ./checkpoints/checkpoint_20250608_200259_9.bin

inference

$ ./lm -e 0 -c ./checkpoints/checkpoint_20250608_200259_9.bin
corpus : ./resources/time_machine/timemachine_preprocessed.txt
epochs : 0
batch_size : 16
gpu : 1
learning rate : 0.001
checkpoint : ./checkpoints/checkpoint_20250608_200259_9.bin
max_words_cnt : 256
Allocating memory  
for tensors : 36355416 bytes, 
for c_tensors: 17206900 bytes 
for grad_tensors: 14209596 bytes
loading from checkpoint : ./checkpoints/checkpoint_20250608_200259_9.bin
loaded from checkpoint
serving mode
test file : ./test_lm.txt
sentence : the time machine
by h g wells i the time traveller for so it will be convenient to speak of him was expounding a recondite matter to us his grey eyes shone and twinkled and his usually pale face was flushed and animated the fire burned brightly and animated the fire burned brightly 
-----------------

pre-trained lm model

This model was trained for 8 epochs on the full text of the novel The Time Machine.

handwritten_recognition

To verify some functions more quickly, I have introduced a handwritten digit recognition program.

./handwritten_recognition 
images magic : 2051
label magic : 2049
lables_num : 60000
data loaded.
Actions:
...
evaluating :  [10000/10000] correct : 9501
epoch : 9 [50000/50000] loss : 0.150985
evaluating :  [10000/10000] correct : 9493

graphviz supported

You can add a line of code like this to the program to output an out.dot file that records the tensor computation topology. For example, in mnist.cpp:

printAllActions();
printDotGraph(); // here
allocMemAndInitTensors();

If you have Graphviz installed, you can use the following command to convert the out.dot file into a PNG image:

dot -Tpng out.dot -o out.png

Here's an example from my side where a PNG file is generated as output.
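For readers unfamiliar with the DOT format, the following minimal sketch shows how a computation graph can be serialized into text that Graphviz accepts. It is an assumption-laden illustration and not the project's printDotGraph implementation: `toDot`, the node naming scheme, and the edge representation are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Serialize a directed graph to DOT text. names[i] is the label of node i,
// and each edge (from, to) means node `to` consumes the output of node `from`.
std::string toDot(const std::vector<std::string>& names,
                  const std::vector<std::pair<int, int>>& edges) {
    std::ostringstream out;
    out << "digraph G {\n";
    for (std::size_t i = 0; i < names.size(); ++i)
        out << "  n" << i << " [label=\"" << names[i] << "\"];\n";
    for (const auto& e : edges) {
        assert(e.first >= 0 && static_cast<std::size_t>(e.first) < names.size());
        assert(e.second >= 0 && static_cast<std::size_t>(e.second) < names.size());
        out << "  n" << e.first << " -> n" << e.second << ";\n";
    }
    out << "}\n";
    return out.str();
}
```

Writing the returned string to out.dot and running the dot command shown earlier would render the graph as an image.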

legacy version

v1

Derivation of backpropagation gradient formulas

Derivation
