A minimal, dependency-free GPT implementation in pure Python that trains a transformer-based language model and generates text.
MicroGPT is a bare-bones transformer with these key components:
- Autograd Engine (`Value` class): Implements backpropagation via computational graph traversal
- Tokenizer: Maps characters to token IDs (0 to vocab_size-1)
- Transformer: Single-layer GPT-2-like architecture with:
- Token & position embeddings
- Multi-head self-attention (4 heads)
- Feed-forward MLP layers
- RMSNorm (a simplified variant of layer normalization)
- Training: Adam optimizer over 1000 steps to minimize next-token prediction loss
- Inference: Samples new sequences token-by-token using temperature-controlled sampling
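The autograd engine described above can be sketched as a tiny scalar `Value` class. This is a minimal illustration of the idea (backpropagation via reverse traversal of the computation graph), not the exact code from the repo; the operator set here is deliberately reduced to `+`, `*`, and `tanh`.

```python
import math

class Value:
    """Scalar with autograd support: a minimal sketch of the concept."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(a+b)/da = 1
            other.grad += out.grad         # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad  # d tanh/dx = 1 - tanh^2
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Example: c = a*b + b, so dc/da = b = 3 and dc/db = a + 1 = 3
a, b = Value(2.0), Value(3.0)
c = a * b + b
c.backward()
```

Every forward operation records a closure that knows how to route gradients back to its inputs; calling `backward()` on the final loss replays those closures in reverse topological order.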
- Prepare input data in `input.txt` (one item per line):
emma
olivia
ava
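The character tokenizer can be sketched as follows. This is a hypothetical reconstruction from the description above (characters mapped to IDs 0 to vocab_size-1, plus a BOS token marking sequence start/end); names like `stoi`, `encode`, and `decode` are illustrative, not taken from the repo.

```python
# Build a character vocabulary from the training names.
docs = ["emma", "olivia", "ava"]
chars = sorted(set("".join(docs)))
BOS = len(chars)                 # reserve the last ID as the BOS/EOS marker
vocab_size = len(chars) + 1
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(name):
    # Wrap each name in BOS tokens so the model learns where names begin and end.
    return [BOS] + [stoi[ch] for ch in name] + [BOS]

def decode(ids):
    # Drop BOS markers and map IDs back to characters.
    return "".join(itos[i] for i in ids if i != BOS)

tokens = encode("emma")
```

Using one shared token for both start and end keeps the vocabulary minimal: sampling stops as soon as the model emits BOS again.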
- Run training & inference: `python3 microgpt.py`

The script automatically trains for 1000 steps and generates 20 samples.
Input (10 names):
emma, olivia, ava, sophia, isabella, mia, charlotte, amelia, harper, evelyn
Training:
- Starts with high loss (~2.8) and decreases to ~0.35
- Model learns to predict next characters in names
Generated Output:
sample 1: charlotte
sample 2: sophia
sample 3: evelyn
sample 4: ava
...
The model learns character patterns and generates realistic-sounding names from the training data.
| Step | What Happens |
|---|---|
| Tokenization | Characters → IDs (BOS token marks sequence start/end) |
| Forward Pass | Token → embedding → attention → MLP → next-token logits |
| Loss | Cross-entropy loss between predicted & actual next token |
| Backward | Gradients flow through computation graph via chain rule |
| Optimizer | Adam updates all parameters with learning rate decay |
| Inference | Sample from softmax distribution (with temperature control) |
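The inference step in the table, sampling from a softmax distribution with temperature control, can be sketched like this. The function name and inverse-CDF sampling strategy are illustrative assumptions; only the temperature-scaled softmax itself is from the description above.

```python
import math, random

def sample_from_logits(logits, temperature=0.5, rng=random):
    # Dividing logits by the temperature sharpens (<1) or flattens (>1)
    # the distribution before the softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling: walk the cumulative distribution.
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

random.seed(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[sample_from_logits([2.0, 1.0, 0.0], temperature=0.5)] += 1
```

At temperature 0.5 the highest logit wins most of the time, while still leaving some probability mass on the alternatives, which is why generated names vary between runs.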
Key hyperparameters in the code:
- `n_embd = 16`: Embedding dimension
- `n_head = 4`: Number of attention heads
- `n_layer = 1`: Number of transformer layers
- `block_size = 16`: Maximum sequence length
- `num_steps = 1000`: Training iterations
- `temperature = 0.5`: Sampling creativity (0-1; lower = more deterministic)
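The optimizer step, Adam with learning-rate decay, can be sketched on a toy one-parameter problem. The linear decay schedule and the hyperparameter defaults below are illustrative assumptions, not values read from the repo.

```python
import math

def adam_train(num_steps=1000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    # Minimize f(w) = (w - 5)^2 with Adam plus a linear learning-rate decay.
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, num_steps + 1):
        g = 2 * (w - 5.0)                      # gradient of (w - 5)^2
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        lr_t = lr * (1 - t / (num_steps + 1))  # one possible linear decay schedule
        w -= lr_t * m_hat / (math.sqrt(v_hat) + eps)
    return w

w = adam_train()
```

In the real model the same update runs over every parameter of the embeddings, attention, and MLP; the decay drives step sizes toward zero as training approaches `num_steps`.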
Uses only the Python standard library: `os`, `math`, `random`.
Attribution: Core algorithm by @karpathy