matiaspakua/microgpt

MicroGPT

A minimal, dependency-free GPT implementation in pure Python that trains a transformer-based language model and generates text.

Core Algorithm Overview

MicroGPT is a bare-bones transformer with these key components:

  1. Autograd Engine (Value class): Implements backpropagation via computational graph traversal
  2. Tokenizer: Maps characters to token IDs (0 to vocab_size-1)
  3. Transformer: Single-layer GPT-2-like architecture with:
    • Token & position embeddings
    • Multi-head self-attention (4 heads)
    • Feed-forward MLP layers
    • RMSNorm (a simplified layer normalization without mean-centering)
  4. Training: Adam optimizer over 1000 steps to minimize next-token prediction loss
  5. Inference: Samples new sequences token-by-token using temperature-controlled sampling
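The autograd engine is the foundation of the pipeline above. As a rough sketch (names and details here are illustrative, not the repo's exact implementation), each Value stores its data, its gradient, and the local derivatives to its children, and backward() walks the graph in reverse topological order applying the chain rule:

```python
import math

class Value:
    """Minimal scalar autograd node (illustrative sketch)."""
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # nodes this value depends on
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for c, g in zip(v._children, v._local_grads):
                c.grad += g * v.grad

# Example: z = x*y + y, so dz/dx = y and dz/dy = x + 1
x, y = Value(2.0), Value(3.0)
z = x * y + y
z.backward()
print(x.grad, y.grad)  # 3.0 3.0
```

Every tensor operation in the transformer ultimately reduces to scalar ops like these, which is what makes a dependency-free implementation possible (at the cost of speed).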

How to Run

Basic Usage

  1. Prepare input data in input.txt (one item per line):

     emma
     olivia
     ava

  2. Run training & inference:

     python3 microgpt.py

The script automatically trains for 1000 steps and generates 20 samples.

Example Input & Output

Input (10 names):

emma, olivia, ava, sophia, isabella, mia, charlotte, amelia, harper, evelyn

Training:

  • Starts with high loss (~2.8) and decreases to ~0.35
  • Model learns to predict next characters in names

Generated Output:

sample 1: charlotte
sample 2: sophia
sample 3: evelyn
sample 4: ava
...

The model learns character patterns and generates realistic-sounding names from the training data.

Algorithm Breakdown

Step          What Happens
------------  -----------------------------------------------------------
Tokenization  Characters → IDs (BOS token marks sequence start/end)
Forward Pass  Token → embedding → attention → MLP → next-token logits
Loss          Cross-entropy loss between predicted & actual next token
Backward      Gradients flow through computation graph via chain rule
Optimizer     Adam updates all parameters with learning rate decay
Inference     Sample from softmax distribution (with temperature control)
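The loss step above can be sketched in a few lines. This is a generic, numerically stable cross-entropy on raw logits (the logit values below are made up for illustration; in the real run they come from the transformer's forward pass):

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target token under softmax(logits)."""
    # Numerically stable log-sum-exp: subtract the max logit first.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

logits = [2.0, 0.5, -1.0]  # hypothetical scores over a 3-token vocabulary
loss = cross_entropy(logits, target=0)
print(loss)  # small, since token 0 already has the highest score
```

Training simply drives this loss down, averaged over all next-token predictions in each sequence.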

Configuration

Key hyperparameters in the code:

  • n_embd = 16: Embedding dimension
  • n_head = 4: Number of attention heads
  • n_layer = 1: Number of transformer layers
  • block_size = 16: Maximum sequence length
  • num_steps = 1000: Training iterations
  • temperature = 0.5: Sampling temperature (lower = more deterministic; values above 1 flatten the distribution)
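Temperature works by dividing the logits before the softmax. A minimal sketch (hypothetical logits; the real sampler applies this per generated token):

```python
import math
import random

def sample(logits, temperature=0.5, rng=random):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]   # stable softmax numerators
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc = rng.random(), 0.0                 # inverse-CDF sampling
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

random.seed(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[sample([2.0, 0.5, -1.0], temperature=0.5)] += 1
print(counts)  # the highest-logit token dominates at low temperature
```

At temperature 0.5 the logit gaps are doubled before the softmax, so the most likely token is chosen far more often than at temperature 1.0.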

No Dependencies

Uses only Python standard library: os, math, random.


Attribution: Core algorithm by @karpathy
