A verifiable C compiler bootstrapped from a minimal, human-auditable seed. This project addresses the "trusting trust" problem by keeping the early stages of the bootstrap chain (Stages 0-3) small enough for manual verification.
 Stage 0       Stage 1       Stage 2       Stage 3       Stage 4       Stage 5
┌────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐
│  Hex   │──▶│ Minimal │──▶│Extended │──▶│ Subset  │──▶│   C89   │──▶│   C99   │
│ Loader │   │  Forth  │   │  Forth  │   │    C    │   │Compiler │   │Compiler │
│  (C)   │   │  (asm)  │   │ (Forth) │   │ (Forth) │   │   (C)   │   │   (C)   │
└────────┘   └─────────┘   └─────────┘   └─────────┘   └─────────┘   └─────────┘
Status: Verified Working ✓
The bootstrap pipeline produces ARM64 assembly (.s files), which you then assemble and link into executables:
$ ./bootstrap.sh
SUCCESS: Valid assembly generated.
$ cat output.s
.global _main
.align 4
_main:
mov w0, #0x0000002a ; return 42
...
ret
# Assemble and link the output
$ as -arch arm64 -o output.o output.s
$ ld -arch arm64 -e _main -o output output.o -lSystem -L$(xcrun --show-sdk-path)/usr/lib
$ ./output; echo $?
42
The bootstrap pipeline chains stages together via stdin/stdout:
┌─────────────────────────────────────────────────────────────────────┐
│  cat stage1.hex | stage0 ──▶ (Forth interpreter now running)        │
│                    │                                                │
│                    ├──▶ reads stage2/forth.fth (extends Forth)      │
│                    ├──▶ reads stage3/cc.fth (C compiler in Forth)   │
│                    └──▶ reads hello.c ──▶ outputs ARM64 assembly    │
└─────────────────────────────────────────────────────────────────────┘
- Stage 0 reads hex-encoded Stage 1 binary, loads it into executable memory, jumps to it
- Stage 1 (Forth) reads and interprets Stage 2 Forth extensions
- Stage 2 adds control flow words (IF/THEN/ELSE, loops)
- Stage 3 (C compiler in Forth) reads C source, emits ARM64 assembly
The entire chain runs as a single pipeline with no intermediate files.
- macOS with Apple Silicon (ARM64) - generates Mach-O binaries, not ELF
- Xcode Command Line Tools (xcode-select --install)
The compiler outputs ARM64 assembly using macOS conventions (_main entry point, Darwin syscalls). Linux/ELF support would require modifying Stage 3.
# Build all stages
make
# Run all tests
make test
# Full bootstrap with verification
./bootstrap.sh
# Stage 0: Hex Loader
make stage0
# Stage 1: Minimal Forth
make stage1
# Stage 2: Extended Forth
make stage2
# Stage 3: Subset C Compiler
make stage3
# Stage 4: C89 Compiler
make stage4
# Stage 5: C99 Compiler
make stage5
sectorc/
├── stage0/ # Hex loader (trust anchor)
│ ├── stage0.c # C implementation (108 lines)
│ └── stage0.s # ARM64 assembly reference
├── stage1/ # Minimal Forth interpreter
│ ├── forth.c # C reference implementation
│ └── forth.s # ARM64 assembly (converted to hex for bootstrap)
├── stage2/ # Extended Forth
│ ├── forth.c # Host implementation
│ └── forth.fth # Forth source (76 lines)
├── stage3/ # Subset C compiler
│ └── cc.fth # C compiler in Forth (1,163 lines)
├── stage4/ # C89 compiler
│ └── cc.c # Full C89 implementation
├── stage5/ # C99 compiler
│ └── cc.c # C99 extensions
├── tools/ # Build utilities
│ └── macho_to_hex.sh
├── tests/ # Test suites for each stage
├── bootstrap.sh # Full bootstrap with verification
├── Makefile # Build system
│
│ Generated artifacts:
├── stage1.hex # Stage 1 binary in hex format (7.7 KB)
├── output.s # Compiled ARM64 assembly output
└── manifest.txt # SHA256 hashes for reproducibility
Reads ASCII hexadecimal from stdin, converts to binary, executes it.
Features:
- Whitespace handling (spaces, tabs, newlines)
- Comment support (# and ;)
- Case-insensitive hex digits
- JIT execution on ARM64 macOS
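The decoding step described above can be sketched in a few lines of C. This is an illustration only, not the code in stage0/stage0.c: the names `hex_value` and `decode_hex` are hypothetical, the sketch assumes that `#` and `;` start a comment running to the end of the line, and it leaves out the step of copying the bytes into executable memory and jumping to them.

```c
#include <assert.h>
#include <stddef.h>

/* Map one ASCII hex digit to its value, or -1 if it is not a hex digit. */
static int hex_value(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode hex text from src into out (at most cap bytes).
 * Returns the number of bytes written, or -1 on a stray character,
 * an odd number of digits, or overflow of out.
 * Assumption: '#' and ';' start a comment that runs to end of line.
 */
static long decode_hex(const char *src, unsigned char *out, size_t cap) {
    assert(src != NULL && out != NULL);
    size_t n = 0;
    int high = -1;                       /* pending high nibble, -1 if none */
    for (; *src != '\0'; src++) {
        int c = (unsigned char)*src;
        if (c == '#' || c == ';') {      /* skip the comment up to the newline */
            while (src[1] != '\0' && src[1] != '\n') src++;
            continue;
        }
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') continue;
        int v = hex_value(c);
        if (v < 0) return -1;            /* not whitespace, comment, or hex */
        if (high < 0) {
            high = v;                    /* first digit of a byte */
        } else {
            if (n == cap) return -1;     /* output buffer is full */
            out[n++] = (unsigned char)((high << 4) | v);
            high = -1;
        }
    }
    return high < 0 ? (long)n : -1;      /* reject a dangling half byte */
}
```

Calling `decode_hex("2A 00\n# comment\nff ; tail\n", buf, sizeof buf)` fills `buf` with the three bytes 0x2A, 0x00, 0xFF. Keeping the decoder this small is the point of a trust anchor: every branch can be checked by hand.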
A direct-threaded Forth interpreter with ~65 primitive words.
Features:
- Stack operations (DUP, DROP, SWAP, OVER, ROT)
- Arithmetic (+, -, *, /, MOD)
- Comparison (<, >, =)
- Memory access (@, !, C@, C!)
- I/O (EMIT, KEY, .)
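To make the stack words above concrete, here is a minimal C sketch of a data stack with a handful of primitives and a loop that runs them in sequence. It is not taken from stage1/forth.c or forth.s: the `Stack` and `Prim` types and the `f_*` names are hypothetical, and a loop over C function pointers only approximates direct threading, where each cell holds the address of machine code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long cells[64];   /* data stack storage */
    int depth;        /* number of cells in use */
} Stack;

typedef void (*Prim)(Stack *);   /* one primitive word */

static void push(Stack *s, long v) { assert(s->depth < 64); s->cells[s->depth++] = v; }
static long pop(Stack *s)          { assert(s->depth > 0);  return s->cells[--s->depth]; }

/* Stack effects are given in the usual Forth notation ( before -- after ). */
static void f_dup(Stack *s)  { long a = pop(s); push(s, a); push(s, a); }              /* ( a -- a a ) */
static void f_drop(Stack *s) { (void)pop(s); }                                         /* ( a -- ) */
static void f_swap(Stack *s) { long b = pop(s), a = pop(s); push(s, b); push(s, a); }  /* ( a b -- b a ) */
static void f_over(Stack *s) { long b = pop(s), a = pop(s);
                               push(s, a); push(s, b); push(s, a); }                   /* ( a b -- a b a ) */
static void f_rot(Stack *s)  { long c = pop(s), b = pop(s), a = pop(s);
                               push(s, b); push(s, c); push(s, a); }                   /* ( a b c -- b c a ) */
static void f_add(Stack *s)  { long b = pop(s), a = pop(s); push(s, a + b); }          /* ( a b -- a+b ) */
static void f_mul(Stack *s)  { long b = pop(s), a = pop(s); push(s, a * b); }          /* ( a b -- a*b ) */

/* Run a NULL-terminated sequence of primitives in order. */
static void run(Stack *s, const Prim *thread) {
    for (; *thread != NULL; thread++) (*thread)(s);
}
```

With 6 and 7 on the stack, running the sequence `f_over, f_mul, f_swap, f_drop` corresponds to the Forth phrase `OVER * SWAP DROP` and leaves 42.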
Written in Forth, loaded by Stage 1. Adds higher-level words needed for compiler construction.
Features:
- Control flow: IF/THEN/ELSE, BEGIN/UNTIL/AGAIN
- Stack operations: NIP, TUCK, ?DUP, ROT, 2DROP, 2DUP
- Compilation helpers: [COMPILE]
- I/O utilities: SPACE, CR
- Comments: \ (backslash comments)
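One common way for a Forth system to build IF/ELSE/THEN is to compile a conditional branch with a placeholder target and patch it once the destination is known. The C sketch below illustrates that backpatching idea on a toy instruction array. It is an assumption about the general technique, not a transcription of stage2/forth.fth, and the opcodes and `compile_*` names are hypothetical.

```c
#include <assert.h>

enum { OP_LIT, OP_0BRANCH, OP_BRANCH, OP_HALT };

typedef struct {
    int code[64];   /* compiled cells: opcodes and their operands */
    int here;       /* index of the next free cell */
} Code;

/* Append one cell and return its index. */
static int emit(Code *c, int cell) {
    assert(c->here < 64);
    c->code[c->here] = cell;
    return c->here++;
}

/* IF: emit 0BRANCH with a placeholder target; return the slot to patch. */
static int compile_if(Code *c) { emit(c, OP_0BRANCH); return emit(c, -1); }

/* ELSE: jump over the false arm, then point IF's branch at the false arm. */
static int compile_else(Code *c, int if_slot) {
    emit(c, OP_BRANCH);
    int else_slot = emit(c, -1);
    c->code[if_slot] = c->here;
    return else_slot;
}

/* THEN: resolve the pending forward branch to the current position. */
static void compile_then(Code *c, int slot) { c->code[slot] = c->here; }

/* Run the code with flag on the stack; return the top of stack at HALT. */
static int execute(const Code *c, int flag) {
    int stack[16], sp = 0, ip = 0;
    stack[sp++] = flag;
    for (;;) {
        switch (c->code[ip++]) {
        case OP_LIT:     stack[sp++] = c->code[ip++]; break;
        case OP_0BRANCH: if (stack[--sp] == 0) ip = c->code[ip]; else ip++; break;
        case OP_BRANCH:  ip = c->code[ip]; break;
        case OP_HALT:    return stack[sp - 1];
        default:         return -1;   /* unknown opcode */
        }
    }
}
```

Compiling the equivalent of `IF 10 ELSE 20 THEN` means calling `compile_if`, emitting `OP_LIT 10`, calling `compile_else`, emitting `OP_LIT 20`, then calling `compile_then` and emitting `OP_HALT`. Executing the result with a nonzero flag yields 10, and with a zero flag yields 20.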
Compiles a C subset to ARM64 assembly (.s files). You assemble the output with:
as -arch arm64 -o prog.o prog.s
ld -arch arm64 -e _main -o prog prog.o -lSystem -L$(xcrun --show-sdk-path)/usr/lib
Supported C subset:
- Types: int, pointers, arrays
- Statements: if/else, while, for, return
- Expressions: arithmetic, comparison, assignment
- Function definitions and calls (incl. recursion)
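As a rough picture of what code generation for the smallest possible program looks like, the C sketch below writes assembly for `int main() { return N; }` in the same shape as the sample output.s shown earlier. It is not the Stage 3 compiler (which is written in Forth): `emit_return_const` is a hypothetical helper, the lines elided in the sample are not reproduced, and the sketch only accepts values that fit a 16-bit immediate, since wider constants would need a movz/movk pair.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write ARM64 assembly for `int main() { return value; }` into buf.
 * Returns the number of characters written, or -1 if value does not
 * fit in 16 bits or buf is too small.
 * Assumption: macOS conventions (_main symbol), as in the sample output.
 */
static int emit_return_const(char *buf, size_t cap, unsigned value) {
    assert(buf != NULL);
    if (value > 0xFFFFu) return -1;      /* would need movz/movk */
    int n = snprintf(buf, cap,
        ".global _main\n"
        ".align 4\n"
        "_main:\n"
        "    mov w0, #0x%08x\n"          /* return value goes in w0 */
        "    ret\n",
        value);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Calling `emit_return_const(buf, sizeof buf, 42)` produces text containing `mov w0, #0x0000002a`, which can then be assembled and linked with the `as` and `ld` commands above.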
Bootstrappable Stage 3 (stage3/cc.fth):
- Runs on Stage 1 + Stage 2 and compiles tests/stage3/*.c to ARM64 assembly (used by bootstrap.sh).
- The host stage3/cc binary is currently a convenience wrapper around the Stage 4 implementation.
Full C89 implementation.
Additional features:
- struct, union, enum
- switch/case
- typedef
- Basic preprocessor (#define, #include)
C99 extensions (in progress).
The bootstrap script generates manifest.txt with SHA256 hashes of all artifacts:
# Run full bootstrap with verification
./bootstrap.sh
# Check manifest
cat manifest.txt
# Sectorc Verification Manifest
# Generated: 2025-12-18 10:57:00 UTC
# 5b7264fe... stage0/stage0
# b3dc0dac... stage1/forth
# ...
# Verify reproducibility (re-run and compare hashes)
./bootstrap.sh
shasum -a 256 -c manifest.txt
The generated output.s can be inspected manually to verify that no malicious code was injected.
# Run all tests
make test
# Run individual stage tests
cd tests/stage0 && ./run_tests.sh
cd tests/stage1 && ./run_tests.sh
cd tests/stage2 && ./run_tests.sh
cd tests/stage3 && ./run_tests.sh
cd tests/stage4 && ./run_tests.sh
The entire bootstrap chain source is small enough for manual verification:
| Stage | Source | Size | Lines | Audit Time |
|---|---|---|---|---|
| Stage 0 | stage0/stage0.c | 2.9 KB | 108 | ~2 hours |
| Stage 1 | stage1/forth.s | 20.2 KB | 947 | ~8 hours |
| Stage 2 | stage2/forth.fth | 2.1 KB | 76 | ~1 hour |
| Stage 3 | stage3/cc.fth | 23.6 KB | 1,163 | ~12 hours |
| Total | | 48.8 KB | 2,294 | ~23 hours |
Stages 4-5 (C89/C99 compilers) are larger but can be machine-verified against their predecessors.
This project addresses the "trusting trust" problem identified by Ken Thompson in 1984:
- Minimal trust anchor: Stage 0 is 108 lines of C, auditable in about 2 hours
- Chain of trust: Each stage is compiled/interpreted by the previous trusted stage
- No external dependencies: The chain builds from hex → working C compiler
- Reproducibility: manifest.txt contains SHA256 hashes of all artifacts
- Transparent output: Generated assembly is human-readable
Trust assumptions:
- Your CPU executes documented instructions correctly
- Your disassembler is accurate
- Auditors are competent and not colluding
MIT License
- Ken Thompson, "Reflections on Trusting Trust" (1984)
- Bootstrappable Builds
- stage0