Benchmarks measuring TOON's token efficiency and retrieval accuracy compared to JSON, XML, YAML, and CSV.
> [!NOTE]
> Results are automatically embedded in the main README. This guide focuses on running the benchmarks locally.
```sh
# Run token efficiency benchmark
pnpm benchmark:token-efficiency

# Run retrieval accuracy benchmark (requires API keys)
pnpm benchmark:accuracy
```

The token efficiency benchmark measures token count reduction across JSON, XML, YAML, CSV, and TOON:
- Generate datasets (GitHub repos, analytics, orders)
- Convert to all formats (TOON, JSON, XML, YAML, CSV)
- Tokenize using `gpt-tokenizer` (`o200k_base` encoding)
- Calculate savings and generate report

```sh
pnpm benchmark:token-efficiency
```

Results are saved to `results/token-efficiency.md`.
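The reported savings are relative to the JSON token count for the same dataset. A minimal sketch of that calculation (function and field names here are illustrative, not the benchmark's actual code):

```typescript
// Illustrative only: compute percentage token savings vs. the JSON baseline.
type TokenCounts = Record<string, number>

export function tokenSavings(counts: TokenCounts, baseline = 'json'): Record<string, number> {
  const base = counts[baseline]
  if (!base) throw new Error(`missing baseline format: ${baseline}`)
  const savings: Record<string, number> = {}
  for (const [format, tokens] of Object.entries(counts)) {
    // Positive values mean the format uses fewer tokens than the baseline.
    savings[format] = Math.round(((base - tokens) / base) * 1000) / 10
  }
  return savings
}

// Example: a format at 600 tokens vs. JSON at 1000 is a 40% saving.
console.log(tokenSavings({ json: 1000, toon: 600, csv: 550 }))
```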
The retrieval accuracy benchmark tests how well LLMs can answer questions about data in different formats (TOON, JSON, JSON compact, XML, YAML, CSV):
- Generate ~150-160 questions across 4 datasets
- Convert each dataset to all 6 formats
- Query each LLM with formatted data + question
- Validate answers using `gpt-5-nano` as judge
- Aggregate metrics and generate report
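The aggregation step boils down to the share of judge-approved answers per format. A hedged sketch of that idea (the `Verdict` shape is assumed for illustration, not taken from the source):

```typescript
// Illustrative only: aggregate judge verdicts into per-format accuracy.
interface Verdict {
  format: string   // e.g. 'toon', 'json', 'csv'
  correct: boolean // the judge's decision for one question
}

export function accuracyByFormat(verdicts: Verdict[]): Record<string, number> {
  const totals: Record<string, { correct: number; total: number }> = {}
  for (const v of verdicts) {
    const t = (totals[v.format] ??= { correct: 0, total: 0 })
    t.total++
    if (v.correct) t.correct++
  }
  return Object.fromEntries(
    Object.entries(totals).map(([format, t]) => [format, t.correct / t.total])
  )
}
```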
- Edit `src/evaluate.ts` and add models to the exported `models` array:

  ```ts
  export const models: LanguageModelV2[] = [
    openai('gpt-5-nano'),
    anthropic('claude-haiku-4-5-20251001'),
    google('gemini-2.5-flash'),
    xai('grok-4-fast-non-reasoning'),
    // Add your models here
  ]
  ```
- Duplicate `.env.example` to `.env` and add your API keys:

  ```sh
  cp .env.example .env
  ```
```sh
# Full benchmark
pnpm benchmark:accuracy

# Dry run (10 questions only, for testing setup)
DRY_RUN=true pnpm benchmark:accuracy
```

Running the script will:
- Prompt you to select which models to test.
- Skip models with existing results (rerun to overwrite).
- Show progress with rate limiting.
- Save results to `results/accuracy/models/{model-id}.json`.
- Generate report at `results/retrieval-accuracy.md`.
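Skipping models with existing results presumably reduces to checking for the per-model JSON file before querying. A minimal sketch under that assumption (the path helper and the slash-flattening rule are hypothetical, not the benchmark's actual code):

```typescript
// Illustrative only: skip a model when its results file already exists.
import { existsSync } from 'node:fs'
import { join } from 'node:path'

export function resultsPath(modelId: string, root = 'results/accuracy/models'): string {
  // Hypothetical: flatten ids like 'openai/gpt-5-nano' into a safe filename.
  return join(root, `${modelId.replace(/\//g, '-')}.json`)
}

export function hasExistingResults(modelId: string, root?: string): boolean {
  return existsSync(resultsPath(modelId, root))
}
```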
Edit `src/constants.ts` to adjust:

- `MODEL_RPM_LIMITS` – Rate limits per model
- `DEFAULT_CONCURRENCY` – Parallel tasks (default: 10)
- `DRY_RUN_LIMITS` – Questions per dry run (default: 10)
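A per-model RPM limit translates into a minimum spacing between consecutive requests to that model. A sketch of the arithmetic (the function name is illustrative):

```typescript
// Illustrative only: convert a requests-per-minute limit into the minimum
// delay (in ms) to wait between consecutive requests to one model.
export function minDelayMs(rpm: number): number {
  if (rpm <= 0) throw new Error('rpm must be positive')
  return Math.ceil(60_000 / rpm)
}

// e.g. a 120 RPM limit means at most one request every 500 ms.
```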
```
scripts/
├── accuracy-benchmark.ts         # Retrieval accuracy benchmark
├── token-efficiency-benchmark.ts # Token counting benchmark
└── fetch-github-repos.ts         # Update GitHub dataset
src/
├── constants.ts   # Configuration
├── datasets.ts    # Test data generators
├── evaluate.ts    # LLM evaluation
├── formatters.ts  # Format converters
├── questions.ts   # Question generation
├── report.ts      # Markdown reports
├── storage.ts     # Result caching
└── utils.ts       # Helpers
data/
└── github-repos.json # Top 100 GitHub repos
results/
├── token-efficiency.md   # Token savings report
├── retrieval-accuracy.md # Accuracy report
└── accuracy/models/      # Per-model results (JSON)
```