Recursive Language Model patterns for Claude Code — handle massive contexts (10M+ tokens) by treating them as external variables.
Based on: https://arxiv.org/html/2512.24601v1
You don't call RLM tools directly. You ask Claude to analyze large files, and Claude uses RLM behind the scenes.
Example:
- You say: "Analyze this 2MB log file for errors"
- Claude uses RLM tools internally
- You get: "I found 3 error patterns: database timeouts (47), auth failures (23)..."
Instead of feeding massive contexts directly into the LLM:
- Load context as external variable (stays out of prompt)
- Inspect structure programmatically
- Chunk strategically (lines, chars, or paragraphs)
- Sub-query recursively on chunks
- Aggregate results for final synthesis
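A minimal sketch of that pattern, using the tools documented below (the chunk size and result names are illustrative assumptions; the end-to-end walkthrough later in this README shows the full version):

```python
rlm_load_context(name="doc", content=big_text)                 # 1. context stays external
rlm_inspect_context(name="doc")                                # 2. structure, not content
rlm_chunk_context(name="doc", strategy="paragraphs", size=5)   # 3. chunk (size is illustrative)
rlm_sub_query_batch(                                           # 4. recursive sub-queries
    query="Summarize this chunk in one sentence.",
    context_name="doc",
    chunk_indices=[0, 1, 2],
)
rlm_store_result(name="doc_summaries", result=<response>)      # 5a. stash sub-results...
rlm_get_results(name="doc_summaries")                          # 5b. ...then aggregate
```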
```bash
git clone https://github.com/richardwhiteii/rlm.git
cd rlm
uv sync
```

Or with pip:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

**Option 1: Quick Setup (recommended)**

```bash
# From the rlm directory
claude mcp add rlm -s user -- uv run --directory "$(pwd)" python -m src.rlm_mcp_server
```

This adds RLM globally (`-s user`) so it's available in all your Claude Code sessions.
**Option 2: With Ollama (free local inference)**

First set environment variables, then add:

```bash
export RLM_DATA_DIR="$HOME/.rlm-data"
export OLLAMA_URL="http://localhost:11434"
claude mcp add rlm -s user -- uv run --directory "$(pwd)" python -m src.rlm_mcp_server
```

**Option 3: Manual JSON config**

Add to `~/.claude/.mcp.json` for full control:
```json
{
  "mcpServers": {
    "rlm": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/rlm", "python", "-m", "src.rlm_mcp_server"],
      "env": {
        "RLM_DATA_DIR": "/path/to/.rlm-data",
        "OLLAMA_URL": "http://localhost:11434"
      }
    }
  }
}
```

Note: Replace `/path/to/rlm` with your actual installation path (run `pwd` in the rlm directory).
Enable Claude to use RLM tools automatically without manual invocation:
**1. CLAUDE.md Integration**

Copy the `CLAUDE.md.example` content to your project's `CLAUDE.md` (or `~/.claude/CLAUDE.md` for global use) to teach Claude when to reach for RLM tools automatically.
**2. Hook Installation**

Copy the `.claude/hooks/` directory to your project to auto-suggest RLM when reading files larger than 25KB:

```bash
cp -r .claude/hooks/ /Users/your_username/your-project/.claude/hooks/
```

The hook provides guidance but doesn't block reads.
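For reference, registering such a hook in a project's settings might look like the sketch below, assuming the standard Claude Code `PreToolUse` hooks schema; the script name `check_file_size.py` is hypothetical, and the shipped `.claude/hooks/` contents are the authoritative version:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/check_file_size.py"
          }
        ]
      }
    ]
  }
}
```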
**3. Skill Reference**

Copy the `.claude/skills/` directory for comprehensive RLM guidance:

```bash
cp -r .claude/skills/ /Users/your_username/your-project/.claude/skills/
```

With these in place, Claude will autonomously detect when to use RLM instead of reading large files directly into context.
These tools are used by Claude internally when processing large contexts. You don't call them directly—you just ask Claude to analyze large files.
| Tool | Purpose |
|---|---|
| `rlm_auto_analyze` | One-step analysis: auto-detects type, chunks, and queries |
| `rlm_load_context` | Load context as an external variable |
| `rlm_inspect_context` | Get structure info without loading into the prompt |
| `rlm_chunk_context` | Chunk by lines/chars/paragraphs |
| `rlm_get_chunk` | Retrieve a specific chunk |
| `rlm_filter_context` | Filter with regex (keep/remove matching lines) |
| `rlm_exec` | Execute Python code against a loaded context (sandboxed) |
| `rlm_sub_query` | Make a sub-LLM call on a chunk |
| `rlm_sub_query_batch` | Process multiple chunks in parallel |
| `rlm_store_result` | Store a sub-call result for aggregation |
| `rlm_get_results` | Retrieve stored results |
| `rlm_list_contexts` | List all loaded contexts |
For most use cases, Claude uses `rlm_auto_analyze`, which handles everything automatically:

```python
rlm_auto_analyze(
    name="my_file",
    content=file_content,
    goal="find_bugs"  # or: summarize, extract_structure, security_audit, answer:<question>
)
```

What it does automatically:

- Detects content type (Python, JSON, Markdown, logs, prose, code)
- Selects the optimal chunking strategy
- Adapts the query for the content type
- Runs parallel sub-queries
- Returns aggregated results
Supported goals:
| Goal | Description |
|---|---|
| `summarize` | Summarize content purpose and key points |
| `find_bugs` | Identify errors, issues, and potential problems |
| `extract_structure` | List functions, classes, schema, headings |
| `security_audit` | Find vulnerabilities and security issues |
| `answer:<question>` | Answer a custom question about the content |
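For instance, the custom-question goal uses the same call shape as the `find_bugs` example above (`meeting_notes` and `notes_text` are placeholder names):

```python
rlm_auto_analyze(
    name="meeting_notes",
    content=notes_text,
    goal="answer:What action items were assigned, and to whom?"
)
```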
For deterministic pattern matching and data extraction, Claude can use rlm_exec to run Python code directly against a loaded context. This is closer to the paper's REPL approach and provides full control over analysis logic.
Tool: `rlm_exec`

Purpose: Execute arbitrary Python code against a loaded context in a sandboxed subprocess.

Parameters:

- `code` (required): Python code to execute. Set the `result` variable to capture output.
- `context_name` (required): Name of a previously loaded context.
- `timeout` (optional, default 30): Maximum execution time in seconds.
Features:

- Context available as a read-only `context` variable
- Pre-imported modules: `re`, `json`, `collections`
- Subprocess isolation (won't crash the server)
- Timeout enforcement
- Works on any system with Python (no Docker needed)
Example: finding patterns in a loaded context:

```python
# After loading a context
rlm_exec(
    code="""
import re
amounts = re.findall(r'\$[\d,]+', context)
result = {'count': len(amounts), 'sample': amounts[:5]}
""",
    context_name="bill"
)
```

Example response:
```json
{
  "result": {
    "count": 1247,
    "sample": ["$500", "$1,000", "$250,000", "$100,000", "$50"]
  },
  "stdout": "",
  "stderr": "",
  "return_code": 0,
  "timed_out": false
}
```

Example: extracting structured data:
```python
rlm_exec(
    code="""
import re
from collections import Counter

# Find all email addresses
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', context)

# Count by domain
domains = [e.split('@')[1] for e in emails]
domain_counts = Counter(domains)

result = {
    'total_emails': len(emails),
    'unique_domains': len(domain_counts),
    'top_domains': domain_counts.most_common(5)
}
""",
    context_name="dataset",
    timeout=60
)
```
When to use `rlm_exec` vs `rlm_sub_query`:

| Use Case | Tool | Why |
|---|---|---|
| Extract all dates, IDs, amounts | `rlm_exec` | Regex is deterministic and fast |
| Find security vulnerabilities | `rlm_sub_query` | Requires reasoning and context |
| Parse JSON/XML structure | `rlm_exec` | Standard libraries work perfectly |
| Summarize themes or tone | `rlm_sub_query` | Natural language understanding needed |
| Count word frequencies | `rlm_exec` | Simple computation, no AI needed |
| Answer "Why did X happen?" | `rlm_sub_query` | Requires inference and reasoning |
Tip: For large contexts, combine both — use rlm_exec to filter/extract, then rlm_sub_query for semantic analysis of filtered results.
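A sketch of that combination on a log file, reusing the filter and chunking tools shown elsewhere in this README (the regex pattern, chunk size, and context names are illustrative):

```python
# 1. Deterministic pass: keep only lines that mention errors
rlm_filter_context(
    name="app_log",
    output_name="app_log_errors",
    pattern="(?i)(error|exception|traceback)",
    mode="keep"
)

# 2. Chunk the much smaller filtered context
rlm_chunk_context(name="app_log_errors", strategy="lines", size=200)

# 3. Semantic pass: reason about the filtered results
rlm_sub_query(
    query="Group these errors into root-cause categories.",
    context_name="app_log_errors",
    chunk_index=0
)
```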
By default, sub-queries use Claude Haiku 4.5 via the Claude Agent SDK. This works out-of-the-box if you have a Claude API key configured.
| Provider | Default Model | Cost | Use Case |
|---|---|---|---|
| `claude-sdk` | claude-haiku-4-5 | ~$0.80/1M input tokens | Default, works everywhere |
| `ollama` | olmo-3.1:32b | $0 | Local inference, requires Ollama |
The `rlm_sub_query` and `rlm_sub_query_batch` tools support hierarchical decomposition via the `max_depth` parameter:

- `max_depth=0` (default): Flat call, no recursion
- `max_depth=1-5`: Sub-LLM can use RLM tools (chunk, filter, sub_query, etc.)
Example: Analyzing a massive codebase with 2-level recursion:
```python
rlm_sub_query(
    query="Find all security vulnerabilities",
    context_name="codebase",
    chunk_index=0,
    max_depth=2  # Allow sub-queries to further decompose
)
```

How it works:
- When `max_depth > 0`, the sub-LLM receives RLM tools in its function-calling context
- If the sub-LLM decides to use a tool (e.g., `rlm_chunk_context`), the agent loop handles it
- Each recursive call decrements the depth limit until `max_depth` is reached
- The response includes recursion metadata: `depth_reached` and `call_trace`
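A purely illustrative response shape: only `depth_reached` and `call_trace` are documented above, and the other fields are assumptions modeled on the `rlm_exec` response example:

```json
{
  "result": "Found 2 potential injection points in the parser module...",
  "depth_reached": 2,
  "call_trace": [
    "rlm_sub_query(context_name='codebase', chunk_index=0, max_depth=2)",
    "  rlm_chunk_context(name='codebase')",
    "  rlm_sub_query(context_name='codebase', chunk_index=3, max_depth=1)"
  ]
}
```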
Recommended for recursive calls: Use a local model like gemma3:27b via Ollama to avoid cost escalation from deep recursion.
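For example, the recursive call above could be pointed at a local model; this assumes the `provider` parameter is honored on recursive calls just as on flat ones:

```python
rlm_sub_query(
    query="Find all security vulnerabilities",
    context_name="codebase",
    chunk_index=0,
    max_depth=2,
    provider="ollama"  # local model, so recursion depth doesn't multiply API cost
)
```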
With Ollama installed locally, you can run sub-queries at zero cost:
1. Install Ollama and pull a model:

   ```bash
   ollama pull gemma3:27b
   ```

2. Add the Ollama URL to your MCP config:

   ```json
   {
     "mcpServers": {
       "rlm": {
         "command": "uv",
         "args": ["run", "--directory", "/Users/your_username/projects/rlm", "python", "-m", "src.rlm_mcp_server"],
         "env": {
           "RLM_DATA_DIR": "/Users/your_username/.rlm-data",
           "OLLAMA_URL": "http://localhost:11434"
         }
       }
     }
   }
   ```

3. Specify the provider in sub-queries:

   ```python
   rlm_sub_query(
       query="Summarize this section",
       context_name="my_doc",
       chunk_index=0,
       provider="ollama"  # Use local Ollama instead of the default claude-sdk
   )
   ```
Or for batch processing:
```python
rlm_sub_query_batch(
    query="Extract key points",
    context_name="my_doc",
    chunk_indices=[0, 1, 2, 3],
    provider="ollama",  # Use local Ollama instead of default claude-sdk
    concurrency=4
)
```
```python
# 1. Load a large document
rlm_load_context(name="report", content=<large document>)

# 2. Inspect structure
rlm_inspect_context(name="report", preview_chars=500)

# 3. Chunk into manageable pieces
rlm_chunk_context(name="report", strategy="paragraphs", size=1)

# 4. Sub-query chunks in parallel
rlm_sub_query_batch(
    query="What is the main topic? Reply in one sentence.",
    context_name="report",
    chunk_indices=[0, 1, 2, 3],
    concurrency=4  # uses claude-sdk by default
)

# 5. Store results for aggregation
rlm_store_result(name="topics", result=<response>)

# 6. Retrieve all results
rlm_get_results(name="topics")
```
The flagship example of RLM capabilities — processing the Encyclopedia Britannica, 11th Edition from Project Gutenberg:
```python
# Load the full encyclopedia (11MB, ~2M tokens)
content = open("docs/encyclopedia/merged_encyclopedia.txt").read()
rlm_load_context(name="encyclopedia", content=content)

# Inspect
rlm_inspect_context(name="encyclopedia")
# → 11MB, 184K lines, ~2M tokens

# Chunk for processing
rlm_chunk_context(name="encyclopedia", strategy="paragraphs", size=30)

# Query across the corpus
rlm_sub_query_batch(
    query="Summarize the main topics in this section",
    context_name="encyclopedia",
    chunk_indices=[0, 50, 100, 150],
    provider="claude-sdk"  # or "ollama" for free local inference
)
```

Extract Topic Catalog:
```python
# Filter for a specific subject
rlm_filter_context(
    name="encyclopedia",
    output_name="botany",
    pattern="(?i)(botan|plant|flora|flower|genus)",
    mode="keep"
)

# Analyze the filtered content
rlm_auto_analyze(
    name="botany_analysis",
    content=filtered_content,
    goal="answer:List all botanical articles with brief descriptions"
)
```

| Metric | Value |
|---|---|
| File size | 11 MB |
| Lines | 184,000 |
| Tokens | ~2M |
| Processing cost | $0 (Ollama) or ~$1.60 (Haiku) |
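The Haiku figure follows from the input pricing in the provider table; a back-of-the-envelope check (output-token costs are ignored here, which is a simplifying assumption):

```python
tokens = 2_000_000         # ~2M tokens for the full encyclopedia
price_per_1m_input = 0.80  # claude-haiku-4-5, from the provider table
print(tokens / 1_000_000 * price_per_1m_input)  # 1.6 → ~$1.60
```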
```text
$RLM_DATA_DIR/
├── contexts/   # Raw contexts (.txt + .meta.json)
├── chunks/     # Chunked versions (by context name)
└── results/    # Stored sub-call results (.jsonl)
```
Contexts persist across sessions. Chunked contexts are cached for reuse.
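Because contexts persist, a later session can pick up where an earlier one left off. A minimal sketch (the `rlm_get_chunk` parameter names follow the pattern of the other tools and are an assumption):

```python
rlm_list_contexts()                           # see what's already on disk
rlm_get_chunk(name="report", chunk_index=0)   # reuse cached chunks without reloading
```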
```text
Claude Code
     │
     ▼
RLM MCP Server
     │
     ├─► claude-sdk (Haiku 4.5) ─► Anthropic API
     │
     └─► ollama ─► Local LLM (gemma3:27b, llama3, etc.)
```
The key insight: context stays external, not in your prompt. Claude orchestrates; sub-models analyze.
Use these prompts with Claude Code to explore the codebase and learn RLM patterns. The code is the single source of truth.
- Read `src/rlm_mcp_server.py` and list all RLM tools with their parameters and purpose.
- Explain the chunking strategies available in `rlm_chunk_context`. When would I use each one?
- What's the difference between `rlm_sub_query` and `rlm_sub_query_batch`? Show me the implementation.
- Read `src/rlm_mcp_server.py` and explain how contexts are stored and persisted. Where does the data live?
- How does the claude-sdk provider extract text from responses? Walk me through `_call_claude_sdk`.
- What happens when I call `rlm_load_context`? Trace the full flow.
- Load the README as a context, chunk it by paragraphs, and run a sub-query on the first chunk to summarize it.
- Show me how to process a large file in parallel using `rlm_sub_query_batch`. Use a real example.
- I have a 1MB log file. Walk me through the RLM pattern to extract all errors.
- Read the test file and explain what scenarios are covered. What edge cases should I be aware of?
- How would I add a new chunking strategy (e.g., by regex delimiter)? Show me where to modify the code.
- How would I add a new provider (e.g., OpenAI)? What functions need to change?
The repository includes excerpts from the Encyclopedia Britannica, 11th Edition (1910-1911) from Project Gutenberg for testing RLM capabilities on large reference documents.
| File | Size | Description |
|---|---|---|
| `docs/encyclopedia/merged_encyclopedia.txt` | 11MB | All slices merged (~2M tokens) |
The flagship example above shows the full workflow on this file: load, inspect, chunk, batch-query, and filter out a botany topic catalog.

Additional encyclopedia volumes are available from Project Gutenberg:
- Search: https://www.gutenberg.org/ebooks/search/?query=encyclopaedia+britannica+11th
- ~130 slices available covering A-Z
Encyclopedia Britannica, 11th Edition (1910-1911) sourced from Project Gutenberg. Public domain.
MIT