NLBT — Natural Language Backtesting

Turn plain English into professional backtesting reports in minutes.
Describe your trading strategy in natural language. Get Python code, backtest results, and professional reports. No coding required.

🆕 v0.3.0: Now powered by 8 LLM-driven intelligence features with multilingual support and dramatically improved performance!

🚀 Quick Start

# 1. Install
git clone https://github.com/yourusername/nlbt && cd nlbt
pip install -e .

# 2. Configure LLM
llm keys set openrouter
llm models default openrouter/anthropic/claude-3.5-sonnet

# 3. Run
nlbt

Try it: Type "Buy and hold AAPL in 2024 with $10,000" and press enter.

✨ Key Benefits

Feature	Benefit
💬 Natural Language	Describe strategies in plain English - no coding needed
🧠 LLM-Powered Intelligence	8 AI features: smart extraction, validation, multilingual reports
🌍 Multilingual Support	Generate reports in any language (Spanish, Hindi, etc.)
⚡ High Performance	Improved strategy execution (24x higher return on one NVDA RSI example)
🔄 Self-Correcting	Auto-retries with intelligent error diagnosis
📊 Professional Reports	Markdown + PDF with metrics, charts, and full code
🔧 Clean Architecture	LLM-first design with 20% less code, more intelligence

🚀 What's New in v0.3.0

Major architecture overhaul: hardcoded rules were replaced with LLM prompts ("extreme promptification"), which now drive 8 LLM-powered intelligence features:

🧠 LLM-Powered Features

Smart Title Generation: Dynamic, context-aware report titles
Intelligent Requirement Extraction: Structured parsing from natural language
Flexible User Intent Detection: Understands "yes", "go", "proceed" variations
Adaptive Result Validation: Evaluates backtest quality intelligently
Multilingual Section Naming: Localized headings for any language
Smart Column Detection: Automatically finds best DataFrame columns
Dynamic Clarification Limits: Stops asking when enough info gathered
Targeted Error Diagnosis: Analyzes errors and suggests specific fixes

📈 Performance Impact

Real-world example: Same NVDA RSI strategy

Before v0.3.0: 10% return (1 trade)
After v0.3.0: 240% return (multiple trades)
24x higher return on this one example

🏗️ Architecture Improvements

20% less code: Removed 311 lines of redundant logic
LLM-first design: Intelligent reasoning replaces hardcoded rules
Clean fallbacks: Simple backups instead of complex regex patterns
Zero breaking changes: Seamless upgrade path
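
The "LLM-first design" with "clean fallbacks" described above can be pictured roughly as follows. This is a minimal sketch, not NLBT's actual code: `detect_confirmation`, the `ask_llm` callable, the prompt wording, and the fallback word list are all hypothetical names chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical fallback vocabulary; NLBT's real list is not documented here.
AFFIRMATIVE = {"yes", "y", "go", "proceed", "ok"}


def detect_confirmation(message: str,
                        ask_llm: Optional[Callable[[str], str]] = None) -> bool:
    """Return True if the user wants to proceed.

    Tries the LLM first; if no LLM is supplied or the call fails,
    falls back to a plain word match instead of regex patterns.
    """
    if ask_llm is not None:
        try:
            answer = ask_llm(
                "Does this message confirm the plan? Answer YES or NO.\n"
                f"Message: {message}"
            )
            return answer.strip().upper().startswith("YES")
        except Exception:
            pass  # Any LLM failure drops through to the simple fallback.
    return message.strip().lower().rstrip("!.") in AFFIRMATIVE
```

The point of the design is that the fallback stays small enough to read at a glance, while the LLM handles the open-ended phrasing.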

📥 What You Get

Input → Output

You type:

"NVDA RSI strategy: buy when RSI drops below 30 with larger positions when RSI is lower, sell when RSI goes above 70, use 2023 data with $50000 capital"

You get (in reports/NVDA_2023_<timestamp>/):

📁 See actual example: reports/EXAMPLE_NVDA_2023/
📄 View report: report.md | report.pdf
💻 View code: strategy.py

📊 Professional Report (`report.md` / `report.pdf`)

# NVDA 2023 Trading Strategy

Initial Capital: $50,000 → Final Equity: $55,039.29 → Gain: +$5,039.29 (+10.08%)

## Summary
- Test Period: 2023-01-03 to 2023-12-29 (360 days)
- Strategy: RSI Mean Reversion with Dynamic Position Sizing
- Total Return: 10.08% vs Buy & Hold 158.14%
- Risk Metrics: Sharpe 1.54, Max Drawdown -2.80%

## Strategy Implementation
- Entry: Buy when RSI < 30 with position scaling
- Position Size: Larger positions when RSI is lower (1x to 2x)
- Exit: Sell when RSI > 70
- Risk Management: 95% max equity exposure

## Performance Metrics
- Alpha: 7.80% (significant outperformance vs risk)
- Beta: 0.01439 (low market correlation)
- Calmar Ratio: 3.63 (excellent risk-adjusted returns)
- Win Rate: 100% (1 successful trade)

[Full analysis with code implementation]
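
The headline figures in the report excerpt above follow from simple arithmetic. The sketch below reproduces the gain and total return from the two equity values shown; the helper names are illustrative, and the Calmar definition (annualized return divided by the magnitude of the maximum drawdown) is the common one, assumed rather than confirmed for this report.

```python
def total_return_pct(initial: float, final: float) -> float:
    """Percentage gain or loss relative to the starting capital."""
    return (final - initial) / initial * 100.0


def calmar_ratio(annual_return_pct: float, max_drawdown_pct: float) -> float:
    """Annualized return divided by the magnitude of the worst drawdown."""
    return annual_return_pct / abs(max_drawdown_pct)


gain = 55_039.29 - 50_000.00
print(f"Gain: ${gain:,.2f}")                                      # Gain: $5,039.29
print(f"Return: {total_return_pct(50_000.00, 55_039.29):.2f}%")   # Return: 10.08%
```

Feeding the 10.08% total return and the -2.80% drawdown into `calmar_ratio` gives about 3.60; the report's 3.63 presumably uses an annualized return slightly above the total return for the 360-day period.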

💻 Executable Code (`strategy.py`)

# Generated by NLBT - NVDA RSI Strategy with Dynamic Position Sizing

from backtesting import Backtest, Strategy
import numpy as np
import pandas as pd

def RSI(array, n=14):
    """RSI from simple rolling means of gains and losses over n bars."""
    delta = pd.Series(array).diff()
    gain = (delta.where(delta > 0, 0)).rolling(n).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(n).mean()
    rs = gain / loss
    return (100 - (100 / (1 + rs))).to_numpy()

class MyStrategy(Strategy):
    def init(self):
        self.rsi = self.I(RSI, self.data.Close, 14)
    
    def next(self):
        if not self.position:
            if self.rsi[-1] < 30:
                # Dynamic position sizing based on RSI
                rsi_scale = (30 - self.rsi[-1]) / 30  # 0 to 1 scale
                position_size = 1 + rsi_scale  # 1x to 2x sizing
                
                units = int((self.equity * 0.95 * position_size) / self.data.Close[-1])
                if units > 0:
                    self.buy(size=units)
        
        elif self.position and self.rsi[-1] > 70:
            self.position.close()

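# Execute backtest
# NOTE: get_ohlcv_data is not defined or imported in this excerpt; it is
# presumably a data helper that NLBT makes available at run time.
data = get_ohlcv_data('NVDA', '2023-01-01', '2023-12-31')
bt = Backtest(data, MyStrategy, cash=50000)
stats = bt.run()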
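
To make the sizing rule in the generated strategy above concrete, here is the same arithmetic pulled out into a standalone function. `position_units` and its parameter names are illustrative and not part of NLBT's output; the price and RSI values are made up for the example.

```python
def position_units(rsi: float, equity: float, price: float,
                   threshold: float = 30.0, exposure: float = 0.95) -> int:
    """Mirror the sizing arithmetic used in the generated strategy."""
    if rsi >= threshold:
        return 0  # no entry signal
    rsi_scale = (threshold - rsi) / threshold   # 0 at RSI 30, 1 at RSI 0
    multiplier = 1 + rsi_scale                  # 1x to 2x
    return int(equity * exposure * multiplier / price)


print(position_units(rsi=15.0, equity=50_000, price=140.0))  # 508
```

Note that 508 units at $140 is $71,120, which is more than the $50,000 of equity: the 95% cap applies to the base size before the 1x to 2x multiplier. Whether such an order can be filled depends on the margin setting passed to `Backtest`.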

🔍 Debug & Agent Logs

debug.log - Execution trace for troubleshooting
agent.log - Full LLM context for iteration (~6-8K words)

⚠️ Important Notes

Safety: This tool runs AI-generated Python code locally. Use in trusted environments only.
Status: Functional for single-ticker strategies. APIs may change without notice.
Limitations: Multi-asset portfolios not yet supported. Works best with clear strategy descriptions.
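
For readers wondering what "runs AI-generated Python code locally" can look like in practice, the sketch below runs a script in a child Python process with a timeout. It is an illustration only, not the contents of NLBT's sandbox.py; `run_untrusted` and `timeout_s` are hypothetical names, and a child process like this limits runaway scripts but is not a security boundary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(code: str, timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    """Run code in a child Python process with a wall-clock timeout.

    The child has the same file and network access as the parent,
    so this guards against hangs, not against malicious code.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "strategy.py"
        script.write_text(code, encoding="utf-8")
        return subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired on overrun
        )


result = run_untrusted("print(2 + 2)")
print(result.returncode, result.stdout.strip())  # 0 4
```

For anything beyond experimentation, a container or a dedicated virtual machine is the safer place to run generated code.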

Requirements

Python 3.8+
OpenRouter account (recommended) or OpenAI/Anthropic
5 minutes for setup

Install & Setup

1. Clone and install everything

git clone https://github.com/yourusername/nlbt
cd nlbt
pip install -e .

This installs all dependencies, including the llm CLI, backtesting, ta, and more.

2. Set up OpenRouter (recommended)

Why OpenRouter? Cost control, multiple models, spending limits

Create account: Go to https://openrouter.ai/
Get API key: Click "Keys" → "Create Key"
Add credits: Add $5-10 (you'll use <$1 for examples)
Set spending limit: Optional but recommended
Configure locally:

llm keys set openrouter
# Paste your API key when prompted

llm models default openrouter/anthropic/claude-3.5-sonnet

3. Quick test

nlbt

Try: "Buy and hold AAPL in 2024 with $1000"

What you should see:

Agent asks clarifying questions (if needed)
Shows "Phase 1 - Understanding" → "Phase 2 - Implementation" → "Phase 3 - Reporting"
Saves report to reports/<TICKER>_<PERIOD>_<TIMESTAMP>/report.md (+ PDF)
Takes 2-3 minutes total
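
The report path pattern above can be reproduced with a few lines of Python. The timestamp format (`%Y%m%d_%H%M%S`) is inferred from the example paths elsewhere in this README, and `report_dir` is a hypothetical helper, not an NLBT function.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def report_dir(ticker: str, period: str, now: Optional[datetime] = None,
               root: str = "reports") -> Path:
    """Build reports/<TICKER>_<PERIOD>_<TIMESTAMP> (timestamp format assumed)."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(root) / f"{ticker.upper()}_{period}_{stamp}"


print(report_dir("aapl", "2024", datetime(2024, 10, 2, 12, 34, 56)).as_posix())
# reports/AAPL_2024_20241002_123456
```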

💬 Usage

nlbt                    # Start interactive session

In-chat commands:

info - Show current phase and requirements
debug - Show internal state
lucky - Quick demo with AAPL
exit - Quit

Language preference

Set report language: Include lang <language> or language: <language> anywhere in your message to generate the entire report (including the TL;DR) in that language. Defaults to English if omitted.

Example:

💭 You: Buy and hold AAPL in 2024 with $10,000; lang Spanish
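
As a rough picture of how the two forms above could be recognized, here is a small parser. NLBT may well rely on the LLM for this step; `report_language` and the pattern below are assumptions based only on the `lang <language>` and `language: <language>` forms shown in this section.

```python
import re

# Matches "lang Spanish" or "language: Spanish"; the exact grammar NLBT
# accepts is an assumption based on the two documented forms.
_LANG_RE = re.compile(r"\b(?:language:\s*|lang\s+)([A-Za-z]+)", re.IGNORECASE)


def report_language(message: str, default: str = "English") -> str:
    """Return the requested report language, or the default if none is given."""
    match = _LANG_RE.search(message)
    return match.group(1).capitalize() if match else default


print(report_language("Buy and hold AAPL in 2024 with $10,000; lang Spanish"))
# Spanish
```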

🔄 How It Works

NLBT uses a 3-phase agentic workflow with automatic error recovery:

Simple Overview

🔍 Understanding - Chat with AI to gather requirements (ticker, period, capital, strategy)
⚙️ Implementation - AI generates Python code, tests it, and auto-retries if needed
📊 Reporting - AI creates professional analysis with metrics and insights

Visual Workflow


Color Key:

Purple = User actions | Yellow = LLM actions | Green = System/sandbox
Orange = Decisions | Teal = Phase states | Gray = Outputs

graph TD
    Start([User describes strategy]) --> P1[Phase 1: Understanding]
    P1 --> Extract[Extract requirements from conversation]
    Extract --> Check{Complete &<br/>implementable?}
    
    Check -->|Missing/unclear| Ask[Ask clarifying questions]
    Ask --> P1
    
    Check -->|Complete & valid| Ready[Ready to Implement]
    Ready --> Present[Present plan to user]
    Present --> Response{User response}
    
    Response -->|Anything else| BackToP1[Return to understanding]
    BackToP1 --> P1
    Response -->|Yes/Go| P2[Phase 2: Implementation]
    
    P2 --> Plan[Plan: LLM creates implementation plan]
    Plan --> Code[Producer: Generate Python code]
    Code --> Test[Test: Validate syntax & imports]
    Test --> Execute[Execute: Run in sandbox]
    Execute --> Critic[Critic: Evaluate results]
    Critic --> Decision{Critic decision}
    
    Decision -->|PASS| P3[Phase 3: Reporting]
    Decision -->|RETRY| Count{Attempt < 3?}
    Count -->|Yes| Plan
    Count -->|No| FailBack[Show error & return to understanding]
    FailBack --> P1
    
    P3 --> ReportPlan[Plan: Structure report]
    ReportPlan --> Write[Write: Generate markdown]
    Write --> Refine[Refine: Polish & save]
    Refine --> Done([Report saved])

    %% Role-based styling
    classDef user fill:#d1c4e9,stroke:#7e57c2,color:#4a148c;
    classDef llm fill:#fff9c4,stroke:#fbc02d,color:#6d4c41;
    classDef system fill:#e8f5e9,stroke:#43a047,color:#1b5e20;
    classDef decision fill:#ffccbc,stroke:#e64a19,color:#bf360c;
    classDef userInput fill:#e1bee7,stroke:#8e24aa,color:#4a148c;
    classDef state fill:#b2dfdb,stroke:#00897b,color:#004d40;
    classDef output fill:#eceff1,stroke:#90a4ae,color:#37474f;

    %% Assign roles
    class Start user;
    class P1,P2,P3,Ready state;
    class Extract,Ask,Plan,Code,Critic,ReportPlan,Write,Refine llm;
    class Test,Execute,Present system;
    class Check,Decision,Count decision;
    class Response userInput;
    class Done output;
    class BackToP1,FailBack system;

Key Features

Smart Confirmation: Say "yes" (or a variation such as "go" or "proceed") to continue; anything else returns to the conversation
Auto-Retry: Up to 3 attempts with error feedback
Error Recovery: After failures, returns to chat with error context
Producer-Critic Pattern: Separate AI for generation and evaluation (reduces bias)
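
The auto-retry and producer-critic behaviour listed above boils down to a bounded loop. The sketch below shows one way to structure it, with the producer, executor, and critic passed in as plain callables; `implement` and its signature are hypothetical and do not describe the actual code in reflection.py.

```python
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # matches the three-attempt limit described above


def implement(produce: Callable[[str], str],
              execute: Callable[[str], str],
              critique: Callable[[str], Tuple[bool, str]]) -> Optional[str]:
    """Run producer -> execute -> critic until PASS or attempts run out.

    Returns the passing result, or None to signal a return to Phase 1.
    """
    feedback = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        code = produce(feedback)          # producer sees the last critique
        try:
            result = execute(code)
        except Exception as exc:          # execution errors become feedback
            feedback = f"attempt {attempt} failed: {exc}"
            continue
        passed, feedback = critique(result)
        if passed:
            return result
    return None
```

Keeping the critic separate from the producer means the component judging the results never sees its own reasoning, which is the bias-reduction argument made above.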

Examples (realistic conversations)

Simple Buy & Hold

💭 You: Buy and hold AAPL in 2024 with $10,000

🤖 Perfect! I have everything needed:
   • Ticker: AAPL
   • Period: 2024
   • Capital: $10,000
   • Strategy: Buy and hold
   
   Ready to implement? (yes/no/change/explain)

💭 You: yes

🤖 ⚙️ Phase 2: Implementation
   [Attempt 1/3] Planning implementation...
   [Attempt 1/3] Generating Python code...
   [Attempt 1/3] Testing code...
   [Attempt 1/3] Executing backtest...
   [Attempt 1/3] Results: Return 38.88%, Sharpe 1.25, Max DD -15.26%
   ✅ Critic: PASS - Proceeding to reporting

🤖 📊 Phase 3: Reporting
   Planning report structure...
   Writing comprehensive analysis...
   ✅ Report saved: reports/AAPL_2024_20241002_123456/report.md

RSI Strategy

💭 You: NVDA with RSI: buy when RSI < 30, sell when RSI > 70, 2023, $20K

🤖 STATUS: READY
   • Ticker: NVDA
   • Period: 2023  
   • Capital: $20,000
   • Strategy: RSI mean reversion (buy <30, sell >70)
   
   Proceed? (yes/no/change/explain)

💭 You: yes

🤖 [Generates → Tests → ✓ PASS]
   ✅ Report: reports/NVDA_2023_20241002_143012/report.md

Progressive Disclosure (provide info gradually)

💭 You: Test a moving average crossover

🤖 I'll help you backtest a moving average strategy! I need:
   • Which ticker? (e.g., AAPL, SPY, TSLA)
   • Time period? (e.g., 2024, 2020-2023)
   • Initial capital? (e.g., $10,000)
   • Which MA periods? (e.g., 50/200 day)

💭 You: SPY, 2024, $25K

🤖 Great! Which moving average periods? (e.g., 20/50, 50/200)

💭 You: use 50/200 day crossover

🤖 Perfect! All set. Ready to proceed?

💭 You: yes

🔧 Troubleshooting

Common Issues & Solutions

"Unknown model" error

llm models list                    # See available models
llm models default [model-name]    # Set default

"LLM failed" or timeout

Check API key: llm keys list
Check OpenRouter credits/limits
Try simpler strategy description
Use debug command to see internal state

"No data found" error

Verify ticker symbol (use Yahoo Finance format)
Ensure date range is in the past
Try different dates or ticker

Code execution fails

Agent will auto-retry up to 3 times
If still failing, simplify your strategy
Use info to see what requirements were gathered
Check for typos in ticker/dates

General debugging

Use info command to see current phase
Use debug command to see conversation history
Check reports/ folder for any partial outputs
Restart with exit and try again

Alternative LLM Providers

OpenAI:

llm keys set openai
llm models default gpt-4o-mini

Anthropic:

llm keys set anthropic  
llm models default claude-3-5-sonnet-20241022

🤝 Contributing

Contributions welcome! Areas of interest:

Multi-asset portfolio backtesting
Additional technical indicators
Parameter optimization
Risk management strategies
Interactive visualizations

See issues or open a PR!

📄 License

GPL-3.0 License. See LICENSE.

This is copyleft software - any derivative work you distribute must also be licensed under GPL-3.0.

🏗️ Technical Details

Project Structure

src/nlbt/
├── cli.py              # Interactive CLI with rich formatting
├── reflection.py       # 3-phase reflection engine
├── llm.py              # LLM wrapper using `llm` CLI
└── sandbox.py          # Safe code execution

reports/                # Generated backtest reports
├── <TICKER>_<PERIOD>_<TIMESTAMP>/
│   ├── report.md       # User: Professional report
│   ├── report.pdf      # User: PDF version
│   ├── strategy.py     # Developer: Executable code
│   ├── debug.log       # Developer: Execution trace
│   └── agent.log       # Agent: Full LLM context
└── EXAMPLE_*/          # Sample outputs

tests/                  # Unit and integration tests

Architecture & Design Patterns

This project implements several Agentic Design Patterns:

Reflection Pattern: 3-phase autonomous workflow with LLM controlling transitions
Producer-Critic Pattern: Separate models for generation and evaluation (avoids confirmation bias)
Planning Pattern: Phase 2 plans before coding; Phase 3 plans before writing
Tool Use Pattern: Sandbox execution, data fetching, indicator calculations
Prompt Chaining: Phase transitions chain prompts with context
Error Recovery: Auto-retry loop (max 3 attempts) with error feedback
Checkpoint Pattern: Three-tier output (user/developer/agent) for reproducibility

See cursor_chats/Agentic_Design_Patterns_Complete.md for detailed documentation.
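
The "Test: Validate syntax & imports" step in the workflow can be approximated with the standard library alone. The sketch below is an assumption about what such a check might involve, not NLBT's implementation; `validate_source` is a hypothetical name.

```python
import ast
import importlib.util
from typing import List


def validate_source(source: str) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for name in modules:
            top_level = name.split(".")[0]
            # find_spec returns None when the top-level package is missing.
            if importlib.util.find_spec(top_level) is None:
                problems.append(f"module not installed: {top_level}")
    return problems
```

Because the check never executes the generated code, it is cheap to run before each sandbox attempt and its messages can be fed back to the producer as retry feedback.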
