artvandelay/agentic-backtesting

NLBT — Natural Language Backtesting

Turn plain English into professional backtesting reports in minutes.
Describe your trading strategy in natural language. Get Python code, backtest results, and professional reports. No coding required.

🆕 v0.3.0: Now powered by 8 LLM-driven intelligence features with multilingual support and dramatically improved performance!


🚀 Quick Start

# 1. Install
git clone https://github.com/artvandelay/agentic-backtesting && cd agentic-backtesting
pip install -e .

# 2. Configure LLM
llm keys set openrouter
llm models default openrouter/anthropic/claude-3.5-sonnet

# 3. Run
nlbt

Try it: Type "Buy and hold AAPL in 2024 with $10,000" and press enter.


✨ Key Benefits

| Feature | Benefit |
| --- | --- |
| 💬 Natural Language | Describe strategies in plain English - no coding needed |
| 🧠 LLM-Powered Intelligence | 8 AI features, including smart extraction, validation, and multilingual reports |
| 🌍 Multilingual Support | Generate reports in any language (Spanish, Hindi, etc.) |
| ⚡ High Performance | Improved strategy execution (24x higher backtested return on one NVDA RSI example) |
| 🔄 Self-Correcting | Auto-retries with intelligent error diagnosis |
| 📊 Professional Reports | Markdown + PDF with metrics, charts, and full code |
| 🔧 Clean Architecture | LLM-first design with 20% less code, more intelligence |

🚀 What's New in v0.3.0

Major Architecture Overhaul: Complete "extreme promptification" with 8 LLM-powered intelligence features:

🧠 LLM-Powered Features

  • Smart Title Generation: Dynamic, context-aware report titles
  • Intelligent Requirement Extraction: Structured parsing from natural language
  • Flexible User Intent Detection: Understands "yes", "go", "proceed" variations
  • Adaptive Result Validation: Evaluates backtest quality intelligently
  • Multilingual Section Naming: Localized headings for any language
  • Smart Column Detection: Automatically finds best DataFrame columns
  • Dynamic Clarification Limits: Stops asking when enough info gathered
  • Targeted Error Diagnosis: Analyzes errors and suggests specific fixes
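The intent detection listed above is LLM-driven, and this README does not show how it is implemented. As a rough illustration of what a simple keyword fallback for confirmations might look like, here is a minimal sketch. The function name `is_confirmation` and the word list are assumptions made for this example, not NLBT's actual code.

```python
import re

# Hypothetical keyword set; NLBT's real detection is described as LLM-based.
AFFIRMATIVE = {"yes", "y", "go", "proceed", "ok", "okay", "sure"}

def is_confirmation(message: str) -> bool:
    """Return True only if every word in the message is an affirmative keyword."""
    words = re.findall(r"[a-z]+", message.lower())
    return bool(words) and all(word in AFFIRMATIVE for word in words)

print(is_confirmation("Yes, proceed"))               # affirmative words only
print(is_confirmation("yes but change the ticker"))  # contains other words
```

Treating any message with extra words as "not a confirmation" mirrors the behavior described elsewhere in this README, where anything other than a clear go-ahead returns to the conversation.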

📈 Performance Impact

Real-world example: the same NVDA RSI strategy, backtested before and after the upgrade

  • Before v0.3.0: 10% return (1 trade)
  • After v0.3.0: 240% return (multiple trades)
  • 24x higher backtested return for this one strategy

🏗️ Architecture Improvements

  • 20% less code: Removed 311 lines of redundant logic
  • LLM-first design: Intelligent reasoning replaces hardcoded rules
  • Clean fallbacks: Simple backups instead of complex regex patterns
  • Zero breaking changes: Seamless upgrade path

📥 What You Get

Input → Output

You type:

"NVDA RSI strategy: buy when RSI drops below 30 with larger positions when RSI is lower, sell when RSI goes above 70, use 2023 data with $50000 capital"

You get (in reports/NVDA_2023_<timestamp>/):

📁 See actual example: reports/EXAMPLE_NVDA_2023/
📄 View report: report.md | report.pdf
💻 View code: strategy.py

📊 Professional Report (report.md / report.pdf)

# NVDA 2023 Trading Strategy

Initial Capital: $50,000 → Final Equity: $55,039.29 → Gain: +$5,039.29 (+10.08%)

## Summary
- Test Period: 2023-01-03 to 2023-12-29 (360 days)
- Strategy: RSI Mean Reversion with Dynamic Position Sizing
- Total Return: 10.08% vs Buy & Hold 158.14%
- Risk Metrics: Sharpe 1.54, Max Drawdown -2.80%

## Strategy Implementation
- Entry: Buy when RSI < 30 with position scaling
- Position Size: Larger positions when RSI is lower (1x to 2x)
- Exit: Sell when RSI > 70
- Risk Management: 95% max equity exposure

## Performance Metrics
- Alpha: 7.80% (significant outperformance vs risk)
- Beta: 0.01439 (low market correlation)
- Calmar Ratio: 3.63 (excellent risk-adjusted returns)
- Win Rate: 100% (1 successful trade)

[Full analysis with code implementation]
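The headline line of the sample report follows directly from the initial capital and final equity. The sketch below reproduces that arithmetic; the helper name `summarize` is hypothetical and is not part of NLBT.

```python
from typing import Tuple

def summarize(initial: float, final: float) -> Tuple[float, float]:
    """Return (absolute gain, percentage return), both rounded to 2 decimals."""
    gain = final - initial
    pct = gain / initial * 100
    return round(gain, 2), round(pct, 2)

gain, pct = summarize(50_000, 55_039.29)
print(f"Gain: +${gain:,.2f} (+{pct:.2f}%)")  # Gain: +$5,039.29 (+10.08%)
```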

💻 Executable Code (strategy.py)

# Generated by NLBT - NVDA RSI Strategy with Dynamic Position Sizing

from backtesting import Backtest, Strategy
import pandas as pd

def RSI(array, n=14):
    """RSI computed from simple n-bar rolling means of gains and losses."""
    delta = pd.Series(array).diff()
    gain = (delta.where(delta > 0, 0)).rolling(n).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(n).mean()
    rs = gain / loss
    return (100 - (100 / (1 + rs))).to_numpy()

class MyStrategy(Strategy):
    def init(self):
        # Register RSI(14) as an indicator so it stays aligned with the price bars
        self.rsi = self.I(RSI, self.data.Close, 14)

    def next(self):
        if not self.position:
            if self.rsi[-1] < 30:
                # Dynamic position sizing: the lower the RSI, the larger the position
                rsi_scale = (30 - self.rsi[-1]) / 30  # 0 to 1 scale
                position_size = 1 + rsi_scale  # 1x to 2x sizing

                # Map 1x-2x onto 47.5%-95% of equity so the order stays within
                # the 95% exposure cap; an order above 100% of equity cannot be
                # filled without margin
                exposure = 0.95 * position_size / 2
                units = int((self.equity * exposure) / self.data.Close[-1])
                if units > 0:
                    self.buy(size=units)

        elif self.rsi[-1] > 70:
            self.position.close()

# Execute backtest
# get_ohlcv_data is not defined in this file; it is assumed to be supplied by the NLBT sandbox
data = get_ohlcv_data('NVDA', '2023-01-01', '2023-12-31')
bt = Backtest(data, MyStrategy, cash=50000)
stats = bt.run()
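To make the sizing rule easier to check by hand, here is a standalone sketch of the arithmetic with no backtesting dependency. It assumes that the 2x multiplier corresponds to the full 95% exposure cap and 1x to half of it; that mapping, and the function names, are illustrative assumptions rather than something the report specifies.

```python
def position_multiplier(rsi: float, threshold: float = 30.0) -> float:
    """Map RSI below the threshold to a 1x-2x multiplier; 0.0 means no entry signal."""
    if rsi >= threshold:
        return 0.0
    return 1.0 + (threshold - rsi) / threshold

def units_to_buy(equity: float, price: float, rsi: float, max_exposure: float = 0.95) -> int:
    """Whole shares to buy, scaled by RSI and never above max_exposure of equity."""
    multiplier = position_multiplier(rsi)
    if multiplier == 0.0:
        return 0
    # Assumption: 2x maps to the full cap, 1x to half of it.
    fraction = max_exposure * multiplier / 2
    return int(equity * fraction / price)

for rsi in (45, 29, 15, 5):
    print(rsi, position_multiplier(rsi), units_to_buy(50_000, 101.0, rsi))
```

A lower RSI yields a larger multiplier and therefore more shares, while an RSI at or above 30 yields no order at all.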

🔍 Debug & Agent Logs

  • debug.log - Execution trace for troubleshooting
  • agent.log - Full LLM context for iteration (~6-8K words)

⚠️ Important Notes

  • Safety: This tool runs AI-generated Python code locally. Use in trusted environments only.
  • Status: Functional for single-ticker strategies. APIs may change without notice.
  • Limitations: Multi-asset portfolios not yet supported. Works best with clear strategy descriptions.

Requirements

  • Python 3.8+
  • OpenRouter account (recommended) or OpenAI/Anthropic
  • 5 minutes for setup

Install & Setup

1. Clone and install everything

git clone https://github.com/artvandelay/agentic-backtesting
cd agentic-backtesting
pip install -e .

This installs all dependencies, including the llm CLI, backtesting, and ta.

2. Set up OpenRouter (recommended)

Why OpenRouter? Cost control, multiple models, spending limits

  1. Create account: Go to https://openrouter.ai/
  2. Get API key: Click "Keys" → "Create Key"
  3. Add credits: Add $5-10 (you'll use <$1 for examples)
  4. Set spending limit: Optional but recommended
  5. Configure locally:
llm keys set openrouter
# Paste your API key when prompted

llm models default openrouter/anthropic/claude-3.5-sonnet

3. Quick test

nlbt

Try: "Buy and hold AAPL in 2024 with $1000"

What you should see:

  • Agent asks clarifying questions (if needed)
  • Shows "Phase 1 - Understanding" → "Phase 2 - Implementation" → "Phase 3 - Reporting"
  • Saves report to reports/<TICKER>_<PERIOD>_<TIMESTAMP>/report.md (+ PDF)
  • Takes 2-3 minutes total
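The report folder name combines ticker, period, and a timestamp. The sketch below builds such a path; the `%Y%m%d_%H%M%S` format is inferred from example paths in this README (such as `AAPL_2024_20241002_123456`), and the function `report_dir` is hypothetical.

```python
from datetime import datetime
from pathlib import Path

def report_dir(ticker: str, period: str, now: datetime, root: str = "reports") -> Path:
    """Build reports/<TICKER>_<PERIOD>_<TIMESTAMP> for a finished run."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp layout
    return Path(root) / f"{ticker.upper()}_{period}_{stamp}"

path = report_dir("aapl", "2024", datetime(2024, 10, 2, 12, 34, 56))
print(path.as_posix())  # reports/AAPL_2024_20241002_123456
```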

💬 Usage

nlbt                    # Start interactive session

In-chat commands:

  • info - Show current phase and requirements
  • debug - Show internal state
  • lucky - Quick demo with AAPL
  • exit - Quit

Language preference

  • Set report language: Include lang <language> or language: <language> anywhere in your message to generate the entire report (including the TL;DR) in that language. Defaults to English if omitted.

Example:

💭 You: Buy and hold AAPL in 2024 with $10,000; lang Spanish
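For readers curious how a `lang <language>` or `language: <language>` directive could be separated from the rest of a message, here is a minimal sketch. NLBT may well handle this through the LLM instead; the regex and the function name `split_language` are assumptions for illustration only.

```python
import re
from typing import Tuple

# Matches "lang Spanish" or "language: Spanish" (case-insensitive, Unicode letters).
_LANG_RE = re.compile(r"(?:\blang\s+|\blanguage:\s*)([^\W\d_]+)", re.IGNORECASE)

def split_language(message: str, default: str = "English") -> Tuple[str, str]:
    """Return (message without the directive, requested report language)."""
    match = _LANG_RE.search(message)
    if not match:
        return message.strip(), default
    cleaned = (message[:match.start()] + message[match.end():]).strip(" ;,")
    return cleaned, match.group(1).capitalize()

print(split_language("Buy and hold AAPL in 2024 with $10,000; lang Spanish"))
# ('Buy and hold AAPL in 2024 with $10,000', 'Spanish')
```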

🔄 How It Works

NLBT uses a 3-phase agentic workflow with automatic error recovery:

Simple Overview

  1. 🔍 Understanding - Chat with AI to gather requirements (ticker, period, capital, strategy)
  2. ⚙️ Implementation - AI generates Python code, tests it, and auto-retries if needed
  3. 📊 Reporting - AI creates professional analysis with metrics and insights
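The three phases and the moves between them can be pictured as a small state machine. The sketch below is an illustration built from the workflow described in this README; the `Phase` enum, the event names, and `next_phase` are hypothetical and do not come from the NLBT source.

```python
from enum import Enum
from typing import Tuple

class Phase(Enum):
    UNDERSTANDING = "understanding"
    IMPLEMENTATION = "implementation"
    REPORTING = "reporting"
    DONE = "done"

MAX_ATTEMPTS = 3  # the README states up to 3 implementation attempts

def next_phase(phase: Phase, event: str, attempt: int = 1) -> Tuple[Phase, int]:
    """Return the (phase, attempt number) that follows an event."""
    if phase is Phase.UNDERSTANDING:
        # Only an explicit confirmation starts implementation.
        return (Phase.IMPLEMENTATION, 1) if event == "confirm" else (phase, attempt)
    if phase is Phase.IMPLEMENTATION:
        if event == "pass":
            return Phase.REPORTING, attempt
        if event == "retry":
            if attempt < MAX_ATTEMPTS:
                return Phase.IMPLEMENTATION, attempt + 1
            return Phase.UNDERSTANDING, 1  # out of attempts: back to the chat
    if phase is Phase.REPORTING and event == "saved":
        return Phase.DONE, attempt
    return phase, attempt

print(next_phase(Phase.IMPLEMENTATION, "retry", attempt=3))
```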

Visual Workflow

Detailed architecture diagram

Color Key:

  • Purple = User actions | Yellow = LLM actions | Green = System/sandbox
  • Orange = Decisions | Teal = Phase states | Gray = Outputs
graph TD
    Start([User describes strategy]) --> P1[Phase 1: Understanding]
    P1 --> Extract[Extract requirements from conversation]
    Extract --> Check{Complete &<br/>implementable?}
    
    Check -->|Missing/unclear| Ask[Ask clarifying questions]
    Ask --> P1
    
    Check -->|Complete & valid| Ready[Ready to Implement]
    Ready --> Present[Present plan to user]
    Present --> Response{User response}
    
    Response -->|Anything else| BackToP1[Return to understanding]
    BackToP1 --> P1
    Response -->|Yes/Go| P2[Phase 2: Implementation]
    
    P2 --> Plan[Plan: LLM creates implementation plan]
    Plan --> Code[Producer: Generate Python code]
    Code --> Test[Test: Validate syntax & imports]
    Test --> Execute[Execute: Run in sandbox]
    Execute --> Critic[Critic: Evaluate results]
    Critic --> Decision{Critic decision}
    
    Decision -->|PASS| P3[Phase 3: Reporting]
    Decision -->|RETRY| Count{Attempt < 3?}
    Count -->|Yes| Plan
    Count -->|No| FailBack[Show error & return to understanding]
    FailBack --> P1
    
    P3 --> ReportPlan[Plan: Structure report]
    ReportPlan --> Write[Write: Generate markdown]
    Write --> Refine[Refine: Polish & save]
    Refine --> Done([Report saved])

    %% Role-based styling
    classDef user fill:#d1c4e9,stroke:#7e57c2,color:#4a148c;
    classDef llm fill:#fff9c4,stroke:#fbc02d,color:#6d4c41;
    classDef system fill:#e8f5e9,stroke:#43a047,color:#1b5e20;
    classDef decision fill:#ffccbc,stroke:#e64a19,color:#bf360c;
    classDef userInput fill:#e1bee7,stroke:#8e24aa,color:#4a148c;
    classDef state fill:#b2dfdb,stroke:#00897b,color:#004d40;
    classDef output fill:#eceff1,stroke:#90a4ae,color:#37474f;

    %% Assign roles
    class Start user;
    class P1,P2,P3,Ready state;
    class Extract,Ask,Plan,Code,Critic,ReportPlan,Write,Refine llm;
    class Test,Execute,Present system;
    class Check,Decision,Count decision;
    class Response userInput;
    class Done output;
    class BackToP1,FailBack system;

Key Features

  • Smart Confirmation: Say "yes" (or a variation such as "go" or "proceed") to continue; anything else returns to the conversation
  • Auto-Retry: Up to 3 attempts with error feedback
  • Error Recovery: After failures, returns to chat with error context
  • Producer-Critic Pattern: Separate AI for generation and evaluation (reduces bias)
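The auto-retry and producer-critic features above can be sketched as a loop in which each critique is fed back into the next generation attempt. The callables below are toy stand-ins for the LLM producer, the sandbox, and the LLM critic; all names here are hypothetical and the real NLBT loop may differ.

```python
from typing import Callable, NamedTuple, Optional, Tuple

class Outcome(NamedTuple):
    passed: bool
    result: Optional[str]
    attempts: int
    feedback: Optional[str]

def run_with_retries(
    produce: Callable[[Optional[str]], str],
    execute: Callable[[str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Outcome:
    """Generate, run, and evaluate code until the critic passes or attempts run out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = produce(feedback)                    # producer sees the last critique
        result = execute(code)                      # stand-in for the sandbox run
        passed, feedback = critique(code, result)   # separate evaluator
        if passed:
            return Outcome(True, result, attempt, None)
    return Outcome(False, None, max_attempts, feedback)

# Toy stand-ins: the first draft fails, the revised draft passes.
def produce(feedback: Optional[str]) -> str:
    return "v1" if feedback is None else "v2"

def execute(code: str) -> str:
    return f"ran {code}"

def critique(code: str, result: str) -> Tuple[bool, str]:
    return code == "v2", "no trades executed"

outcome = run_with_retries(produce, execute, critique)
print(outcome)
```

When the critic never passes, the last feedback is returned so the caller can show it to the user, matching the "returns to chat with error context" behavior described above.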

Examples (realistic conversations)

Simple Buy & Hold

💭 You: Buy and hold AAPL in 2024 with $10,000

🤖 Perfect! I have everything needed:
   • Ticker: AAPL
   • Period: 2024
   • Capital: $10,000
   • Strategy: Buy and hold
   
   Ready to implement? (yes/no/change/explain)

💭 You: yes

🤖 ⚙️ Phase 2: Implementation
   [Attempt 1/3] Planning implementation...
   [Attempt 1/3] Generating Python code...
   [Attempt 1/3] Testing code...
   [Attempt 1/3] Executing backtest...
   [Attempt 1/3] Results: Return 38.88%, Sharpe 1.25, Max DD -15.26%
   ✅ Critic: PASS - Proceeding to reporting

🤖 📊 Phase 3: Reporting
   Planning report structure...
   Writing comprehensive analysis...
   ✅ Report saved: reports/AAPL_2024_20241002_123456/report.md

RSI Strategy

💭 You: NVDA with RSI: buy when RSI < 30, sell when RSI > 70, 2023, $20K

🤖 STATUS: READY
   • Ticker: NVDA
   • Period: 2023  
   • Capital: $20,000
   • Strategy: RSI mean reversion (buy <30, sell >70)
   
   Proceed? (yes/no/change/explain)

💭 You: yes

🤖 [Generates → Tests → ✓ PASS]
   ✅ Report: reports/NVDA_2023_20241002_143012/report.md

Progressive Disclosure (provide info gradually)

💭 You: Test a moving average crossover

🤖 I'll help you backtest a moving average strategy! I need:
   • Which ticker? (e.g., AAPL, SPY, TSLA)
   • Time period? (e.g., 2024, 2020-2023)
   • Initial capital? (e.g., $10,000)
   • Which MA periods? (e.g., 50/200 day)

💭 You: SPY, 2024, $25K

🤖 Great! Which moving average periods? (e.g., 20/50, 50/200)

💭 You: use 50/200 day crossover

🤖 Perfect! All set. Ready to proceed?

💭 You: yes

🔧 Troubleshooting

Common Issues & Solutions

"Unknown model" error

llm models list                    # See available models
llm models default [model-name]    # Set default

"LLM failed" or timeout

  • Check API key: llm keys list
  • Check OpenRouter credits/limits
  • Try simpler strategy description
  • Use debug command to see internal state

"No data found" error

  • Verify ticker symbol (use Yahoo Finance format)
  • Ensure date range is in the past
  • Try different dates or ticker

Code execution fails

  • Agent will auto-retry up to 3 times
  • If still failing, simplify your strategy
  • Use info to see what requirements were gathered
  • Check for typos in ticker/dates

General debugging

  • Use info command to see current phase
  • Use debug command to see conversation history
  • Check reports/ folder for any partial outputs
  • Restart with exit and try again

Alternative LLM Providers

OpenAI:

llm keys set openai
llm models default gpt-4o-mini

Anthropic:

llm keys set anthropic  
llm models default claude-3-5-sonnet-20241022

🤝 Contributing

Contributions welcome! Areas of interest:

  • Multi-asset portfolio backtesting
  • Additional technical indicators
  • Parameter optimization
  • Risk management strategies
  • Interactive visualizations

See issues or open a PR!


📄 License

GPL-3.0 License. See LICENSE.

This is copyleft software: derivative works that you distribute must also be licensed under GPL-3.0.


🏗️ Technical Details

Project Structure

src/nlbt/
├── cli.py              # Interactive CLI with rich formatting
├── reflection.py       # 3-phase reflection engine
├── llm.py              # LLM wrapper using `llm` CLI
└── sandbox.py          # Safe code execution

reports/                # Generated backtest reports
├── <TICKER>_<PERIOD>_<TIMESTAMP>/
│   ├── report.md       # User: Professional report
│   ├── report.pdf      # User: PDF version
│   ├── strategy.py     # Developer: Executable code
│   ├── debug.log       # Developer: Execution trace
│   └── agent.log       # Agent: Full LLM context
└── EXAMPLE_*/          # Sample outputs

tests/                  # Unit and integration tests

Architecture & Design Patterns

This project implements several Agentic Design Patterns:

  • Reflection Pattern: 3-phase autonomous workflow with LLM controlling transitions
  • Producer-Critic Pattern: Separate models for generation and evaluation (avoids confirmation bias)
  • Planning Pattern: Phase 2 plans before coding; Phase 3 plans before writing
  • Tool Use Pattern: Sandbox execution, data fetching, indicator calculations
  • Prompt Chaining: Phase transitions chain prompts with context
  • Error Recovery: Auto-retry loop (max 3 attempts) with error feedback
  • Checkpoint Pattern: Three-tier output (user/developer/agent) for reproducibility
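The workflow includes a step that validates syntax and imports before generated code is executed. One plausible way to do this with only the standard library is shown below; it is a sketch under that assumption, and `validate_source` is a hypothetical name, not the contents of sandbox.py.

```python
import ast
import importlib.util
from typing import List

def validate_source(source: str) -> List[str]:
    """Return a list of problems; an empty list means the code parses and its imports resolve."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # check only the top-level package
            if importlib.util.find_spec(root) is None:
                problems.append(f"missing module: {root}")
    return problems

print(validate_source("import json\nx = 1"))
print(validate_source("import definitely_not_a_module_xyz"))
```

Checking before execution lets the retry loop report a precise error (bad syntax or a missing package) without running any untrusted code.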

See cursor_chats/Agentic_Design_Patterns_Complete.md for detailed documentation.
