Turn plain English into professional backtesting reports in minutes.
Describe your trading strategy in natural language. Get Python code, backtest results, and professional reports. No coding required.
🆕 v0.3.0: Now powered by 8 LLM-driven intelligence features with multilingual support and dramatically improved performance!
# 1. Install
git clone https://github.com/yourusername/nlbt && cd nlbt
pip install -e .
# 2. Configure LLM
llm keys set openrouter
llm models default openrouter/anthropic/claude-3.5-sonnet
# 3. Run
nlbt
Try it: Type "Buy and hold AAPL in 2024 with $10,000" and press Enter.
| Feature | Benefit |
|---|---|
| 💬 Natural Language | Describe strategies in plain English - no coding needed |
| 🧠 LLM-Powered Intelligence | 8 AI features: smart extraction, validation, multilingual reports |
| 🌍 Multilingual Support | Generate reports in any language (Spanish, Hindi, etc.) |
| ⚡ High Performance | Improved strategy execution (in the NVDA RSI example, return rose from 10% to 240%, a 24x increase) |
| 🔄 Self-Correcting | Auto-retries with intelligent error diagnosis |
| 📊 Professional Reports | Markdown + PDF with metrics, charts, and full code |
| 🔧 Clean Architecture | LLM-first design with 20% less code, more intelligence |
Major Architecture Overhaul: Complete "extreme promptification" with 8 LLM-powered intelligence features:
- Smart Title Generation: Dynamic, context-aware report titles
- Intelligent Requirement Extraction: Structured parsing from natural language
- Flexible User Intent Detection: Understands "yes", "go", "proceed" variations
- Adaptive Result Validation: Evaluates backtest quality intelligently
- Multilingual Section Naming: Localized headings for any language
- Smart Column Detection: Automatically finds best DataFrame columns
- Dynamic Clarification Limits: Stops asking when enough info gathered
- Targeted Error Diagnosis: Analyzes errors and suggests specific fixes
Real-world example: Same NVDA RSI strategy
- Before v0.3.0: 10% return (1 trade)
- After v0.3.0: 240% return (multiple trades)
- 24x improvement in strategy execution quality
- 20% less code: Removed 311 lines of redundant logic
- LLM-first design: Intelligent reasoning replaces hardcoded rules
- Clean fallbacks: Simple backups instead of complex regex patterns
- Zero breaking changes: Seamless upgrade path
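The "Flexible User Intent Detection" and "Clean fallbacks" items suggest an LLM makes the call, with a simple backup when it cannot. A minimal sketch of what such a backup might look like follows. The `AFFIRMATIVE` set and the `is_affirmative` helper are hypothetical and are not taken from NLBT's source.

```python
# Hypothetical fallback for confirmation detection; not NLBT's actual code.
AFFIRMATIVE = {"yes", "y", "go", "proceed", "ok", "okay", "sure", "yep", "do it", "go ahead"}


def is_affirmative(reply: str) -> bool:
    """Return True only when the whole reply is a clear 'proceed'."""
    # Normalize case and drop surrounding whitespace and trailing punctuation.
    normalized = reply.strip().lower().rstrip("!. ")
    return normalized in AFFIRMATIVE
```

Matching the whole reply, rather than searching for "yes" inside it, keeps a message such as "yes, but use 2022 instead" in the conversation instead of starting the implementation.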
You type:
"NVDA RSI strategy: buy when RSI drops below 30 with larger positions when RSI is lower, sell when RSI goes above 70, use 2023 data with $50000 capital"
You get (in reports/NVDA_2023_<timestamp>/):
📁 See actual example: reports/EXAMPLE_NVDA_2023/
📄 View report: report.md | report.pdf
💻 View code: strategy.py
# NVDA 2023 Trading Strategy
Initial Capital: $50,000 → Final Equity: $55,039.29 → Gain: +$5,039.29 (+10.08%)
## Summary
- Test Period: 2023-01-03 to 2023-12-29 (360 days)
- Strategy: RSI Mean Reversion with Dynamic Position Sizing
- Total Return: 10.08% vs Buy & Hold 158.14%
- Risk Metrics: Sharpe 1.54, Max Drawdown -2.80%
## Strategy Implementation
- Entry: Buy when RSI < 30 with position scaling
- Position Size: Larger positions when RSI is lower (1x to 2x)
- Exit: Sell when RSI > 70
- Risk Management: 95% max equity exposure
## Performance Metrics
- Alpha: 7.80% (significant outperformance vs risk)
- Beta: 0.01439 (low market correlation)
- Calmar Ratio: 3.63 (excellent risk-adjusted returns)
- Win Rate: 100% (1 successful trade)
[Full analysis with code implementation]
# Generated by NLBT - NVDA RSI Strategy with Dynamic Position Sizing
from backtesting import Backtest, Strategy
import numpy as np
import pandas as pd


def RSI(array, n=14):
    """Helper for RSI calculation"""
    delta = pd.Series(array).diff()
    gain = (delta.where(delta > 0, 0)).rolling(n).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(n).mean()
    rs = gain / loss
    return (100 - (100 / (1 + rs))).to_numpy()


class MyStrategy(Strategy):
    def init(self):
        self.rsi = self.I(RSI, self.data.Close, 14)

    def next(self):
        if not self.position:
            if self.rsi[-1] < 30:
                # Dynamic position sizing based on RSI
                rsi_scale = (30 - self.rsi[-1]) / 30  # 0 to 1 scale
                position_size = 1 + rsi_scale  # 1x to 2x sizing
                units = int((self.equity * 0.95 * position_size) / self.data.Close[-1])
                if units > 0:
                    self.buy(size=units)
        elif self.position and self.rsi[-1] > 70:
            self.position.close()


# Execute backtest
data = get_ohlcv_data('NVDA', '2023-01-01', '2023-12-31')
bt = Backtest(data, MyStrategy, cash=50000)
stats = bt.run()
debug.log - Execution trace for troubleshooting
agent.log - Full LLM context for iteration (~6-8K words)
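To see what the sample strategy's position-sizing rule does with concrete numbers, the arithmetic can be pulled into a standalone function. This is a sketch for illustration: the function name `rsi_scaled_units` and the input values are made up and do not come from the NVDA backtest.

```python
def rsi_scaled_units(equity: float, close: float, rsi: float,
                     threshold: float = 30.0, exposure: float = 0.95) -> int:
    """Units to buy under the sample sizing rule; 0 when RSI is not below the threshold."""
    if not rsi < threshold:  # also covers NaN RSI during the warm-up window
        return 0
    rsi_scale = (threshold - rsi) / threshold  # 0 at RSI = threshold, 1 at RSI = 0
    position_size = 1 + rsi_scale              # 1x to 2x multiplier
    return int((equity * exposure * position_size) / close)


# Illustrative inputs only
units = rsi_scaled_units(equity=50_000, close=100.0, rsi=15.0)
notional = units * 100.0
```

With these inputs the rule requests 712 shares, about $71,200 of stock against $50,000 of equity. Because the multiplier is applied on top of the 95% exposure cap, the requested size can exceed available cash. backtesting.py's `Backtest` accepts a `margin` argument, and whether an order of this size is filled may depend on that setting.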
- Safety: This tool runs AI-generated Python code locally. Use in trusted environments only.
- Status: Functional for single-ticker strategies. APIs may change without notice.
- Limitations: Multi-asset portfolios not yet supported. Works best with clear strategy descriptions.
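Because the tool executes AI-generated code, it helps to picture what minimal process-level isolation looks like. The sketch below assumes nothing about how NLBT's `sandbox.py` works. The helper `run_generated_code` is hypothetical, and it only runs a script in a child interpreter with a timeout.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(code: str, timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    """Run code in a separate Python process; raises subprocess.TimeoutExpired on timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "strategy.py"
        script.write_text(code, encoding="utf-8")
        return subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            cwd=workdir,          # keep relative file writes inside the temp directory
            capture_output=True,  # collect stdout/stderr for error feedback
            text=True,
            timeout=timeout_s,
        )
```

A child process with a timeout guards against hangs and keeps crashes out of the main session, but it is not a security boundary. The code can still read files and use the network, which is why the "trusted environments only" advice applies.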
- Python 3.8+
- OpenRouter account (recommended) or OpenAI/Anthropic
- 5 minutes for setup
git clone https://github.com/yourusername/nlbt
cd nlbt
pip install -e .
This installs all dependencies, including the llm CLI, backtesting, ta, and more.
Why OpenRouter? Cost control, multiple models, spending limits
- Create account: Go to https://openrouter.ai/
- Get API key: Click "Keys" → "Create Key"
- Add credits: Add $5-10 (you'll use <$1 for examples)
- Set spending limit: Optional but recommended
- Configure locally:
llm keys set openrouter
# Paste your API key when prompted
llm models default openrouter/anthropic/claude-3.5-sonnet
nlbt
Try: "Buy and hold AAPL in 2024 with $1000"
What you should see:
- Agent asks clarifying questions (if needed)
- Shows "Phase 1 - Understanding" → "Phase 2 - Implementation" → "Phase 3 - Reporting"
- Saves report to `reports/<TICKER>_<PERIOD>_<TIMESTAMP>/report.md` (+ PDF)
- Takes 2-3 minutes total
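The report folder name follows a ticker, period, timestamp pattern. A small sketch of how such a path could be built is shown below. The `report_dir` helper is hypothetical, and the `%Y%m%d_%H%M%S` timestamp format is an assumption inferred from example paths such as `reports/AAPL_2024_20241002_123456/`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def report_dir(ticker: str, period: str, now: Optional[datetime] = None,
               root: str = "reports") -> Path:
    """Build <root>/<TICKER>_<PERIOD>_<TIMESTAMP> (timestamp format is assumed)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(root) / f"{ticker.upper()}_{period}_{stamp}"
```

Putting the timestamp in the folder name means repeated runs of the same ticker and period never overwrite each other.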
nlbt # Start interactive session
In-chat commands:
- `info` - Show current phase and requirements
- `debug` - Show internal state
- `lucky` - Quick demo with AAPL
- `exit` - Quit
- Set report language: Include `lang <language>` or `language: <language>` anywhere in your message to generate the entire report (including the TL;DR) in that language. Defaults to English if omitted.
Example:
💭 You: Buy and hold AAPL in 2024 with $10,000; lang Spanish
NLBT uses a 3-phase agentic workflow with automatic error recovery:
- 🔍 Understanding - Chat with AI to gather requirements (ticker, period, capital, strategy)
- ⚙️ Implementation - AI generates Python code, tests it, and auto-retries if needed
- 📊 Reporting - AI creates professional analysis with metrics and insights
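One way to picture the three phases is as a small state machine. The sketch below is illustrative only: the `Phase` enum, the event names (`confirmed`, `pass`, `failed`), and `next_phase` are hypothetical and are not taken from NLBT's source.

```python
from enum import Enum


class Phase(Enum):
    UNDERSTANDING = 1
    IMPLEMENTATION = 2
    REPORTING = 3


# Allowed transitions; any other (phase, event) pair leaves the phase unchanged.
TRANSITIONS = {
    (Phase.UNDERSTANDING, "confirmed"): Phase.IMPLEMENTATION,  # user said yes/go
    (Phase.IMPLEMENTATION, "pass"): Phase.REPORTING,           # critic accepted the results
    (Phase.IMPLEMENTATION, "failed"): Phase.UNDERSTANDING,     # retries exhausted
}


def next_phase(phase: Phase, event: str) -> Phase:
    """Return the phase that follows `phase` after `event`."""
    return TRANSITIONS.get((phase, event), phase)
```

Keeping the transitions in one table makes the recovery path explicit: a failed implementation returns to the conversation rather than ending the session.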
Detailed architecture diagram
Color Key:
- Purple = User actions | Yellow = LLM actions | Green = System/sandbox
- Orange = Decisions | Teal = Phase states | Gray = Outputs
graph TD
Start([User describes strategy]) --> P1[Phase 1: Understanding]
P1 --> Extract[Extract requirements from conversation]
Extract --> Check{Complete &<br/>implementable?}
Check -->|Missing/unclear| Ask[Ask clarifying questions]
Ask --> P1
Check -->|Complete & valid| Ready[Ready to Implement]
Ready --> Present[Present plan to user]
Present --> Response{User response}
Response -->|Anything else| BackToP1[Return to understanding]
BackToP1 --> P1
Response -->|Yes/Go| P2[Phase 2: Implementation]
P2 --> Plan[Plan: LLM creates implementation plan]
Plan --> Code[Producer: Generate Python code]
Code --> Test[Test: Validate syntax & imports]
Test --> Execute[Execute: Run in sandbox]
Execute --> Critic[Critic: Evaluate results]
Critic --> Decision{Critic decision}
Decision -->|PASS| P3[Phase 3: Reporting]
Decision -->|RETRY| Count{Attempt < 3?}
Count -->|Yes| Plan
Count -->|No| FailBack[Show error & return to understanding]
FailBack --> P1
P3 --> ReportPlan[Plan: Structure report]
ReportPlan --> Write[Write: Generate markdown]
Write --> Refine[Refine: Polish & save]
Refine --> Done([Report saved])
%% Role-based styling
classDef user fill:#d1c4e9,stroke:#7e57c2,color:#4a148c;
classDef llm fill:#fff9c4,stroke:#fbc02d,color:#6d4c41;
classDef system fill:#e8f5e9,stroke:#43a047,color:#1b5e20;
classDef decision fill:#ffccbc,stroke:#e64a19,color:#bf360c;
classDef userInput fill:#e1bee7,stroke:#8e24aa,color:#4a148c;
classDef state fill:#b2dfdb,stroke:#00897b,color:#004d40;
classDef output fill:#eceff1,stroke:#90a4ae,color:#37474f;
%% Assign roles
class Start user;
class P1,P2,P3,Ready state;
class Extract,Ask,Plan,Code,Critic,ReportPlan,Write,Refine llm;
class Test,Execute,Present system;
class Check,Decision,Count decision;
class Response userInput;
class Done output;
class BackToP1,FailBack system;
- Smart Confirmation: Say "yes" to proceed, anything else returns to conversation
- Auto-Retry: Up to 3 attempts with error feedback
- Error Recovery: After failures, returns to chat with error context
- Producer-Critic Pattern: Separate AI for generation and evaluation (reduces bias)
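The auto-retry and producer-critic points above can be sketched as a single loop. This is a simplified illustration under stated assumptions: `implement_with_retries` and its three callables are hypothetical stand-ins for the LLM producer, the sandbox, and the LLM critic, not NLBT's actual interfaces.

```python
from typing import Callable, Optional, Tuple


def implement_with_retries(
    produce: Callable[[Optional[str]], str],      # feedback -> generated code
    execute: Callable[[str], str],                # code -> results (may raise)
    critique: Callable[[str], Tuple[bool, str]],  # results -> (passed, notes)
    max_attempts: int = 3,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (results, None) on PASS, or (None, last_feedback) once attempts run out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = produce(feedback)  # the producer sees the previous failure, if any
        try:
            results = execute(code)
        except Exception as exc:  # execution errors become feedback for the next attempt
            feedback = f"attempt {attempt}: {exc}"
            continue
        passed, notes = critique(results)  # a separate evaluator judges the output
        if passed:
            return results, None
        feedback = f"attempt {attempt}: {notes}"
    return None, feedback
```

On exhaustion, the caller gets the last feedback string back, which matches the described behavior of returning to the chat with error context.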
Simple Buy & Hold
💭 You: Buy and hold AAPL in 2024 with $10,000
🤖 Perfect! I have everything needed:
• Ticker: AAPL
• Period: 2024
• Capital: $10,000
• Strategy: Buy and hold
Ready to implement? (yes/no/change/explain)
💭 You: yes
🤖 ⚙️ Phase 2: Implementation
[Attempt 1/3] Planning implementation...
[Attempt 1/3] Generating Python code...
[Attempt 1/3] Testing code...
[Attempt 1/3] Executing backtest...
[Attempt 1/3] Results: Return 38.88%, Sharpe 1.25, Max DD -15.26%
✅ Critic: PASS - Proceeding to reporting
🤖 📊 Phase 3: Reporting
Planning report structure...
Writing comprehensive analysis...
✅ Report saved: reports/AAPL_2024_20241002_123456/report.md
RSI Strategy
💭 You: NVDA with RSI: buy when RSI < 30, sell when RSI > 70, 2023, $20K
🤖 STATUS: READY
• Ticker: NVDA
• Period: 2023
• Capital: $20,000
• Strategy: RSI mean reversion (buy <30, sell >70)
Proceed? (yes/no/change/explain)
💭 You: yes
🤖 [Generates → Tests → ✓ PASS]
✅ Report: reports/NVDA_2023_20241002_143012/report.md
Progressive Disclosure (provide info gradually)
💭 You: Test a moving average crossover
🤖 I'll help you backtest a moving average strategy! I need:
• Which ticker? (e.g., AAPL, SPY, TSLA)
• Time period? (e.g., 2024, 2020-2023)
• Initial capital? (e.g., $10,000)
• Which MA periods? (e.g., 50/200 day)
💭 You: SPY, 2024, $25K
🤖 Great! Which moving average periods? (e.g., 20/50, 50/200)
💭 You: use 50/200 day crossover
🤖 Perfect! All set. Ready to proceed?
💭 You: yes
Common Issues & Solutions
llm models list # See available models
llm models default [model-name] # Set default
- Check API key: `llm keys list`
- Check OpenRouter credits/limits
- Try simpler strategy description
- Use `debug` command to see internal state
- Verify ticker symbol (use Yahoo Finance format)
- Ensure date range is in the past
- Try different dates or ticker
- Agent will auto-retry up to 3 times
- If still failing, simplify your strategy
- Use `info` to see what requirements were gathered
- Check for typos in ticker/dates
- Use `info` command to see current phase
- Use `debug` command to see conversation history
- Check `reports/` folder for any partial outputs
- Restart with `exit` and try again
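The date-range advice above is easy to check before starting a session. A minimal sketch follows. The `check_period` helper is hypothetical and is not part of NLBT.

```python
from datetime import date
from typing import List, Optional


def check_period(start: date, end: date, today: Optional[date] = None) -> List[str]:
    """Return a list of problems with a backtest date range (empty when it looks fine)."""
    today = today or date.today()
    problems = []
    if start >= end:
        problems.append("start date must be before end date")
    if end > today:
        problems.append("end date is in the future")
    return problems
```

For example, a 2024 backtest checked on 2025-06-01 returns an empty list, while a range ending in 2030 reports that the end date is in the future.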
Alternative LLM Providers
OpenAI:
llm keys set openai
llm models default gpt-4o-mini
Anthropic:
llm keys set anthropic
llm models default claude-3-5-sonnet-20241022
Contributions welcome! Areas of interest:
- Multi-asset portfolio backtesting
- Additional technical indicators
- Parameter optimization
- Risk management strategies
- Interactive visualizations
See issues or open a PR!
GPL-3.0 License. See LICENSE.
This is copyleft software - any derivative works must also be open source under GPL-3.0.
Project Structure
src/nlbt/
├── cli.py # Interactive CLI with rich formatting
├── reflection.py # 3-phase reflection engine
├── llm.py # LLM wrapper using `llm` CLI
└── sandbox.py # Safe code execution
reports/ # Generated backtest reports
├── <TICKER>_<PERIOD>_<TIMESTAMP>/
│ ├── report.md # User: Professional report
│ ├── report.pdf # User: PDF version
│ ├── strategy.py # Developer: Executable code
│ ├── debug.log # Developer: Execution trace
│ └── agent.log # Agent: Full LLM context
└── EXAMPLE_*/ # Sample outputs
tests/ # Unit and integration tests
Architecture & Design Patterns
This project implements several Agentic Design Patterns:
- Reflection Pattern: 3-phase autonomous workflow with LLM controlling transitions
- Producer-Critic Pattern: Separate models for generation and evaluation (avoids confirmation bias)
- Planning Pattern: Phase 2 plans before coding; Phase 3 plans before writing
- Tool Use Pattern: Sandbox execution, data fetching, indicator calculations
- Prompt Chaining: Phase transitions chain prompts with context
- Error Recovery: Auto-retry loop (max 3 attempts) with error feedback
- Checkpoint Pattern: Three-tier output (user/developer/agent) for reproducibility
See cursor_chats/Agentic_Design_Patterns_Complete.md for detailed documentation.
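The workflow's "Test: validate syntax & imports" step, which runs before sandbox execution, could be approximated with the standard library alone. The sketch below is an assumption about what such a check might do. `check_code` is a hypothetical helper, not the project's implementation.

```python
import ast
import importlib.util
from typing import List


def check_code(source: str) -> List[str]:
    """Return problems found without executing the code: syntax errors or missing modules."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # skip relative imports
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            # find_spec returns None when a top-level module cannot be located
            if importlib.util.find_spec(top_level) is None:
                problems.append(f"module not installed: {top_level}")
    return problems
```

Catching these failures statically gives the retry loop a precise error message without spending an execution attempt on code that could never run.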