The wags toolkit is built on state-of-the-art research into how multi-turn agents typically fail, and makes it straightforward to implement advanced countermeasures. While the Model Context Protocol (MCP) offers a standardized way for AI models to interact with external tools and data sources, we still don't fully understand what makes a good MCP server. wags makes it easy to apply the latest research on context engineering, along with several new MCP features that improve user and agent experience, without rewriting your existing MCP servers.
⚠️ Warning: wags is based on ongoing research and is under active development. Features and APIs may change. Some experimental MCP features are only supported in our fork of fast-agent included with wags.
- Python 3.13.5 or higher
- uv package manager (recommended) or pip
- Basic understanding of MCP (Model Context Protocol)
- An existing MCP server to work with
# Clone the repository
git clone https://github.com/chughtapan/wags.git
cd wags
# Create and activate virtual environment
uv venv
source .venv/bin/activate
# Install with dev dependencies (for testing and linting)
uv pip install -e ".[dev]"
To verify the installation, run:
wags version
You should see:
WAGS version 0.1.0
FastMCP version x.x.x
# Connect to all configured servers
wags run
# Connect to specific servers
wags run --servers github
See the Quick Start Guide for details.
To wrap your own MCP server with wags middleware, see the Onboarding Guide for step-by-step instructions.
Example: Check out servers/github/ for a complete implementation.
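As rough orientation before you read the guide, here is a minimal sketch of what a wrapped-server entry point could look like. Everything except create_proxy (shown again in the Todo example below) is an assumption: the toy FastMCP server, the greet tool, and the run() call are illustrative placeholders, not the official onboarding layout.
# main.py -- illustrative sketch only; follow the Onboarding Guide for the real flow
from fastmcp import FastMCP
from wags.proxy import create_proxy
# Assumption: any FastMCP-compatible server instance can be wrapped directly.
server = FastMCP("my-server")
@server.tool()
def greet(name: str) -> str:
    """Toy tool so the proxy has something to forward."""
    return f"Hello, {name}!"
# Wrap the server so wags middleware sits between the agent and the tools.
proxy = create_proxy(server)
if __name__ == "__main__":
    # Assumption: the proxy exposes FastMCP's standard run() entry point.
    proxy.run()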
src/
└── wags/ # WAGS middleware framework
├── cli/ # CLI commands using cyclopts
│ └── main.py # wags CLI entry point
├── middleware/ # Middleware implementations
│ ├── base.py # Base middleware abstract class
│ ├── elicitation.py # Parameter elicitation middleware
│ ├── roots.py # Access control middleware
│ └── todo.py # Task tracking server
├── templates/ # Jinja2 templates for code generation
│ ├── handlers.py.j2 # Handlers class template
│ └── main.py.j2 # Main file template
├── utils/ # Utility modules
│ ├── config.py # Configuration management
│ ├── handlers_generator.py # Generate handler stubs from MCP
│ └── server.py # Server discovery and running
└── proxy.py # Proxy server for middleware chain
Enable automatic task tracking for LLM agents with built-in TodoWrite and TodoRead tools:
from wags.proxy import create_proxy
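# `server` is the MCP server being wrapped (for example, the one in servers/github/)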
proxy = create_proxy(server, enable_todos=True)
This provides LLMs with tools to break down complex tasks and track progress. See the Todo Integration Guide for details.
For detailed middleware documentation, see the full documentation.
Visit https://chughtapan.github.io/wags/ for the full documentation.
# Build documentation
mkdocs build
# Serve documentation locally
mkdocs serve
# Run all unit tests (excludes benchmarks by default)
.venv/bin/pytest tests/
# Run unit tests with coverage
.venv/bin/pytest tests/unit/ -v
# Run integration tests
.venv/bin/pytest tests/integration/ -v
# Run linter
.venv/bin/ruff check src/ tests/ servers/
# Fix linting issues
.venv/bin/ruff check src/ tests/ servers/ --fix
# Format code
.venv/bin/ruff format src/ tests/ servers/
# Run type checking
.venv/bin/mypy src/ servers/ tests/
# Install pre-commit hooks
pre-commit install
# Run pre-commit hooks manually
pre-commit run --all-files
WAGS includes evaluation support for:
- BFCL: Berkeley Function Calling Leaderboard
- AppWorld: Realistic task evaluation across 9 day-to-day apps
First, initialize the benchmark data submodules and install the evaluation dependencies:
# 1. Initialize the data submodules
git submodule update --init --recursive
# 2. Install evaluation dependencies
UV_GIT_LFS=1 uv pip install -e ".[dev,evals]"
For AppWorld, also set up its environment and benchmark data:
# Initialize AppWorld environment
appworld install
# Download benchmark data
appworld download data
BFCL:
# Run all BFCL tests
.venv/bin/pytest tests/benchmarks/bfcl/test_bfcl.py
# Run specific test
.venv/bin/pytest 'tests/benchmarks/bfcl/test_bfcl.py::test_bfcl[multi_turn_base_121]'
# Run with specific model
.venv/bin/pytest tests/benchmarks/bfcl/test_bfcl.py --model gpt-4o
AppWorld:
# Run all train tasks
.venv/bin/pytest tests/benchmarks/appworld/test_appworld.py --dataset train --model gpt-4o
# Run specific task
.venv/bin/pytest 'tests/benchmarks/appworld/test_appworld.py::test_appworld[train_001]'
For detailed information, see:
- Evaluation guide: docs/evals.md
- Test organization and patterns: tests/README.md
Apache 2.0
