A multiagent learning model for General Game Playing that explores both promising and unpromising paths to succeed in any N-player game.
Wise Explorer takes a unique approach to Monte Carlo Tree Search (MCTS) by deliberately exploring not just the most promising branches, but also the least promising ones. This dual-exploration strategy enables pattern recognition without heuristics, training data, or game-specific knowledge.
Key Features:
- Zero prior knowledge required — no heuristics, no training data
- Universal compatibility — works with any N-player game
- Dual-phase learning — systematically finds bad moves and reinforces good ones
- Statistical clustering — borrows knowledge from similar positions
- Bayesian confidence scoring — conservative estimates that improve with data
Read the research paper for full technical details, or check out https://www.mathewe.com for more about me.
This project is a re-imagining and re-implementation of my Honors thesis. It makes strides in efficiency and applicability that I wasn't able to achieve in 2019, and new concepts such as "anchors" and probability-distribution sampling are the result of my independent learning and research.
```bash
git clone https://github.com/MatheweB/WiseExplorer
cd WiseExplorer
pip install -e .
```

For development (includes pytest):

```bash
pip install -e ".[dev]"
```

```bash
# Play Tic-Tac-Toe against the AI (default)
wise-explorer
# Play Mini Chess with more training
wise-explorer --game minichess --epochs 200
# Watch AI play itself
wise-explorer --self-play
# Use existing knowledge without training
wise-explorer --no-training
# See all options
wise-explorer --help
```

| Flag | Short | Description |
|---|---|---|
| `--game` | `-g` | Game to play: `tic_tac_toe`, `minichess` |
| `--epochs` | `-e` | Training epochs per move (default: 100) |
| `--turn-depth` | `-t` | Max turns per simulation (default: 40) |
| `--workers` | `-w` | Parallel worker processes (default: CPU count - 1) |
| `--no-training` | | Play using existing memory only |
| `--ai-only` | | AI plays both sides |
```python
from wise_explorer.api import start_simulations
from wise_explorer.memory import GameMemory
from wise_explorer.games import TicTacToe
from wise_explorer.utils.factory import create_agent_swarms

# Initialize game
game = TicTacToe()

# Create agent swarms (one per player)
players = [1, 2]
swarms = create_agent_swarms(players, agents_per_player=20)

# Create memory database
memory = GameMemory.for_game(game)

# Train and play
start_simulations(
    agent_swarms=swarms,
    game=game,
    turn_depth=20,
    simulations=200,
    memory=memory,
)
```

Unlike traditional approaches that evaluate board positions, Wise Explorer records transitions:
```
state_before → state_after → outcome
```
The outcome (win/tie/loss) is always stored from the perspective of the player who made the move, eliminating the need to track player identities across different game states.
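As a rough sketch (hypothetical names; the real schema lives in memory/schema.py), a transition record and its perspective-relative tally might look like this:

```python
from dataclasses import dataclass

# Hypothetical sketch of a transition record; the actual schema
# lives in memory/schema.py and may differ.
@dataclass
class Transition:
    state_before: int  # hash of the board before the move
    state_after: int   # hash of the board after the move
    wins: int = 0
    ties: int = 0
    losses: int = 0

def record_outcome(t: Transition, outcome: str) -> None:
    """Tally an outcome from the perspective of the player who moved,
    so no player identity needs to be stored with the transition."""
    if outcome == "win":
        t.wins += 1
    elif outcome == "tie":
        t.ties += 1
    else:
        t.losses += 1
```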
Positions with similar win rates are automatically grouped into "anchors." When encountering a position with limited data, the system intelligently borrows statistics from similar, well-explored positions.
```
Position A:  3 games, 67% wins ─┐
Position B:  5 games, 60% wins ─┼─► Anchor: 18 games, 61% wins
Position C: 10 games, 60% wins ─┘
```
This clustering accelerates learning by sharing knowledge across related game states.
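As a minimal sketch of the pooling arithmetic (a hypothetical helper, not the actual anchor manager API), an anchor's statistics are just the summed counts of its members:

```python
def pooled_stats(members: list[tuple[int, int]]) -> tuple[int, float]:
    """Pool (games, wins) counts across clustered positions.

    Hypothetical illustration; the real logic lives in
    memory/anchor_manager.py.
    """
    games = sum(g for g, _ in members)
    wins = sum(w for _, w in members)
    return games, wins / games

# Positions A, B, C from the diagram above, as (games, wins):
games, rate = pooled_stats([(3, 2), (5, 3), (10, 6)])
print(games, f"{rate:.0%}")  # 18 61%
```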
Move quality is computed using a lower confidence bound approach:
- Few samples → high uncertainty → conservative score near 0.5
- Many samples → high confidence → score reflects true win rate
- Pessimistic default → assumes worst case within confidence bounds
This prevents the algorithm from overcommitting to moves based on lucky early results.
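For intuition, here is one standard way to compute such a lower confidence bound (a Wilson score interval; the project's actual scoring constants live in core/types.py and its formula may differ):

```python
import math

def score_lower_bound(wins: int, games: int, z: float = 1.96) -> float:
    """Wilson score lower bound on the true win rate.

    Illustrative only: a wide interval with few samples keeps the
    score conservative; more samples let it approach the observed rate.
    """
    if games == 0:
        return 0.5  # no data: neutral default
    p = wins / games
    denom = 1 + z * z / games
    center = p + z * z / (2 * games)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * games)) / games)
    return (center - margin) / denom

print(round(score_lower_bound(2, 3), 2))      # 2/3 wins, little data -> 0.21
print(round(score_lower_bound(200, 300), 2))  # same rate, more data  -> 0.61
```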
Training alternates between two complementary strategies:
| Phase | Agent Behavior | Purpose |
|---|---|---|
| Prune | Intentionally play worst moves | Identify and confirm losing patterns |
| Exploit | Always play best moves | Reinforce winning strategies |
By systematically exploring both extremes, Wise Explorer builds a comprehensive understanding of the game space.
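In code terms, the two phases differ only in which end of the score ranking an agent picks from (a hypothetical sketch; the project's actual selection logic lives in selection/training.py and selection/inference.py):

```python
def choose_move(moves, score, is_prune: bool):
    """Pick the worst-scoring move when pruning, the best when exploiting.

    Hypothetical sketch; `moves` is any sequence of legal moves and
    `score` maps a move to its estimated quality.
    """
    ranked = sorted(moves, key=score)
    return ranked[0] if is_prune else ranked[-1]

# Example with toy scores:
moves = ["a", "b", "c"]
quality = {"a": 0.3, "b": 0.7, "c": 0.5}
print(choose_move(moves, quality.get, is_prune=True))   # a (worst)
print(choose_move(moves, quality.get, is_prune=False))  # b (best)
```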
```
src/wise_explorer/
├── cli.py                # Command-line interface
├── api.py                # Public API (start_simulations)
│
├── agent/
│   └── __init__.py       # Agent class and State enum
│
├── core/
│   ├── types.py          # Stats, scoring constants
│   ├── hashing.py        # Board state hashing
│   └── bayes.py          # Bayes factor clustering
│
├── games/
│   ├── game_base.py      # Abstract game interface
│   ├── game_state.py     # GameState container
│   ├── game_rules.py     # Board utilities
│   ├── tic_tac_toe.py    # Tic-Tac-Toe implementation
│   └── minichess.py      # Mini Chess implementation
│
├── memory/
│   ├── schema.py         # Database schema
│   ├── anchor_manager.py # Anchor logic class
│   └── game_memory.py    # GameMemory class
│
├── selection/
│   ├── training.py       # Probabilistic selection (exploration)
│   └── inference.py      # Deterministic selection (exploitation)
│
├── simulation/
│   ├── jobs.py           # Job data structures
│   ├── worker.py         # Parallel worker logic
│   ├── runner.py         # SimulationRunner
│   └── training.py       # Training orchestration
│
├── utils/
│   ├── config.py         # Game registry and Config class
│   └── factory.py        # Factory functions
│
├── data/memory/          # SQLite databases (auto-created)
│
└── debug/
    └── viz.py            # Terminal visualization

tests/                    # Test suite (mirrors src structure)
```
GameMemory is the knowledge database that records transitions, manages anchors, and scores moves.
```python
from wise_explorer.memory import GameMemory

# Initialize memory for a game
memory = GameMemory.for_game(game)

# Custom database path
memory = GameMemory("path/to/my_game.db")

# Read-only mode (for parallel workers)
memory = GameMemory("path/to/my_game.db", read_only=True)

# Query statistics
info = memory.get_info()
print(f"Transitions: {info['transitions']}, Anchors: {info['anchors']}")
```

Agents operate in one of two modes:
- Prune mode (`is_prune=True`): deliberately makes poor moves to map out losing strategies
- Exploit mode (`is_prune=False`): selects the highest-scoring moves
The training module implements round-robin pruning, ensuring each player systematically explores bad moves during training.
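The rotation can be pictured like this (illustrative only; the actual orchestration is in simulation/training.py): each epoch, a different player takes the prune role while the others exploit.

```python
# Illustrative round-robin pruning: one player prunes per epoch
# while the rest exploit (actual logic lives in simulation/training.py).
players = [1, 2]
for epoch in range(4):
    pruner = players[epoch % len(players)]
    modes = {p: p == pruner for p in players}  # player -> is_prune
    print(f"epoch {epoch}: player {pruner} prunes, modes={modes}")
```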
Choose how the algorithm treats game states:
```python
# Non-Markov (default): each transition is distinct
memory = GameMemory.for_game(game, markov=False)

# Markov: only the resulting position matters
memory = GameMemory.for_game(game, markov=True)
```

Markov mode treats positions as equivalent regardless of how they were reached. Non-Markov mode considers the path taken, which is useful for games where move history affects optimal play.
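One way to picture the difference (hypothetical keying, not the actual implementation in core/hashing.py): non-Markov statistics are keyed by the full transition, Markov statistics by the resulting state alone.

```python
# Hypothetical illustration of the two keying schemes; the real
# state hashing lives in core/hashing.py.
def stats_key(state_before: int, state_after: int, markov: bool) -> tuple:
    if markov:
        return (state_after,)            # only the resulting position matters
    return (state_before, state_after)   # the path taken matters too
```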
Create a class that implements the GameBase interface:
```python
from wise_explorer.games import GameBase, GameState
from wise_explorer.agent.agent import State
import numpy as np

class MyGame(GameBase):
    def game_id(self) -> str:
        return "my_game"

    def num_players(self) -> int:
        return 2

    def get_state(self) -> GameState:
        """Return current board state and active player."""
        pass

    def set_state(self, state: GameState) -> None:
        """Set the board state and active player."""
        pass

    def valid_moves(self) -> np.ndarray:
        """Return array of all legal moves."""
        pass

    def apply_move(self, move: np.ndarray) -> None:
        """Execute move and update game state."""
        pass

    def is_over(self) -> bool:
        """Check if the game has ended."""
        pass

    def get_result(self, player: int) -> State:
        """Return WIN/LOSS/TIE/NEUTRAL for the player."""
        pass

    def current_player(self) -> int:
        """Return the active player's ID."""
        pass

    def deep_clone(self) -> "MyGame":
        """Create an independent copy."""
        pass

    def clone(self) -> "MyGame":
        """Shallow copy."""
        pass

    def state_string(self) -> str:
        """Pretty-print for debugging."""
        pass
```

Then register it in utils/config.py:
```python
from wise_explorer.games.my_game import MyGame

GAMES = {
    "tic_tac_toe": TicTacToe,
    "minichess": MiniChess,
    "my_game": MyGame,  # Add your game
}

INITIAL_STATES = {
    # ... add initial state for your game
}
```

See games/tic_tac_toe.py and games/minichess.py for complete examples.
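Once registered, the new game should run through the same public API shown earlier (a sketch, assuming MyGame fully implements the interface above):

```python
from wise_explorer.api import start_simulations
from wise_explorer.memory import GameMemory
from wise_explorer.utils.factory import create_agent_swarms
from wise_explorer.games.my_game import MyGame

game = MyGame()
swarms = create_agent_swarms([1, 2], agents_per_player=20)
memory = GameMemory.for_game(game)

start_simulations(
    agent_swarms=swarms,
    game=game,
    turn_depth=20,
    simulations=100,
    memory=memory,
)
```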
```bash
pytest               # Run all tests
pytest tests/core/   # Run core module tests
pytest -v            # Verbose output
```

Training is cumulative, so every time you run wise-explorer the model learns more. It may take a few thousand epochs for reliably good gameplay to emerge; patience is key!
Feel free to play around with the weights at the top of core/types.py to see different learning behaviors.
If you make large changes to the weights, or if you want to start over with training, delete the database files in data/memory.
If you run into a "database is locked" error, just delete the .db-shm and .db-wal files.
- Training throughput: 1,000–5,000 games/second (varies by game complexity)
- Move selection latency: <1ms for cached positions
- Storage efficiency: ~1KB per 100 unique transitions
Performance scales well with multiprocessing for independent game simulations.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
If you use Wise Explorer in your research, please cite:
```bibtex
@thesis{wise_explorer,
  title={General Game Playing as a Bandit-Arms Problem: A Multiagent Monte-Carlo Solution Exploiting Nash Equilibria},
  author={Mathewe Banda},
  year={2019},
  school={Oberlin College},
  url={https://digitalcommons.oberlin.edu/honors/116/}
}
```

Contributions are welcome!
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run tests (`pytest`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
For bug reports and feature requests, please open an issue.
Built with curiosity and a willingness to take the path less traveled by