plgm is a high-performance tool written in Go, designed to effortlessly generate data and simulate heavy workloads for both sharded and non-sharded MongoDB clusters.
It simulates real-world usage patterns by generating random data using rich BSON data types and executing standard CRUD operations (Find, Insert, Update, Delete) based on configurable ratios.
This tool is a complete refactor of the previous Python version, offering:
- Single Binary: No complex dependencies or Python environment setup.
- High Concurrency: Utilizes Go goroutines ("Active Workers") to generate massive load with minimal client-side resource usage.
- Configuration as Code: Fully configurable via a simple `config.yaml` file or environment variables.
- Extensive Data Support: Supports all standard MongoDB BSON data types (ObjectId, Decimal128, Date, Binary, etc.) and realistic data generation via `gofakeit` (supporting complex nested objects and arrays).
- True Parallelism: Unlike the previous Python version, this tool automatically detects and utilizes all available logical CPUs (`GOMAXPROCS`) by default to maximize hardware efficiency.
### Option A: Download Release (Recommended)
Navigate to the [Releases] page and download the .tar.gz file matching your operating system.
- Download and Extract:

```bash
# Example for Linux
tar -xzvf plgm-linux-amd64.tar.gz

# Example for Mac (Apple Silicon)
tar -xzvf plgm-darwin-arm64.tar.gz
```

- Run:

```bash
# The extracted binary will have the OS suffix
./plgm-linux-amd64 --version
```

### Option B: Build from Source (Requires Go 1.25+)
This project includes a Makefile to simplify building and packaging.
```bash
git clone <repository-url>
cd plgm
go mod tidy

# Build a binary for your CURRENT machine only (no .tar.gz)
make build-local

# Run it
./bin/plgm --help
```

### Cross-Compilation (Build for a Different OS)
If you are preparing binaries for other users (or other servers), use the main build command. This will compile binaries for Linux and Mac and automatically package them into .tar.gz files in the bin/ folder.
```bash
# Generate all release packages
make build

# Output:
# bin/plgm-linux-amd64.tar.gz
# bin/plgm-darwin-amd64.tar.gz
# bin/plgm-darwin-arm64.tar.gz
```

To view the full usage guide, including available flags and environment variables, run the help command:
```text
plgm: Percona Load Generator for MongoDB Clusters

Usage: bin/plgm [flags] [config_file]

Examples:
  bin/plgm                # Run with default 'config.yaml'
  bin/plgm my_test.yaml   # Run with specific config file
  bin/plgm --help         # Show this help message

Flags:
  -config string
        Path to the configuration file (default "config.yaml")
  -version
        Print version information and exit

Environment Variables (Overrides):

  [Connection]
  PERCONALOAD_URI                      Connection URI
  PERCONALOAD_USERNAME                 Database User
  PERCONALOAD_PASSWORD                 Database Password (Recommended: Use Prompt)
  PERCONALOAD_DIRECT_CONNECTION        Force direct connection (true/false)
  PERCONALOAD_REPLICA_SET              Replica Set name
  PERCONALOAD_READ_PREFERENCE          Read preference (e.g. nearest)

  [Workload Core]
  PERCONALOAD_DEFAULT_WORKLOAD         Use built-in workload (true/false)
  PERCONALOAD_COLLECTIONS_PATH         Path to collection JSON
  PERCONALOAD_QUERIES_PATH             Path to query JSON
  PERCONALOAD_DURATION                 Test duration (e.g. 60s, 5m)
  PERCONALOAD_CONCURRENCY              Number of active workers
  PERCONALOAD_DOCUMENTS_COUNT          Initial seed document count
  PERCONALOAD_DROP_COLLECTIONS         Drop collections on start (true/false)
  PERCONALOAD_SKIP_SEED                Do not seed initial data on start (true/false)
  PERCONALOAD_DEBUG_MODE               Enable verbose logic logs (true/false)

  [Operation Ratios] (Must sum to ~100)
  PERCONALOAD_FIND_PERCENT             % of ops that are FIND
  PERCONALOAD_UPDATE_PERCENT           % of ops that are UPDATE
  PERCONALOAD_INSERT_PERCENT           % of ops that are INSERT
  PERCONALOAD_DELETE_PERCENT           % of ops that are DELETE
  PERCONALOAD_AGGREGATE_PERCENT        % of ops that are AGGREGATE

  [Performance Optimization]
  PERCONALOAD_FIND_BATCH_SIZE          Docs returned per cursor batch
  PERCONALOAD_FIND_LIMIT               Max docs per Find query
  PERCONALOAD_INSERT_CACHE_SIZE        Generator buffer size
  PERCONALOAD_OP_TIMEOUT_MS            Soft timeout per DB op (ms)
  PERCONALOAD_RETRY_ATTEMPTS           Retry attempts for failures
  PERCONALOAD_RETRY_BACKOFF_MS         Wait time between retries (ms)
  PERCONALOAD_STATUS_REFRESH_RATE_SEC  Status report interval (sec)

  GOMAXPROCS                           Go Runtime CPU limit
```

plgm ships with a built-in default workload, useful for immediate testing to get you started right away.
```bash
# Edit config.yaml to set your URI, then run:
./bin/plgm
```

Note about default workload: plgm comes pre-configured with a default collection and default queries. If you do not provide any parameters and leave the configuration setting `default_workload: true`, this default workload will be used.
If you wish to use a different default workload, you can replace these two files with your own `default.json` files in the same paths. This allows you to define a different collection and set of queries as the default workload.
Note on config file usage: If you do not specify a config file name (as in the example above), plgm will use `config.yaml` by default. You can create separate configuration files if you wish and then pass one as an argument:

```bash
./bin/plgm /path/to/some/custom_config.yaml
```

You will find additional workloads that you can use as references to benchmark your environment in cases where you prefer not to provide your own collection definitions and queries. However, if your goal is to test your application accurately, we strongly recommend creating collection definitions and queries that match those used by your application.
The additional collection and query definitions can be found here:
Prefer running in a container? We have a dedicated guide for building Docker images and running performance jobs directly inside Kubernetes (recommended for accurate network latency testing).
View the Docker & Kubernetes Guide
plgm is configured primarily through its config.yaml file. This makes it easier to save and version-control your test scenarios.
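A minimal sketch of such a file is shown below. The key names mirror the tuning parameters and environment variables documented later in this README; keys not explicitly documented here (such as the connection `uri` and `duration` fields) are assumptions, so verify them against the `config.yaml` shipped with the release.

```yaml
# Minimal sketch (illustrative values; verify key names against the shipped config.yaml)
uri: "mongodb://localhost:27017"  # assumed key name, mirrors PERCONALOAD_URI
default_workload: true            # use the built-in collection and queries
documents_count: 10000            # initial seed size
concurrency: 4                    # number of active workers
duration: "60s"                   # assumed key name, mirrors PERCONALOAD_DURATION
```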
You can override any setting in config.yaml using environment variables. This is useful for CI/CD pipelines, Kubernetes deployments, or quick runtime adjustments without editing the file. These are all the available ENV vars you can configure:
| Environment Variable | Description | Example |
|---|---|---|
| **Connection** | | |
| `PERCONALOAD_URI` | Target MongoDB connection URI | `mongodb://user:pass@host:27017` |
| `PERCONALOAD_DIRECT_CONNECTION` | Force direct connection (bypass topology discovery) | `true` |
| `PERCONALOAD_REPLICA_SET` | Replica Set name (required for sharded clusters/RS) | `rs0` |
| `PERCONALOAD_READ_PREFERENCE` | Read preference. By default, reads go to the replica set primary; set this to route read operations to secondaries. | `nearest` |
| `PERCONALOAD_USERNAME` | Database user | `admin` |
| `PERCONALOAD_PASSWORD` | Database password (if not set, plgm will prompt) | `password123` |
| **Workload Control** | | |
| `PERCONALOAD_CONCURRENCY` | Number of active worker goroutines | `50` |
| `PERCONALOAD_DURATION` | Test duration (Go duration string) | `5m`, `60s` |
| `PERCONALOAD_DEFAULT_WORKLOAD` | Use the built-in "Flights" workload (`true`/`false`) | `false` |
| `PERCONALOAD_COLLECTIONS_PATH` | Path to custom collection JSON files | `./schemas` |
| `PERCONALOAD_QUERIES_PATH` | Path to custom query JSON files | `./queries` |
| `PERCONALOAD_DOCUMENTS_COUNT` | Number of documents to seed initially | `10000` |
| `PERCONALOAD_DROP_COLLECTIONS` | Drop collections before starting (`true`/`false`) | `true` |
| `PERCONALOAD_SKIP_SEED` | Do not seed initial data on start (`true`/`false`) | `true` |
| `PERCONALOAD_DEBUG_MODE` | Enable verbose debug logging (`true`/`false`) | `false` |
| **Operation Ratios** | (Must sum to ~100) | |
| `PERCONALOAD_FIND_PERCENT` | Percentage of Find operations | `55` |
| `PERCONALOAD_INSERT_PERCENT` | Percentage of Insert operations | `20` |
| `PERCONALOAD_UPDATE_PERCENT` | Percentage of Update operations | `10` |
| `PERCONALOAD_DELETE_PERCENT` | Percentage of Delete operations | `10` |
| `PERCONALOAD_AGGREGATE_PERCENT` | Percentage of Aggregate operations | `5` |
| **Performance Optimization** | | |
| `PERCONALOAD_FIND_BATCH_SIZE` | Documents returned per cursor batch | `100` |
| `PERCONALOAD_FIND_LIMIT` | Hard limit on documents per Find query | `10` |
| `PERCONALOAD_INSERT_CACHE_SIZE` | Size of the document generation buffer | `1000` |
| `PERCONALOAD_OP_TIMEOUT_MS` | Soft timeout for individual DB operations (ms) | `500` |
| `PERCONALOAD_RETRY_ATTEMPTS` | Number of retries for transient errors | `3` |
| `PERCONALOAD_RETRY_BACKOFF_MS` | Wait time between retries (ms) | `10` |
| `PERCONALOAD_STATUS_REFRESH_RATE_SEC` | How often to print stats to the console (sec) | `5` |
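For instance, the ratio variables can be combined to shape the workload mix; the split below is illustrative (the five percentages must sum to roughly 100):

```bash
# Write-heavy mix: 70% inserts, 20% updates, 10% finds (illustrative values)
PERCONALOAD_FIND_PERCENT=10 \
PERCONALOAD_INSERT_PERCENT=70 \
PERCONALOAD_UPDATE_PERCENT=20 \
PERCONALOAD_DELETE_PERCENT=0 \
PERCONALOAD_AGGREGATE_PERCENT=0 \
./bin/plgm
```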
Example (worker count and duration):

```bash
PERCONALOAD_CONCURRENCY=50 PERCONALOAD_DURATION=5m ./bin/plgm
```

When executed, plgm performs the following steps:
- Initialization: Connects to the database and loads collection/query definitions.
- Setup:
  - Creates databases and collections defined in your JSON files.
  - Creates indexes.
  - (Optional) Seeds initial data with the number of documents defined by `documents_count` in the config.
- Workload Execution:
  - Spawns the configured number of Active Workers.
  - Continuously generates and executes queries (Find, Insert, Update, Delete, Aggregate) based on your configured ratios.
  - Generates realistic BSON data for Inserts and Updates (supports recursion and complex schemas).
- Reporting:
  - Outputs a real-time status report every N seconds (configurable).
  - Prints a detailed summary table at the end of the run.
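As an example, the seeding step can be skipped on subsequent runs so an existing dataset is reused; the toggles below are the documented environment variables with illustrative values:

```bash
# Keep existing collections and data, then run only the workload phase
PERCONALOAD_DROP_COLLECTIONS=false PERCONALOAD_SKIP_SEED=true ./bin/plgm
```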
To run your own workload against your own schema:

1. Define Collection Schema: Create a JSON file (e.g., `my_collection.json`) defining your schema.

   ```json
   [
     {
       "database": "ecommerce",
       "collection": "orders",
       "fields": {
         "_id": { "type": "objectid" },
         "customer_name": { "type": "string", "provider": "first_name" },
         "total": { "type": "double" },
         "created_at": { "type": "date" }
       }
     }
   ]
   ```

2. Define Query Patterns: Create a JSON file (e.g., `my_queries.json`) defining the operations to run.

   ```json
   [
     {
       "database": "ecommerce",
       "collection": "orders",
       "operation": "find",
       "filter": { "customer_name": "<string>" },
       "limit": 10
     }
   ]
   ```

3. Run:

   ```bash
   export PERCONALOAD_COLLECTIONS_PATH=./my_collection.json
   export PERCONALOAD_QUERIES_PATH=./my_queries.json
   ./bin/plgm
   ```
- Primitives: `int`, `long`, `double`, `decimal128`, `bool`, `string`.
- Time: `date`, `timestamp`.
- Binary/Logic: `binary`, `uuid`, `objectid`, `regex`, `javascript`.
- Complex: `object`, `array`.
- Providers: Supports any gofakeit provider via reflection. Examples: `beer_name`, `car_maker`, `bitcoin_address`, `credit_card`, `city`, `ssn`, etc.
plgm is designed to utilize maximum system resources by default, but it can be fine-tuned to fit specific hardware constraints or testing scenarios.
By default, plgm automatically detects and schedules work across all available logical CPUs. You generally do not need to configure this.
However, if you are running in a constrained environment (e.g., a shared CI runner or a container with strict CPU limits) or if you want to throttle the generator's CPU usage, you can override this via the standard Go environment variable:
```bash
# Limit plgm to use only 2 CPU cores
export GOMAXPROCS=2
./plgm
```

You can fine-tune plgm's internal behavior by adjusting the parameters in `config.yaml`.
- `concurrency`: Controls the number of "Active Workers" continuously executing operations against the database.
  - Tip: Increase this to generate higher load. If set too high on a weak client, you may see increased client-side latency.
  - Default: `4`
These settings control the MongoDB driver's connection pool. Proper sizing is critical to prevent the application from waiting for available connections.
- `max_pool_size`: The maximum number of connections allowed in the pool.
  - Tip: A good rule of thumb is to set this slightly higher than your `concurrency` setting so that every worker is guaranteed a connection without blocking.
  - Default: `1000`
- `min_pool_size`: The minimum number of connections to keep open.
  - Tip: Setting this higher helps avoid the "cold start" penalty of establishing new connections during the initial ramp-up.
  - Default: `20`
- `max_idle_time`: How long a connection can remain unused before being closed (in minutes).
  - Tip: Keep this high (e.g., `30`) to avoid "reconnect churn" during brief pauses in workload.
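Putting these together, a pool section sized for 50 workers might look like the sketch below. The key names come from the bullets above; their exact placement inside `config.yaml` is an assumption, so verify against the shipped file.

```yaml
# Illustrative pool sizing for 50 workers (placement within config.yaml assumed)
concurrency: 50
max_pool_size: 100   # slightly above concurrency so no worker waits for a connection
min_pool_size: 20    # warm connections avoid the cold-start penalty at ramp-up
max_idle_time: 30    # minutes an idle connection may live before being closed
```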
These settings affect the efficiency of individual database operations and memory usage.
- `find_batch_size`: The number of documents returned per batch in a cursor.
  - Tip: Higher values reduce network round-trips but increase memory usage per worker.
  - Default: `10`
- `find_limit`: The hard limit on documents returned for `find` operations.
  - Default: `5`
- `insert_cache_size`: The buffer size for the document generator channel.
  - Tip: This decouples document generation from database insertion. A larger buffer ensures workers rarely wait for data generation logic.
  - Default: `1000`
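For example, a profile that trades client memory for fewer round-trips might look like this, using the keys documented above with illustrative values:

```yaml
# Throughput-oriented query settings (illustrative values)
find_batch_size: 100     # larger batches mean fewer network round-trips
find_limit: 50           # cap on documents returned per find
insert_cache_size: 2000  # bigger generator buffer keeps insert workers fed
```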
Control how plgm reacts to network lag or database pressure.
- `op_timeout_ms`: A hard timeout for individual database operations.
  - Tip: Lowering this allows plgm to fail fast and retry rather than hanging on stalled requests.
  - Default: `500` (0.5 seconds)
- `retry_attempts` & `retry_backoff_ms`: Logic for handling transient failures.
  - Tip: For stress testing, you might want to set `retry_attempts: 0` to see raw failure rates immediately.
  - Default: `2` attempts with `5ms` backoff.
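Following the stress-testing tip above, a fail-fast profile could look like this sketch (illustrative values using the documented keys):

```yaml
# Fail-fast stress profile: surface raw error rates immediately
op_timeout_ms: 250   # give up on stalled operations quickly
retry_attempts: 0    # do not mask transient failures with retries
```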
In `config.yaml`, the `custom_params` section allows you to pass arbitrary options directly to the MongoDB driver's connection string. These are critical for tuning network throughput and security.
```yaml
custom_params:
  compressors: "zlib,snappy"
  ssl: false
```

| Parameter | Example Value | Impact on Performance |
|---|---|---|
| `compressors` | `"snappy,zlib"` | **High Impact.** Enables network compression. • `snappy`: low CPU overhead, moderate compression; good for high throughput and low latency. • `zlib`: higher CPU overhead, high compression; good for limited bandwidth. • Empty: no compression (saves CPU, uses maximum bandwidth). |
| `ssl` | `false` | **Low/Medium Impact.** Disabling SSL (`false`) saves the CPU overhead of TLS handshakes and encryption; useful for local testing or secured private networks. |
| `readPreference` | `"secondary"` | **Medium Impact.** (Optional) Can be added to offload read operations to replica set secondaries, keeping the primary free for writes. |
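As a combined illustration, a low-overhead read-offloading profile might look like the sketch below; the parameters are the ones listed above (passed through to the driver's connection string), and the values are illustrative:

```yaml
custom_params:
  compressors: "snappy"        # cheap CPU-wise, still cuts network traffic
  ssl: false                   # skip TLS overhead on a trusted private network
  readPreference: "secondary"  # push reads off the primary
```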
