Use LLMs for document ranking.
Got a bunch of data? Want to use an LLM to find the most "interesting" stuff? If you simply paste your data into an LLM chat session, you'll run into problems:
- Nondeterminism: Doesn't always respond with the same result
- Limited context: Can't pass in all the data at once, need to break it up
- Output constraints: Sometimes doesn't return all the data you asked it to review
- Scoring subjectivity: Struggles to assign a consistent numeric score to an individual item
siftrank is an implementation of the SiftRank document ranking algorithm that uses LLMs to efficiently find the items in any dataset that are most relevant to a given prompt:
- Stochastic: Randomly samples the dataset into small batches.
- Inflective: Looks for a natural inflection point in the scores that distinguishes particularly relevant items from the rest.
- Fixed: Caps the maximum number of LLM calls so the computational complexity remains linear in the worst case.
- Trial: Repeatedly compares batched items until the relevance scores stabilize.
Use any LLM to rank anything. No fine-tuning. No domain-specific models. Just an off-the-shelf model and your ranking prompt. Typically runs in seconds and costs pennies.
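The stochastic batching step can be pictured with a small shell sketch (illustrative only, not siftrank internals): each trial shuffles the dataset and cuts it into fixed-size batches, so every item gets scored in several different random contexts across trials.

```shell
# Illustrative only: one "trial" shuffles the data, then splits it into
# fixed-size batches, each small enough for a single LLM call.
seq 1 25 | shuf |
awk -v size=10 '{ printf "%s ", $0; if (NR % size == 0) print "" }
               END { if (NR % size) print "" }' |
awk '{ print "batch", NR, "contains", NF, "items" }'
# -> batch 1 contains 10 items
#    batch 2 contains 10 items
#    batch 3 contains 5 items
```

Which items land in which batch changes every run; only the batch sizes are fixed.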
siftrank is provider-agnostic and works with multiple LLM providers:
- OpenAI - GPT-4, GPT-4o, GPT-4o-mini (via `OPENAI_API_KEY`)
- Anthropic - Claude Opus, Claude Sonnet, Claude Haiku (via `ANTHROPIC_API_KEY`)
- OpenRouter - Access 200+ models from multiple providers (via `OPENROUTER_API_KEY`)
- Ollama - Local or cloud-hosted models like Llama, Mistral, Qwen (via `OLLAMA_API_KEY` for cloud, no key for local)
- Google - Gemini Pro, Gemini Flash (via `GOOGLE_API_KEY`)
Select your provider with --provider <name> or use the default (OpenAI). Set the appropriate API key environment variable for your chosen provider.
| Provider | Best For | Strengths | Considerations |
|---|---|---|---|
| OpenAI | General use, batch mode | Fast, cost-effective (gpt-4o-mini), batch API (50% savings), widest model range | Requires API key, cloud-only |
| Anthropic | Complex analysis, nuance | Strong reasoning (Claude Sonnet/Opus), careful instruction following | Higher cost for top-tier models |
| OpenRouter | Model experimentation | Access 200+ models, single API key, easy model comparison | Adds routing layer, pricing varies by model |
| Ollama | Privacy, local control | Free local inference, no data leaves your machine, cloud option available | Requires local GPU for good performance, slower than cloud APIs |
| Google | Gemini ecosystem | Gemini Pro/Flash, competitive pricing | Smaller model selection for ranking tasks |
Quick decision guide:
- Just getting started? Use OpenAI with `gpt-4o-mini` (default). Cheapest cloud option with great results.
- Need privacy? Use Ollama with a local model. No data leaves your machine.
- Large dataset (1000+ docs)? Use `siftrank batch submit` with OpenAI for 50% cost savings.
- Want the best quality? Use Anthropic with `claude-sonnet-4-20250514` or OpenAI with `gpt-4o`.
- Comparing models? Use OpenRouter with `--compare` to test multiple models through one API key.
go install github.com/meganerd/siftrank/cmd/siftrank@latest
Set the API key for your chosen provider:
# OpenAI (default provider)
export OPENAI_API_KEY="sk-..."
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# OpenRouter
export OPENROUTER_API_KEY="sk-or-..."
# Google
export GOOGLE_API_KEY="..."
# Ollama (local — no API key needed)
# Ensure Ollama server is running: ollama serve
# Ollama (cloud-hosted — requires API key)
export OLLAMA_API_KEY="..."

siftrank -h
Options:
-f, --file string input file (required)
-m, --model string model name (default "gpt-4o-mini")
-o, --output string JSON output file
--pattern string glob pattern for filtering files in directory (default "*")
-p, --prompt string initial prompt (prefix with @ to use a file)
--provider string LLM provider: openai, anthropic, openrouter, ollama, google (default "openai")
-r, --relevance post-process each item by providing relevance justification (skips round 1)
--compare string compare multiple models (format: "provider:model,provider:model")
--report-cost print estimated cost summary to stderr after ranking
Visualization:
--no-minimap disable minimap panel in watch mode
--watch enable live terminal visualization (logs suppressed unless --log is specified)
Debug:
-d, --debug enable debug logging
--dry-run log API calls without making them
--log string write logs to file instead of stderr
--trace string trace file path for streaming trial execution state (JSON Lines format)
Advanced:
-u, --base-url string custom API base URL (for OpenAI-compatible APIs like vLLM)
-b, --batch-size int number of items per batch (default 10)
-c, --concurrency int max concurrent LLM calls across all trials (default 50)
-e, --effort string reasoning effort level: none, minimal, low, medium, high
    --elbow-method string elbow detection method: curvature, perpendicular (default "curvature")
--elbow-tolerance float elbow position tolerance (0.05 = 5%) (default 0.05)
--encoding string tokenizer encoding (default "o200k_base")
--json force JSON parsing regardless of file extension
--max-trials int maximum number of ranking trials (default 50)
--min-trials int minimum trials before checking convergence (default 5)
--no-converge disable early stopping based on convergence
--ratio float refinement ratio (0.0-1.0, e.g. 0.5 = top 50%) (default 0.5)
--stable-trials int stable trials required for convergence (default 5)
--template string template for each object (prefix with @ to use a file) (default "{{.Data}}")
--tokens int max tokens per batch (includes prompt + documents) (default 128000)
Flags:
-h, --help help for siftrank
Ranks 100 sentences in about 7 seconds using the default provider (OpenAI):
siftrank \
-f testdata/sentences.txt \
-p 'Rank each of these items according to their relevancy to the concept of "time".' |
jq -r '.[:10] | map(.value)[]' |
nl
1 The train arrived exactly on time.
2 The old clock chimed twelve times.
3 The clock ticked steadily on the wall.
4 The bell rang, signaling the end of class.
5 The rooster crowed at the break of dawn.
6 She climbed to the top of the hill to watch the sunset.
7 He watched as the leaves fell one by one.
8 The stars twinkled brightly in the clear night sky.
9 He spotted a shooting star while stargazing.
10 She opened the curtains to let in the morning light.

siftrank outputs a JSON array of ranked documents, sorted by score (lower = better):
[
{
"key": "eQJpm-Qs",
"value": "The train arrived exactly on time.",
"score": 1.5,
"rank": 1,
"input_index": 0
},
{
"key": "SyJ3d9Td",
"value": "The old clock chimed twelve times.",
"score": 2.3,
"rank": 2,
"input_index": 5
}
]

| Field | Description |
|---|---|
| `key` | Deterministic short ID for the document |
| `value` | The document text (or rendered template output) |
| `score` | Average positional score across trials (lower = more relevant) |
| `rank` | Final rank position (1 = best match) |
| `input_index` | Original position in the input file (0-indexed) |
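To make the `score` field concrete, here is a toy jq sketch with made-up data (not actual siftrank output): averaging a document's position across trials is the idea behind the reported score, so the lowest average position ranks first.

```shell
# Toy data: two documents with hypothetical positions observed in 3 trials.
echo '[{"doc":"a","positions":[1,2,3]},{"doc":"b","positions":[4,5,3]}]' |
jq -c 'map({doc, score: (.positions | add / length)}) | sort_by(.score)'
# -> [{"doc":"a","score":2},{"doc":"b","score":4}]
```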
Common output recipes:
# Top 5 values only
siftrank -f data.txt -p 'Rank by relevance' | jq -r '.[:5] | map(.value)[]'
# Top 10 as numbered list
siftrank -f data.txt -p 'Rank by quality' | jq -r '.[:10] | map(.value)[]' | nl
# Save full results to file
siftrank -f data.txt -p 'Rank by importance' -o results.json
# Get built-in cost report
siftrank -f data.txt -p 'Rank by priority' --report-cost

The --report-cost flag prints a cost summary to stderr after ranking:
--- Cost Report ---
Model: gpt-4o-mini
Input tokens: 48250
Output tokens: 9830
Estimated cost: $0.013125
-------------------
Use a different provider by specifying --provider and --model:
# Use Anthropic's Claude Sonnet
siftrank \
--provider anthropic \
--model claude-sonnet-4-20250514 \
-f testdata/sentences.txt \
-p 'Rank by relevancy to "time".'
# Use Ollama with a local model
siftrank \
--provider ollama \
--model llama3.3 \
-f testdata/sentences.txt \
  -p 'Rank by relevancy to "time".'

Examples demonstrating different providers and use cases.
Basic ranking with gpt-4o-mini (default):
siftrank \
-f logs/access.log \
-p 'Find suspicious requests that might indicate an attack.' \
  -o suspicious_requests.json

Using GPT-4o for complex analysis:
siftrank \
--provider openai \
--model gpt-4o \
-f cve_descriptions.txt \
  -p 'Rank vulnerabilities by exploitability and impact.'

With reasoning effort (o1/o3 models):
siftrank \
--provider openai \
--model o1-mini \
--effort medium \
-f security_findings.json \
  -p 'Prioritize findings by severity and likelihood of exploitation.'

Claude Sonnet for balanced performance:
siftrank \
--provider anthropic \
--model claude-sonnet-4-20250514 \
-f research_papers.json \
-p 'Rank papers by relevance to LLM security.' \
  --trace anthropic_trace.jsonl

Claude Haiku for fast, cost-effective ranking:
siftrank \
--provider anthropic \
--model claude-haiku-4-20250514 \
-f user_feedback.txt \
-p 'Identify feedback indicating bugs or usability issues.' \
  --watch

Claude Opus for highest quality analysis:
siftrank \
--provider anthropic \
--model claude-opus-4-20250514 \
-f threat_intelligence.json \
  -p 'Rank threats by sophistication and potential impact to our infrastructure.'

Access multiple providers through one API:
# Set OpenRouter API key
export OPENROUTER_API_KEY="sk-or-..."
# Use any model from OpenRouter's catalog
siftrank \
--provider openrouter \
--model anthropic/claude-sonnet-4 \
-f documents.txt \
  -p 'Find documents related to incident response.'

Compare frontier models:
siftrank \
--provider openrouter \
--model google/gemini-2.0-flash-exp \
-f code_review.json \
-p 'Identify security vulnerabilities in this code.' \
  --compare "openrouter:anthropic/claude-sonnet-4,openrouter:openai/gpt-4o"

siftrank supports both local and cloud-hosted Ollama instances. Local instances require no API key; cloud instances authenticate via OLLAMA_API_KEY using Bearer token auth.
Authentication precedence: --api-key flag > OLLAMA_API_KEY env var > config api_keys.ollama > no auth (local).
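The precedence chain can be pictured with a small shell sketch. This is illustrative only; `resolve_ollama_key` is a made-up helper, not part of siftrank.

```shell
# Illustrative resolution order: flag > env var > config entry > no auth.
resolve_ollama_key() {   # hypothetical helper, not part of siftrank
  flag="$1"; config="$2"
  if   [ -n "$flag" ];           then echo "$flag"
  elif [ -n "$OLLAMA_API_KEY" ]; then echo "$OLLAMA_API_KEY"
  elif [ -n "$config" ];         then echo "$config"
  else                                echo "(no auth: local mode)"
  fi
}
unset OLLAMA_API_KEY
resolve_ollama_key "" "key-from-config"   # prints key-from-config
resolve_ollama_key "key-from-flag" ""     # prints key-from-flag
```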
Run completely local with Llama:
# Ensure Ollama is running: ollama serve
# Pull model if needed: ollama pull llama3.3
siftrank \
--provider ollama \
--model llama3.3 \
--base-url http://localhost:11434 \
-f sensitive_data.txt \
-p 'Identify PII that needs redaction.' \
  -o redaction_candidates.json

Use local model with custom Ollama server:
siftrank \
--provider ollama \
--model qwen2.5-coder:7b \
--base-url http://gpu-server:11434 \
-f code_snippets.txt \
  -p 'Rank code by complexity and maintainability.'

Cloud-hosted Ollama instance:
# Set API key for cloud-hosted Ollama
export OLLAMA_API_KEY="your-cloud-api-key"
siftrank \
--provider ollama \
--model llama3.3 \
--base-url https://ollama.example.com \
-f documents.txt \
  -p 'Rank by relevance to security compliance.'

Using a config file for cloud Ollama:
# ~/.config/siftrank/config.yaml
provider: ollama
model: llama3.3
base_url: https://ollama.example.com
api_keys:
  ollama: your-cloud-api-key

# With config file, no flags needed:
siftrank -f documents.txt -p 'Rank by relevance.'

Local model for privacy-sensitive ranking:
siftrank \
--provider ollama \
--model mistral:7b-instruct \
--base-url http://localhost:11434 \
-f employee_reviews.txt \
-p 'Identify reviews mentioning management concerns.' \
--no-converge \
  --max-trials 10

Compare cost vs performance:
# Fast model vs quality model
siftrank \
-f large_dataset.json \
-p 'Rank by business value.' \
--compare "openai:gpt-4o-mini,openai:gpt-4o" \
  --trace comparison_cost_quality.jsonl

Compare across providers:
# OpenAI vs Anthropic vs local
siftrank \
-f documents.txt \
-p 'Find documents about security best practices.' \
--compare "openai:gpt-4o-mini,anthropic:claude-haiku-4-20250514,ollama:llama3.3" \
--trace multi_provider_comparison.jsonl
# Analyze results
jq -s 'group_by(.model) | map({
model: .[0].model,
calls: length,
avg_latency: (map(.latency_ms) | add / length),
total_tokens: (map(.input_tokens + .output_tokens) | add)
})' multi_provider_comparison.jsonl

Compare OpenRouter models:
siftrank \
-f research_questions.txt \
-p 'Prioritize research questions by impact.' \
--compare "openrouter:anthropic/claude-sonnet-4,openrouter:google/gemini-2.0-flash-exp,openrouter:meta-llama/llama-3.3-70b-instruct" \
  --trace openrouter_comparison.jsonl

Recent enhancements to siftrank enable advanced workflows for large-scale ranking tasks.
Process multiple files from a directory with optional pattern filtering:
# Process all JSON files in a directory
siftrank \
-f ./data \
--pattern "*.json" \
-p 'Rank by importance' \
-o results.json
# Process log files matching a pattern
siftrank \
-f ./logs \
--pattern "error_*.log" \
-p 'Find critical errors that need immediate attention.' \
  --watch

Features:
- Non-recursive - Only processes files in the specified directory (not subdirectories)
- Glob filtering - Use patterns like `*.txt`, `data_*.json`, or `report_[0-9]*.log`
- Aggregated ranking - All documents from matching files are ranked together as a single dataset
- Sorted enumeration - Files are processed in deterministic alphabetical order
Security: Directory traversal (..) is blocked. Resource limits apply (1000 files per directory, 10000 documents total).
Automatically stop ranking when results stabilize, saving time and API costs:
# Enable convergence detection (default behavior)
siftrank \
-f data.txt \
-p 'Rank by quality' \
--min-trials 5 \
  --stable-trials 5

How it works:
- Elbow detection - Identifies the inflection point where scores plateau
- Stability tracking - Waits for N consecutive trials with consistent elbow position
- Early exit - Stops as soon as convergence criteria are met
Configuration:
# Disable convergence for fixed trial count
siftrank -f data.txt -p 'Rank' --no-converge --max-trials 20
# Adjust convergence sensitivity
siftrank -f data.txt -p 'Rank' \
--min-trials 3 \
--stable-trials 7 \
  --elbow-tolerance 0.10

| Flag | Default | Description |
|---|---|---|
| `--min-trials` | 5 | Minimum trials before checking convergence |
| `--stable-trials` | 5 | Consecutive stable trials required |
| `--elbow-tolerance` | 0.05 | Tolerance for elbow position stability (5%) |
| `--no-converge` | false | Disable early stopping |
Typical savings: 40-60% reduction in API calls for datasets with clear ranking signal.
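A toy sketch of the stability check (assumed semantics, not the actual implementation): feed in the elbow index detected after each trial and stop once N consecutive trials agree within a tolerance.

```shell
# Elbow index detected after each trial: 12, 14, then stable at 13.
printf '%s\n' 12 14 13 13 13 13 13 |
awk -v need=5 -v tol=1 '
  NR == 1 { prev = $1; streak = 1; next }
  {
    # Count consecutive trials whose elbow stays within +/- tol of the last.
    if ($1 >= prev - tol && $1 <= prev + tol) streak++; else streak = 1
    prev = $1
    if (streak >= need) { print "converged at trial " NR; exit }
  }'
# -> converged at trial 6
```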
Choose between two elbow detection algorithms:
# Curvature-based detection (default)
siftrank -f data.txt -p 'Rank' --elbow-method curvature
# Perpendicular distance detection
siftrank -f data.txt -p 'Rank' --elbow-method perpendicular

Methods:
- Curvature (default) - Finds maximum curvature in the score curve. Best for smooth, exponential-like distributions.
- Perpendicular - Maximizes perpendicular distance from line connecting first and last points. Best for linear-then-flat distributions.
When to switch methods:
- Use `curvature` for most cases - works well with typical ranking distributions
- Use `perpendicular` if curvature fails to detect an obvious inflection point
- Compare both with `--trace` and visual inspection
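As a toy illustration of the perpendicular-distance idea (assumed scores, simplified math, not siftrank's code): pick the index whose point lies farthest from the straight line joining the first and last scores.

```shell
# Sorted scores with a visible knee between index 3 and 4.
printf '%s\n' 1 1.5 2 8 9 10 |
awk '{ y[NR] = $1 } END {
  n = NR
  # Unnormalized point-to-line distance from the line through (1, y[1]) and (n, y[n]).
  for (i = 1; i <= n; i++) {
    d = (y[n] - y[1]) * i - (n - 1) * y[i] + n * y[1] - y[n]
    if (d < 0) d = -d
    if (d > best) { best = d; elbow = i }
  }
  print "elbow at index " elbow
}'
# -> elbow at index 3
```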
Monitor ranking progress in real-time with terminal-based visualization:
# Enable watch mode
siftrank -f data.txt -p 'Rank by priority' --watch
# Watch mode without minimap (larger chart)
siftrank -f data.txt -p 'Rank' --watch --no-minimap

Display panels:
- Score chart - Real-time convergence visualization with elbow marker
- Minimap - Overview of full score distribution (disable with `--no-minimap`)
- Statistics - Trial count, convergence status, API call count
- Top items - Live preview of current top-ranked results
Note: Watch mode suppresses log output by default. Use --log <file> to capture logs while watching.
Generate structured explanations for each ranked item:
# Add pros/cons for each result
siftrank \
-f data.txt \
-p 'Rank security vulnerabilities by severity' \
--relevance \
  -o results.json

Output format (with --relevance):
{
"key": "abc123",
"value": "SQL injection in login form",
"score": 0,
"rank": 1,
"justification": {
"pros": [
"Direct database access",
"Authentication bypass potential",
"High exploitability"
],
"cons": [
"Requires network access",
"May be mitigated by WAF"
]
}
}

Use cases:
- Decision support - Understand why items ranked high/low
- Quality assurance - Validate LLM reasoning
- Reporting - Generate audit trails with explanations
Note: Relevance mode skips initial trial round (jumps directly to justification), so use with a reasonable --max-trials limit.
Submit large ranking jobs to the OpenAI Batch API for 50% cost savings. Batch jobs complete within 24 hours — ideal for large, cost-sensitive datasets where real-time results are not required.
Subcommands:
siftrank batch submit Submit a batch ranking job
siftrank batch status Check the status of a batch job
siftrank batch results Download and process results from a completed batch job
End-to-end workflow:
# Step 1: Submit a batch job
siftrank batch submit \
-f documents.txt \
-p 'Rank by business value' \
-m gpt-4o-mini \
-o ./output
# Output:
# Loaded 500 documents from documents.txt
# Generated 50 batch requests
# Uploaded batch file: file-abc123
# Created batch: batch_xyz789
# Mapping file: ./output/.siftrank-batch.json
#
# Check status:
# siftrank batch status batch_xyz789
#
# Get results when complete:
# siftrank batch results ./output/.siftrank-batch.json
# Step 2: Check status (repeat until "completed")
siftrank batch status batch_xyz789
# Output:
# Batch Status
# ============
# ID: batch_xyz789
# Status: completed
# Request Counts
# Total: 50
# Completed: 50
# Failed: 0
# Step 3: Download and process results
siftrank batch results ./output/.siftrank-batch.json > ranked_output.json

Submit flags:
| Flag | Default | Description |
|---|---|---|
| `-f, --file` | (required) | Input file with documents (one per line) |
| `-p, --prompt` | (required) | Ranking prompt (prefix with @ to use a file) |
| `-m, --model` | `gpt-4o-mini` | OpenAI model name |
| `-b, --batch-size` | `10` | Documents per batch request |
| `-o, --output-dir` | `.` | Directory for mapping file output |
How it works:
- Documents are split into batches and formatted as JSONL for the OpenAI Batch API
- A mapping file (`.siftrank-batch.json`) is saved to correlate results back to input documents
- Results are scored by average position across batches (same scoring as real-time mode)
Note: Batch mode only supports the OpenAI provider. Use the standard siftrank command for other providers.
Stream execution state to a file for analysis and debugging:
# Basic trace
siftrank -f data.txt -p 'Rank' --trace trace.jsonl
# Monitor in real-time
siftrank -f data.txt -p 'Rank' --trace trace.jsonl &
tail -f trace.jsonl | jq

Trace file contents (JSON Lines format):
{"trial":1,"round":1,"model":"gpt-4o-mini","input_tokens":1234,"output_tokens":567,"latency_ms":850}
{"trial":1,"round":2,"model":"gpt-4o-mini","input_tokens":1156,"output_tokens":489,"latency_ms":790}
{"trial":2,"round":1,"model":"gpt-4o-mini","input_tokens":1234,"output_tokens":602,"latency_ms":820}

Analysis examples:
# Calculate total cost
jq -s 'map(.input_tokens + .output_tokens) | add' trace.jsonl
# Latency percentiles
jq -s 'map(.latency_ms) | sort | .[length*0.95 | floor]' trace.jsonl
# Success rate (percent; jq cannot add booleans, so count successes instead)
jq -s '(map(select(.success)) | length) / length * 100' trace.jsonl

With --compare:
siftrank \
-f data.txt \
-p 'Rank' \
--compare "openai:gpt-4o-mini,anthropic:claude-haiku-4-20250514" \
--trace comparison.jsonl
# Compare model performance
jq -s 'group_by(.model) | map({
model: .[0].model,
calls: length,
avg_latency: (map(.latency_ms) | add / length),
total_cost: (map(.input_tokens + .output_tokens) | add)
})' comparison.jsonl

Advanced usage
If the input file is a JSON document, it will be read as an array of objects and each object will be used for ranking.
For instance, two objects would be loaded and ranked from this document:
[
{
"path": "/foo",
"code": "bar"
},
{
"path": "/baz",
"code": "nope"
}
]

It is possible to include each element from the input file in a template using the Go template syntax via the --template "template string" (or --template @file.tpl) argument.
For text input files, each line can be referenced in the template with the Data variable:
Anything you want with {{ .Data }}
For JSON input files, each object in the array can be referenced directly. For instance, elements of the previous JSON example can be referenced in the template code like so:
# {{ .path }}
{{ .code }}
Note in the following example that the resulting value key contains the actual value being presented for ranking (as described by the template), while the object key contains the entire original object from the input file for easy reference.
# Create some test JSON data.
seq 9 |
paste -d @ - - - |
parallel 'echo {} | tr @ "\n" | jo -a | jo nums=:/dev/stdin' |
jo -a |
tee input.json
[{"nums":[1,2,3]},{"nums":[4,5,6]},{"nums":[7,8,9]}]
# Use template to extract the first element of the nums array in each input object.
siftrank \
-f input.json \
-p 'Which is biggest?' \
--template '{{ index .nums 0 }}' \
--max-trials 1 |
jq -c '.[]'
{"key":"eQJpm-Qs","value":"7","object":{"nums":[7,8,9]},"score":0,"exposure":1,"rank":1}
{"key":"SyJ3d9Td","value":"4","object":{"nums":[4,5,6]},"score":2,"exposure":1,"rank":2}
{"key":"a4ayc_80","value":"1","object":{"nums":[1,2,3]},"score":3,"exposure":1,"rank":3}
siftrank tracks token consumption and performance metrics for all LLM calls, enabling cost estimation and model comparison.
Every LLM API call records:
- Input tokens (prompt tokens)
- Output tokens (completion tokens)
- Reasoning tokens (for o1/o3 models)
Token usage accumulates across all trials and is included in the trace file (see --trace flag).
Note: The `--tokens` budget applies to each batch as a whole, including the ranking prompt and document content combined. When adjusting `--tokens`, account for your prompt length: larger prompts leave less room for documents per batch.
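For rough planning, the room left for documents can be estimated like this. The 4-characters-per-token ratio is a common rule of thumb, not an exact tokenizer count, and the prompt length is an assumed example value:

```shell
# Assumed: 128000-token batch budget, a ~2000-character prompt, ~4 chars/token.
awk -v budget=128000 -v prompt_chars=2000 'BEGIN {
  prompt_tokens = int(prompt_chars / 4)
  print (budget - prompt_tokens) " tokens left for documents per batch"
}'
# -> 127500 tokens left for documents per batch
```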
Compare multiple models side-by-side to evaluate performance and cost tradeoffs:
# Compare OpenAI vs Anthropic
siftrank \
-f testdata/sentences.txt \
-p 'Rank by relevancy to "time".' \
--compare "openai:gpt-4o-mini,anthropic:claude-haiku-4-20250514" \
--trace comparison.jsonl
# Compare multiple OpenRouter models
siftrank \
-f testdata/sentences.txt \
-p 'Rank by relevancy to "time".' \
--compare "openrouter:anthropic/claude-sonnet-4,openrouter:openai/gpt-4o" \
  --trace comparison.jsonl

Collected metrics per model:
- Call count - Total number of API calls
- Success rate - Ratio of successful vs failed calls
- Latency statistics - Average, P50, P95, P99 (milliseconds)
- Total tokens - Sum of all input + output + reasoning tokens across all calls
The --trace <file> flag writes JSON Lines output with detailed execution state:
siftrank -f data.txt -p 'Rank items' --trace trace.jsonl

Each line in the trace file contains:
{
"trial": 1,
"round": 2,
"model": "gpt-4o-mini",
"batch_size": 10,
"input_tokens": 1234,
"output_tokens": 567,
"reasoning_tokens": 0,
"latency_ms": 850,
"success": true,
"elbow_detected": false
}

Use the trace file to:
- Monitor progress in real-time (`tail -f trace.jsonl`)
- Analyze token consumption patterns across trials
- Compare model performance when using `--compare`
- Debug convergence behavior with elbow detection data
To estimate costs from token usage:
- Extract token totals from trace file:
jq -s 'map({model, input: .input_tokens, output: .output_tokens}) | group_by(.model) | map({model: .[0].model, total_input: (map(.input) | add), total_output: (map(.output) | add)})' trace.jsonl

- Apply provider pricing (as of 2026-02):
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| OpenAI | gpt-4o-mini | $0.15 | $0.60 |
| OpenAI | gpt-4o | $2.50 | $10.00 |
| Anthropic | claude-haiku-4 | $0.25 | $1.25 |
| Anthropic | claude-sonnet-4 | $3.00 | $15.00 |
| OpenRouter | varies | varies | varies |
| Ollama (local) | any local model | $0.00 | $0.00 |
| Ollama (cloud) | varies by provider | varies | varies |
Example cost calculation:
Input tokens: 50,000
Output tokens: 10,000
Model: gpt-4o-mini
Cost = (50,000 / 1,000,000) × $0.15 + (10,000 / 1,000,000) × $0.60
= $0.0075 + $0.0060
= $0.0135 (~1.4 cents)
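The same arithmetic as a one-liner, using the gpt-4o-mini rates from the pricing table above:

```shell
# (input_tokens / 1M) * input_rate + (output_tokens / 1M) * output_rate
awk -v in_tok=50000 -v out_tok=10000 -v in_rate=0.15 -v out_rate=0.60 'BEGIN {
  printf "$%.4f\n", in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate
}'
# -> $0.0135
```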
Tip: Use `--report-cost` for built-in cost reporting. It prints a cost summary (model, tokens, estimated cost in USD) to stderr after ranking completes.
This project is a fork and significant evolution of Raink, originally created by noperator at Bishop Fox. The original Raink prototype introduced the core SiftRank algorithm and demonstrated LLM-based document ranking for security research. See the original presentation, blog post, and CLI tool.
This fork (siftrank.meganerd) represents a substantial rewrite with:
- Provider-agnostic architecture - Support for OpenAI, Anthropic, OpenRouter, Ollama, and Google (upstream: OpenAI only)
- Production-grade reliability - Comprehensive error handling, resource limits, security hardening
- Advanced features - Convergence detection, directory input, watch mode visualization, trace monitoring, cost tracking
- Model comparison - Side-by-side evaluation across providers with performance metrics
- Extensive documentation - Multi-provider examples, practical use cases, cost estimation guidance
While building on the foundational algorithm from Raink, this implementation diverges significantly in architecture, capabilities, and scope. Both projects share the goal of making LLM-powered document ranking accessible and practical.
- O(N) the Money: Scaling Vulnerability Research with LLMs
- Using LLMs to solve security problems
- Hard problems that reduce to document ranking
- Commentary: Critical Thinking - Bug Bounty Podcast
- Discussion: Hacker News
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
- add python bindings?
- factor LLM calls out into a separate package
- account for reasoning tokens separately
Completed
- run openai batch mode
- add more examples, use cases
- allow specifying an input directory (where each file is distinct object)
- clarify when prompt included in token estimate
- report cost + token usage
- add visualization
- support reasoning effort
- add blog link
- add parameter for refinement ratio
- add boolean refinement ratio flag
- alert if the incoming context window is super large
- automatically calculate optimal batch size?
- explore "tournament" sort vs complete exposure each time
- make sure that each randomized run is evenly split into groups so each one gets included/exposed
- parallelize openai calls for each run
- remove token limit threshold? potentially confusing/unnecessary
- save time by using shorter hash ids
- separate package and cli tool
- some batches near the end of a run (9?) are small for some reason
- support non-OpenAI models
This project is licensed under the MIT License.