Use LLMs for document ranking.
Got a bunch of data? Want to use an LLM to find the most "interesting" stuff? If you simply paste your data into an LLM chat session, you'll run into problems:
- Nondeterminism: Doesn't always respond with the same result
- Limited context: Can't pass in all the data at once, need to break it up
- Output constraints: Sometimes doesn't return all the data you asked it to review
- Scoring subjectivity: Struggles to assign a consistent numeric score to an individual item
siftrank is an implementation of the SiftRank document ranking algorithm that uses LLMs to efficiently find the items in any dataset that are most relevant to a given prompt:
- Stochastic: Randomly samples the dataset into small batches.
- Inflective: Looks for a natural inflection point in the scores that distinguishes particularly relevant items from the rest.
- Fixed: Caps the maximum number of LLM calls so the computational complexity remains linear in the worst case.
- Trial: Repeatedly compares batched items until the relevance scores stabilize.
Use any LLM to rank anything. No fine-tuning. No domain-specific models. Just an off-the-shelf model and your ranking prompt. Typically runs in seconds and costs pennies.
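The stochastic batching step can be pictured with a small shell sketch (illustrative only, not siftrank internals): each trial shuffles the dataset and cuts it into fixed-size batches, so every item gets scored in several different random contexts across trials.

```shell
# Illustrative only: one "trial" shuffles the data, then splits it into
# fixed-size batches, each small enough for a single LLM call.
seq 1 25 | shuf |
awk -v size=10 '{ printf "%s ", $0; if (NR % size == 0) print "" }
               END { if (NR % size) print "" }' |
awk '{ print "batch", NR, "contains", NF, "items" }'
# -> batch 1 contains 10 items
#    batch 2 contains 10 items
#    batch 3 contains 5 items
```

Which items land in which batch changes every run; only the batch sizes are fixed.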
siftrank is provider-agnostic and works with multiple LLM providers:
- OpenAI - GPT-4, GPT-4o, GPT-4o-mini (via `OPENAI_API_KEY`)
- Anthropic - Claude Opus, Claude Sonnet, Claude Haiku (via `ANTHROPIC_API_KEY`)
- OpenRouter - Access 200+ models from multiple providers (via `OPENROUTER_API_KEY`)
- Ollama - Local or cloud-hosted models like Llama, Mistral, Qwen (via `OLLAMA_API_KEY` for cloud, no key for local)
- Google - Gemini Pro, Gemini Flash (via `GOOGLE_API_KEY`)
Select your provider with --provider <name> or use the default (OpenAI). Set the appropriate API key environment variable for your chosen provider.
| Provider | Best For | Strengths | Considerations |
|---|---|---|---|
| OpenAI | General use, batch mode | Fast, cost-effective (gpt-4o-mini), batch API (50% savings), widest model range | Requires API key, cloud-only |
| Anthropic | Complex analysis, nuance | Strong reasoning (Claude Sonnet/Opus), careful instruction following | Higher cost for top-tier models |
| OpenRouter | Model experimentation | Access 200+ models, single API key, easy model comparison | Adds routing layer, pricing varies by model |
| Ollama | Privacy, local control | Free local inference, no data leaves your machine, cloud option available | Requires local GPU for good performance, slower than cloud APIs |
| Google | Gemini ecosystem | Gemini Pro/Flash, competitive pricing | Smaller model selection for ranking tasks |
Quick decision guide:
- Just getting started? Use OpenAI with `gpt-4o-mini` (default). Cheapest cloud option with great results.
- Need privacy? Use Ollama with a local model. No data leaves your machine.
- Large dataset (1000+ docs)? Use `siftrank batch submit` with OpenAI for 50% cost savings.
- Want the best quality? Use Anthropic with `claude-sonnet-4-20250514` or OpenAI with `gpt-4o`.
- Comparing models? Use OpenRouter with `--compare` to test multiple models through one API key.
go install github.com/meganerd/siftrank/cmd/siftrank@latest
Set the API key for your chosen provider:
# OpenAI (default provider)
export OPENAI_API_KEY="sk-..."
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# OpenRouter
export OPENROUTER_API_KEY="sk-or-..."
# Google
export GOOGLE_API_KEY="..."
# Ollama (local — no API key needed)
# Ensure Ollama server is running: ollama serve
# Ollama (cloud-hosted — requires API key)
export OLLAMA_API_KEY="..."

siftrank -h
Options:
-f, --file string input file (required)
-m, --model string model name (default "gpt-4o-mini")
-o, --output string JSON output file
--pattern string glob pattern for filtering files in directory (default "*")
-p, --prompt string initial prompt (prefix with @ to use a file)
--provider string LLM provider: openai, anthropic, openrouter, ollama, google (default "openai")
-r, --relevance post-process each item by providing relevance justification (skips round 1)
--compare string compare multiple models (format: "provider:model,provider:model")
--report-cost print estimated cost summary to stderr after ranking
Visualization:
--no-minimap disable minimap panel in watch mode
--watch enable live terminal visualization (logs suppressed unless --log is specified)
Debug:
-d, --debug enable debug logging
--dry-run log API calls without making them
--log string write logs to file instead of stderr
--trace string trace file path for streaming trial execution state (JSON Lines format)
Advanced:
-u, --base-url string custom API base URL (for OpenAI-compatible APIs like vLLM)
-b, --batch-size int number of items per batch (default 10)
-c, --concurrency int max concurrent LLM calls across all trials (default 50)
-e, --effort string reasoning effort level: none, minimal, low, medium, high
    --elbow-method string elbow detection method: curvature, perpendicular (default "curvature")
--elbow-tolerance float elbow position tolerance (0.05 = 5%) (default 0.05)
--encoding string tokenizer encoding (default "o200k_base")
--json force JSON parsing regardless of file extension
--max-trials int maximum number of ranking trials (default 50)
--min-trials int minimum trials before checking convergence (default 5)
--no-converge disable early stopping based on convergence
--ratio float refinement ratio (0.0-1.0, e.g. 0.5 = top 50%) (default 0.5)
--stable-trials int stable trials required for convergence (default 5)
--template string template for each object (prefix with @ to use a file) (default "{{.Data}}")
--tokens int max tokens per batch (includes prompt + documents) (default 128000)
Flags:
-h, --help help for siftrank
Ranks 100 sentences in about 7 seconds using the default provider (OpenAI):
siftrank \
-f testdata/sentences.txt \
-p 'Rank each of these items according to their relevancy to the concept of "time".' |
jq -r '.[:10] | map(.value)[]' |
nl
1 The train arrived exactly on time.
2 The old clock chimed twelve times.
3 The clock ticked steadily on the wall.
4 The bell rang, signaling the end of class.
5 The rooster crowed at the break of dawn.
6 She climbed to the top of the hill to watch the sunset.
7 He watched as the leaves fell one by one.
8 The stars twinkled brightly in the clear night sky.
9 He spotted a shooting star while stargazing.
10 She opened the curtains to let in the morning light.

siftrank outputs a JSON array of ranked documents, sorted by score (lower = better):
[
{
"key": "eQJpm-Qs",
"value": "The train arrived exactly on time.",
"score": 1.5,
"rank": 1,
"input_index": 0
},
{
"key": "SyJ3d9Td",
"value": "The old clock chimed twelve times.",
"score": 2.3,
"rank": 2,
"input_index": 5
}
]

| Field | Description |
|---|---|
| `key` | Deterministic short ID for the document |
| `value` | The document text (or rendered template output) |
| `score` | Average positional score across trials (lower = more relevant) |
| `rank` | Final rank position (1 = best match) |
| `input_index` | Original position in the input file (0-indexed) |
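To make the `score` field concrete, here is a toy jq sketch with made-up data (not actual siftrank output): averaging a document's position across trials is the idea behind the reported score, so the lowest average position ranks first.

```shell
# Toy data: two documents with hypothetical positions observed in 3 trials.
echo '[{"doc":"a","positions":[1,2,3]},{"doc":"b","positions":[4,5,3]}]' |
jq -c 'map({doc, score: (.positions | add / length)}) | sort_by(.score)'
# -> [{"doc":"a","score":2},{"doc":"b","score":4}]
```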
Common output recipes:
# Top 5 values only
siftrank -f data.txt -p 'Rank by relevance' | jq -r '.[:5] | map(.value)[]'
# Top 10 as numbered list
siftrank -f data.txt -p 'Rank by quality' | jq -r '.[:10] | map(.value)[]' | nl
# Save full results to file
siftrank -f data.txt -p 'Rank by importance' -o results.json
# Get built-in cost report
siftrank -f data.txt -p 'Rank by priority' --report-cost

The --report-cost flag prints a cost summary to stderr after ranking:
--- Cost Report ---
Model: gpt-4o-mini
Input tokens: 48250
Output tokens: 9830
Estimated cost: $0.013125
-------------------
Use a different provider by specifying --provider and --model:
# Use Anthropic's Claude Sonnet
siftrank \
--provider anthropic \
--model claude-sonnet-4-20250514 \
-f testdata/sentences.txt \
-p 'Rank by relevancy to "time".'
# Use Ollama with a local model
siftrank \
--provider ollama \
--model llama3.3 \
-f testdata/sentences.txt \
  -p 'Rank by relevancy to "time".'

Examples demonstrating different providers and use cases.
Basic ranking with gpt-4o-mini (default):
siftrank \
-f logs/access.log \
-p 'Find suspicious requests that might indicate an attack.' \
  -o suspicious_requests.json

Using GPT-4o for complex analysis:
siftrank \
--provider openai \
--model gpt-4o \
-f cve_descriptions.txt \
  -p 'Rank vulnerabilities by exploitability and impact.'

With reasoning effort (o1/o3 models):
siftrank \
--provider openai \
--model o1-mini \
--effort medium \
-f security_findings.json \
  -p 'Prioritize findings by severity and likelihood of exploitation.'

Claude Sonnet for balanced performance:
siftrank \
--provider anthropic \
--model claude-sonnet-4-20250514 \
-f research_papers.json \
-p 'Rank papers by relevance to LLM security.' \
  --trace anthropic_trace.jsonl

Claude Haiku for fast, cost-effective ranking:
siftrank \
--provider anthropic \
--model claude-haiku-4-20250514 \
-f user_feedback.txt \
-p 'Identify feedback indicating bugs or usability issues.' \
  --watch

Claude Opus for highest quality analysis:
siftrank \
--provider anthropic \
--model claude-opus-4-20250514 \
-f threat_intelligence.json \
  -p 'Rank threats by sophistication and potential impact to our infrastructure.'

Access multiple providers through one API:
# Set OpenRouter API key
export OPENROUTER_API_KEY="sk-or-..."
# Use any model from OpenRouter's catalog
siftrank \
--provider openrouter \
--model anthropic/claude-sonnet-4 \
-f documents.txt \
  -p 'Find documents related to incident response.'

Compare frontier models:
siftrank \
--provider openrouter \
--model google/gemini-2.0-flash-exp \
-f code_review.json \
-p 'Identify security vulnerabilities in this code.' \
  --compare "openrouter:anthropic/claude-sonnet-4,openrouter:openai/gpt-4o"

siftrank supports both local and cloud-hosted Ollama instances. Local instances require no API key; cloud instances authenticate via OLLAMA_API_KEY using Bearer token auth.
Authentication precedence: --api-key flag > OLLAMA_API_KEY env var > config api_keys.ollama > no auth (local).
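The precedence chain can be pictured with a small shell sketch. This is illustrative only; `resolve_ollama_key` is a made-up helper, not part of siftrank.

```shell
# Illustrative resolution order: flag > env var > config entry > no auth.
resolve_ollama_key() {   # hypothetical helper, not part of siftrank
  flag="$1"; config="$2"
  if   [ -n "$flag" ];           then echo "$flag"
  elif [ -n "$OLLAMA_API_KEY" ]; then echo "$OLLAMA_API_KEY"
  elif [ -n "$config" ];         then echo "$config"
  else                                echo "(no auth: local mode)"
  fi
}
unset OLLAMA_API_KEY
resolve_ollama_key "" "key-from-config"   # prints key-from-config
resolve_ollama_key "key-from-flag" ""     # prints key-from-flag
```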
Run completely local with Llama:
# Ensure Ollama is running: ollama serve
# Pull model if needed: ollama pull llama3.3
siftrank \
--provider ollama \
--model llama3.3 \
--base-url http://localhost:11434 \
-f sensitive_data.txt \
-p 'Identify PII that needs redaction.' \
  -o redaction_candidates.json

Use local model with custom Ollama server:
siftrank \
--provider ollama \
--model qwen2.5-coder:7b \
--base-url http://gpu-server:11434 \
-f code_snippets.txt \
  -p 'Rank code by complexity and maintainability.'

Cloud-hosted Ollama instance:
# Set API key for cloud-hosted Ollama
export OLLAMA_API_KEY="your-cloud-api-key"
siftrank \
--provider ollama \
--model llama3.3 \
--base-url https://ollama.example.com \
-f documents.txt \
  -p 'Rank by relevance to security compliance.'

Using a config file for cloud Ollama:
# ~/.config/siftrank/config.yaml
provider: ollama
model: llama3.3
base_url: https://ollama.example.com
api_keys:
  ollama: your-cloud-api-key

# With config file, no flags needed:
siftrank -f documents.txt -p 'Rank by relevance.'

Local model for privacy-sensitive ranking:
siftrank \
--provider ollama \
--model mistral:7b-instruct \
--base-url http://localhost:11434 \
-f employee_reviews.txt \
-p 'Identify reviews mentioning management concerns.' \
--no-converge \
  --max-trials 10

Compare cost vs performance:
# Fast model vs quality model
siftrank \
-f large_dataset.json \
-p 'Rank by business value.' \
--compare "openai:gpt-4o-mini,openai:gpt-4o" \
  --trace comparison_cost_quality.jsonl

Compare across providers:
# OpenAI vs Anthropic vs local
siftrank \
-f documents.txt \
-p 'Find documents about security best practices.' \
--compare "openai:gpt-4o-mini,anthropic:claude-haiku-4-20250514,ollama:llama3.3" \
--trace multi_provider_comparison.jsonl
# Analyze results
jq -s 'group_by(.model) | map({
model: .[0].model,
calls: length,
avg_latency: (map(.latency_ms) | add / length),
total_tokens: (map(.input_tokens + .output_tokens) | add)
})' multi_provider_comparison.jsonl

Compare OpenRouter models:
siftrank \
-f research_questions.txt \
-p 'Prioritize research questions by impact.' \
--compare "openrouter:anthropic/claude-sonnet-4,openrouter:google/gemini-2.0-flash-exp,openrouter:meta-llama/llama-3.3-70b-instruct" \
  --trace openrouter_comparison.jsonl

Recent enhancements to siftrank enable advanced workflows for large-scale ranking tasks.
Process multiple files from a directory with optional pattern filtering:
# Process all JSON files in a directory
siftrank \
-f ./data \
--pattern "*.json" \
-p 'Rank by importance' \
-o results.json
# Process log files matching a pattern
siftrank \
-f ./logs \
--pattern "error_*.log" \
-p 'Find critical errors that need immediate attention.' \
  --watch

Features:
- Non-recursive - Only processes files in the specified directory (not subdirectories)
- Glob filtering - Use patterns like `*.txt`, `data_*.json`, or `report_[0-9]*.log`
- Aggregated ranking - All documents from matching files are ranked together as a single dataset
- Sorted enumeration - Files are processed in deterministic alphabetical order
Security: Directory traversal (..) is blocked. Resource limits apply (1000 files per directory, 10000 documents total).
Automatically stop ranking when results stabilize, saving time and API costs:
# Enable convergence detection (default behavior)
siftrank \
-f data.txt \
-p 'Rank by quality' \
--min-trials 5 \
  --stable-trials 5

How it works:
- Elbow detection - Identifies the inflection point where scores plateau
- Stability tracking - Waits for N consecutive trials with consistent elbow position
- Early exit - Stops as soon as convergence criteria are met
Configuration:
# Disable convergence for fixed trial count
siftrank -f data.txt -p 'Rank' --no-converge --max-trials 20
# Adjust convergence sensitivity
siftrank -f data.txt -p 'Rank' \
--min-trials 3 \
--stable-trials 7 \
  --elbow-tolerance 0.10

| Flag | Default | Description |
|---|---|---|
| `--min-trials` | 5 | Minimum trials before checking convergence |
| `--stable-trials` | 5 | Consecutive stable trials required |
| `--elbow-tolerance` | 0.05 | Tolerance for elbow position stability (5%) |
| `--no-converge` | false | Disable early stopping |
Typical savings: 40-60% reduction in API calls for datasets with clear ranking signal.
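A toy sketch of the stability check (assumed semantics, not the actual implementation): feed in the elbow index detected after each trial and stop once N consecutive trials agree within a tolerance.

```shell
# Elbow index detected after each trial: 12, 14, then stable at 13.
printf '%s\n' 12 14 13 13 13 13 13 |
awk -v need=5 -v tol=1 '
  NR == 1 { prev = $1; streak = 1; next }
  {
    # Count consecutive trials whose elbow stays within +/- tol of the last.
    if ($1 >= prev - tol && $1 <= prev + tol) streak++; else streak = 1
    prev = $1
    if (streak >= need) { print "converged at trial " NR; exit }
  }'
# -> converged at trial 6
```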
Choose between two elbow detection algorithms:
# Curvature-based detection (default)
siftrank -f data.txt -p 'Rank' --elbow-method curvature
# Perpendicular distance detection
siftrank -f data.txt -p 'Rank' --elbow-method perpendicular

Methods:
- Curvature (default) - Finds maximum curvature in the score curve. Best for smooth, exponential-like distributions.
- Perpendicular - Maximizes perpendicular distance from line connecting first and last points. Best for linear-then-flat distributions.
When to switch methods:
- Use `curvature` for most cases - works well with typical ranking distributions
- Use `perpendicular` if curvature fails to detect an obvious inflection point
- Compare both with `--trace` and visual inspection
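As a toy illustration of the perpendicular-distance idea (assumed scores, simplified math, not siftrank's code): pick the index whose point lies farthest from the straight line joining the first and last scores.

```shell
# Sorted scores with a visible knee between index 3 and 4.
printf '%s\n' 1 1.5 2 8 9 10 |
awk '{ y[NR] = $1 } END {
  n = NR
  # Unnormalized point-to-line distance from the line through (1, y[1]) and (n, y[n]).
  for (i = 1; i <= n; i++) {
    d = (y[n] - y[1]) * i - (n - 1) * y[i] + n * y[1] - y[n]
    if (d < 0) d = -d
    if (d > best) { best = d; elbow = i }
  }
  print "elbow at index " elbow
}'
# -> elbow at index 3
```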
Monitor ranking progress in real-time with terminal-based visualization:
# Enable watch mode
siftrank -f data.txt -p 'Rank by priority' --watch
# Watch mode without minimap (larger chart)
siftrank -f data.txt -p 'Rank' --watch --no-minimap

Display panels:
- Score chart - Real-time convergence visualization with elbow marker
- Minimap - Overview of full score distribution (disable with `--no-minimap`)
- Statistics - Trial count, convergence status, API call count
- Top items - Live preview of current top-ranked results
Note: Watch mode suppresses log output by default. Use --log <file> to capture logs while watching.
Generate structured explanations for each ranked item:
# Add pros/cons for each result
siftrank \
-f data.txt \
-p 'Rank security vulnerabilities by severity' \
--relevance \
  -o results.json

Output format (with --relevance):
{
"key": "abc123",
"value": "SQL injection in login form",
"score": 0,
"rank": 1,
"justification": {
"pros": [
"Direct database access",
"Authentication bypass potential",
"High exploitability"
],
"cons": [
"Requires network access",
"May be mitigated by WAF"
]
}
}

Use cases:
- Decision support - Understand why items ranked high/low
- Quality assurance - Validate LLM reasoning
- Reporting - Generate audit trails with explanations
Note: Relevance mode skips initial trial round (jumps directly to justification), so use with a reasonable --max-trials limit.
Submit large ranking jobs to the OpenAI Batch API for 50% cost savings. Batch jobs complete within 24 hours — ideal for large, cost-sensitive datasets where real-time results are not required.
Subcommands:
siftrank batch submit Submit a batch ranking job
siftrank batch status Check the status of a batch job
siftrank batch results Download and process results from a completed batch job
End-to-end workflow:
# Step 1: Submit a batch job
siftrank batch submit \
-f documents.txt \
-p 'Rank by business value' \
-m gpt-4o-mini \
-o ./output
# Output:
# Loaded 500 documents from documents.txt
# Generated 50 batch requests
# Uploaded batch file: file-abc123
# Created batch: batch_xyz789
# Mapping file: ./output/.siftrank-batch.json
#
# Check status:
# siftrank batch status batch_xyz789
#
# Get results when complete:
# siftrank batch results ./output/.siftrank-batch.json
# Step 2: Check status (repeat until "completed")
siftrank batch status batch_xyz789
# Output:
# Batch Status
# ============
# ID: batch_xyz789
# Status: completed
# Request Counts
# Total: 50
# Completed: 50
# Failed: 0
# Step 3: Download and process results
siftrank batch results ./output/.siftrank-batch.json > ranked_output.json

Submit flags:
| Flag | Default | Description |
|---|---|---|
| `-f, --file` | (required) | Input file with documents (one per line) |
| `-p, --prompt` | (required) | Ranking prompt (prefix with @ to use a file) |
| `-m, --model` | `gpt-4o-mini` | OpenAI model name |
| `-b, --batch-size` | `10` | Documents per batch request |
| `-o, --output-dir` | `.` | Directory for mapping file output |
How it works:
- Documents are split into batches and formatted as JSONL for the OpenAI Batch API
- A mapping file (`.siftrank-batch.json`) is saved to correlate results back to input documents
- Results are scored by average position across batches (same scoring as real-time mode)
Note: Batch mode only supports the OpenAI provider. Use the standard siftrank command for other providers.
Stream execution state to a file for analysis and debugging:
# Basic trace
siftrank -f data.txt -p 'Rank' --trace trace.jsonl
# Monitor in real-time
siftrank -f data.txt -p 'Rank' --trace trace.jsonl &
tail -f trace.jsonl | jq

Trace file contents (JSON Lines format):
{"trial":1,"round":1,"model":"gpt-4o-mini","input_tokens":1234,"output_tokens":567,"latency_ms":850}
{"trial":1,"round":2,"model":"gpt-4o-mini","input_tokens":1156,"output_tokens":489,"latency_ms":790}
{"trial":2,"round":1,"model":"gpt-4o-mini","input_tokens":1234,"output_tokens":602,"latency_ms":820}

Analysis examples:
# Calculate total cost
jq -s 'map(.input_tokens + .output_tokens) | add' trace.jsonl
# Latency percentiles
jq -s 'map(.latency_ms) | sort | .[length*0.95 | floor]' trace.jsonl
# Success rate (percent; jq cannot add booleans, so count successes instead)
jq -s '(map(select(.success)) | length) / length * 100' trace.jsonl

With --compare:
siftrank \
-f data.txt \
-p 'Rank' \
--compare "openai:gpt-4o-mini,anthropic:claude-haiku-4-20250514" \
--trace comparison.jsonl
# Compare model performance
jq -s 'group_by(.model) | map({
model: .[0].model,
calls: length,
avg_latency: (map(.latency_ms) | add / length),
total_cost: (map(.input_tokens + .output_tokens) | add)
})' comparison.jsonl

Advanced usage
If the input file is a JSON document, it will be read as an array of objects and each object will be used for ranking.
For instance, two objects would be loaded and ranked from this document:
[
{
"path": "/foo",
"code": "bar"
},
{
"path": "/baz",
"code": "nope"
}
]

It is possible to include each element from the input file in a template using the Go template syntax via the --template "template string" (or --template @file.tpl) argument.
For text input files, each line can be referenced in the template with the Data variable:
Anything you want with {{ .Data }}
For JSON input files, each object in the array can be referenced directly. For instance, elements of the previous JSON example can be referenced in the template code like so:
# {{ .path }}
{{ .code }}
Note in the following example that the resulting value key contains the actual value being presented for ranking (as described by the template), while the object key contains the entire original object from the input file for easy reference.
# Create some test JSON data.
seq 9 |
paste -d @ - - - |
parallel 'echo {} | tr @ "\n" | jo -a | jo nums=:/dev/stdin' |
jo -a |
tee input.json
[{"nums":[1,2,3]},{"nums":[4,5,6]},{"nums":[7,8,9]}]
# Use template to extract the first element of the nums array in each input object.
siftrank \
-f input.json \
-p 'Which is biggest?' \
--template '{{ index .nums 0 }}' \
--max-trials 1 |
jq -c '.[]'
{"key":"eQJpm-Qs","value":"7","object":{"nums":[7,8,9]},"score":0,"exposure":1,"rank":1}
{"key":"SyJ3d9Td","value":"4","object":{"nums":[4,5,6]},"score":2,"exposure":1,"rank":2}
{"key":"a4ayc_80","value":"1","object":{"nums":[1,2,3]},"score":3,"exposure":1,"rank":3}
siftrank tracks token consumption and performance metrics for all LLM calls, enabling cost estimation and model comparison.
Every LLM API call records:
- Input tokens (prompt tokens)
- Output tokens (completion tokens)
- Reasoning tokens (for o1/o3 models)
Token usage accumulates across all trials and is included in the trace file (see --trace flag).
Note: The `--tokens` budget applies to each batch as a whole, including the ranking prompt and document content combined. When adjusting `--tokens`, account for your prompt length: larger prompts leave less room for documents per batch.
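For rough planning, the room left for documents can be estimated like this. The 4-characters-per-token ratio is a common rule of thumb, not an exact tokenizer count, and the prompt length is an assumed example value:

```shell
# Assumed: 128000-token batch budget, a ~2000-character prompt, ~4 chars/token.
awk -v budget=128000 -v prompt_chars=2000 'BEGIN {
  prompt_tokens = int(prompt_chars / 4)
  print (budget - prompt_tokens) " tokens left for documents per batch"
}'
# -> 127500 tokens left for documents per batch
```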
Compare multiple models side-by-side to evaluate performance and cost tradeoffs:
# Compare OpenAI vs Anthropic
siftrank \
-f testdata/sentences.txt \
-p 'Rank by relevancy to "time".' \
--compare "openai:gpt-4o-mini,anthropic:claude-haiku-4-20250514" \
--trace comparison.jsonl
# Compare multiple OpenRouter models
siftrank \
-f testdata/sentences.txt \
-p 'Rank by relevancy to "time".' \
--compare "openrouter:anthropic/claude-sonnet-4,openrouter:openai/gpt-4o" \
  --trace comparison.jsonl

Collected metrics per model:
- Call count - Total number of API calls
- Success rate - Ratio of successful vs failed calls
- Latency statistics - Average, P50, P95, P99 (milliseconds)
- Total tokens - Sum of all input + output + reasoning tokens across all calls
The --trace <file> flag writes JSON Lines output with detailed execution state:
siftrank -f data.txt -p 'Rank items' --trace trace.jsonl

Each line in the trace file contains:
{
"trial": 1,
"round": 2,
"model": "gpt-4o-mini",
"batch_size": 10,
"input_tokens": 1234,
"output_tokens": 567,
"reasoning_tokens": 0,
"latency_ms": 850,
"success": true,
"elbow_detected": false
}

Use the trace file to:
- Monitor progress in real-time (`tail -f trace.jsonl`)
- Analyze token consumption patterns across trials
- Compare model performance when using `--compare`
- Debug convergence behavior with elbow detection data
To estimate costs from token usage:
- Extract token totals from trace file:
jq -s 'map({model, input: .input_tokens, output: .output_tokens}) | group_by(.model) | map({model: .[0].model, total_input: (map(.input) | add), total_output: (map(.output) | add)})' trace.jsonl

- Apply provider pricing (as of 2026-02):
| Provider | Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| OpenAI | gpt-4o-mini | $0.15 | $0.60 |
| OpenAI | gpt-4o | $2.50 | $10.00 |
| Anthropic | claude-haiku-4 | $0.25 | $1.25 |
| Anthropic | claude-sonnet-4 | $3.00 | $15.00 |
| OpenRouter | varies | varies | varies |
| Ollama (local) | any local model | $0.00 | $0.00 |
| Ollama (cloud) | varies by provider | varies | varies |
Example cost calculation:
Input tokens: 50,000
Output tokens: 10,000
Model: gpt-4o-mini
Cost = (50,000 / 1,000,000) × $0.15 + (10,000 / 1,000,000) × $0.60
= $0.0075 + $0.0060
= $0.0135 (~1.4 cents)
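The same arithmetic as a one-liner, using the gpt-4o-mini rates from the pricing table above:

```shell
# (input_tokens / 1M) * input_rate + (output_tokens / 1M) * output_rate
awk -v in_tok=50000 -v out_tok=10000 -v in_rate=0.15 -v out_rate=0.60 'BEGIN {
  printf "$%.4f\n", in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate
}'
# -> $0.0135
```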
Tip: Use `--report-cost` for built-in cost reporting. It prints a cost summary (model, tokens, estimated cost in USD) to stderr after ranking completes.
This project is a fork and significant evolution of Raink, originally created by noperator at Bishop Fox. The original Raink prototype introduced the core SiftRank algorithm and demonstrated LLM-based document ranking for security research. See the original presentation, blog post, and CLI tool.
This fork (siftrank.meganerd) represents a substantial rewrite with:
- Provider-agnostic architecture - Support for OpenAI, Anthropic, OpenRouter, Ollama, and Google (upstream: OpenAI only)
- Production-grade reliability - Comprehensive error handling, resource limits, security hardening
- Advanced features - Convergence detection, directory input, watch mode visualization, trace monitoring, cost tracking
- Model comparison - Side-by-side evaluation across providers with performance metrics
- Extensive documentation - Multi-provider examples, practical use cases, cost estimation guidance
While building on the foundational algorithm from Raink, this implementation diverges significantly in architecture, capabilities, and scope. Both projects share the goal of making LLM-powered document ranking accessible and practical.
- O(N) the Money: Scaling Vulnerability Research with LLMs
- Using LLMs to solve security problems
- Hard problems that reduce to document ranking
- Commentary: Critical Thinking - Bug Bounty Podcast
- Discussion: Hacker News
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
- add python bindings?
- factor LLM calls out into a separate package
- account for reasoning tokens separately
Completed
- run openai batch mode
- add more examples, use cases
- allow specifying an input directory (where each file is distinct object)
- clarify when prompt included in token estimate
- report cost + token usage
- add visualization
- support reasoning effort
- add blog link
- add parameter for refinement ratio
- add boolean refinement ratio flag
- alert if the incoming context window is super large
- automatically calculate optimal batch size?
- explore "tournament" sort vs complete exposure each time
- make sure that each randomized run is evenly split into groups so each one gets included/exposed
- parallelize openai calls for each run
- remove token limit threshold? potentially confusing/unnecessary
- save time by using shorter hash ids
- separate package and cli tool
- some batches near the end of a run (9?) are small for some reason
- support non-OpenAI models
This project is licensed under the MIT License.