A fast, lightweight CLI tool for macOS and Linux that launches a simple web interface for manually classifying files or CSV rows.
Built with Bun, TypeScript, and vanilla web technologies for maximum performance and simplicity.
When working with a collection of files—or rows of data in a CSV—it’s often necessary to review and categorize each item. Doing this in a spreadsheet or directly on the command line can be inconvenient, particularly when fields contain long text, markdown, or other formatting that’s easier to read in a browser.
file-classifier provides a streamlined workflow:
- Run the CLI command.
- A small web UI opens instantly in your browser.
- Review files or CSV rows quickly in a clean, readable format.
- Apply labels, categories, or notes with minimal friction.
The goal is speed and simplicity: an interface that gets out of your way so you can focus on classification, not tooling.
- 🚀 Blazing Fast: Built on the Bun runtime; starts in milliseconds
- 📱 Responsive Web UI: Clean, keyboard-driven interface
- 💾 Auto-Save: Classifications persist automatically across sessions
- 🔧 Configurable Categories: Support for 1-9 custom categories
- 📄 Multiple Formats: Works with text files and CSV data
- ⌨️ Rich Keyboard Shortcuts: Navigation, classification, comments, and help
- 📝 Markdown Rendering: Toggle markdown display mode (Cmd+M)
- 💬 Comments System: Add notes to items with keyboard shortcuts (M key)
- 🎯 Zero Dependencies: Self-contained ~60MB binaries
- 🔄 Session Resume: Automatically restores previous work
- 🔍 CSV Column Selection: Focus on specific columns with --columns
If you have Bun installed, you can use the automated install script that detects your platform and builds the appropriate binary:
# Clone the repository
git clone https://github.com/thejud/file-classifier.git
cd file-classifier
# Install to ~/.local/bin (no sudo required)
INSTALL_DIR=~/.local/bin bun run install:local
# Or install to /usr/local/bin (requires sudo)
sudo bun run install:local
# Uninstall
bun run uninstall:local
The install script:
- Auto-detects your OS (macOS/Linux) and architecture (x64/ARM64)
- Builds the correct binary for your platform
- Handles permissions and PATH configuration
- Supports custom installation directories with --dir
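The platform detection step could look roughly like the sketch below. It is a hypothetical illustration, not the install script's actual code: it assumes detection is based on Bun/Node's `process.platform` and `process.arch`, and maps them to the release binary names listed below and to Bun `--target` strings (the Linux ones match the cross-compile commands later in this README; the `bun-darwin-*` values are Bun's standard macOS targets). The `resolveBinary` name is invented for this example.

```typescript
// Hypothetical sketch of install-time platform detection; not the real install script.
type Target = { binaryName: string; bunTarget: string };

// Keys are `${process.platform}-${process.arch}` as reported by Bun/Node.
const TARGETS: Record<string, Target> = {
  "darwin-arm64": { binaryName: "file-classifier-macos-arm64", bunTarget: "bun-darwin-arm64" },
  "darwin-x64": { binaryName: "file-classifier-macos-intel", bunTarget: "bun-darwin-x64" },
  "linux-x64": { binaryName: "file-classifier-linux-x64", bunTarget: "bun-linux-x64" },
  "linux-arm64": { binaryName: "file-classifier-linux-arm64", bunTarget: "bun-linux-arm64" },
};

function resolveBinary(platform: string = process.platform, arch: string = process.arch): Target {
  const target = TARGETS[`${platform}-${arch}`];
  if (!target) {
    // Only macOS and Linux on x64/ARM64 are supported.
    throw new Error(`Unsupported platform: ${platform}/${arch}`);
  }
  return target;
}
```

Defaulting the parameters to the current process lets the same function run with no arguments at install time and with explicit values in tests.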
Download the appropriate binary from the releases page:
- Apple Silicon (M1/M2/M3): file-classifier-macos-arm64
- Intel Macs: file-classifier-macos-intel
# Download and install (example for Apple Silicon)
chmod +x file-classifier-macos-arm64
sudo mv file-classifier-macos-arm64 /usr/local/bin/file-classifier
- Linux x64: file-classifier-linux-x64
- Linux ARM64: file-classifier-linux-arm64
# Download and install (example for x64)
chmod +x file-classifier-linux-x64
sudo mv file-classifier-linux-x64 /usr/local/bin/file-classifier
Note: All binaries are fully standalone and don't require Bun or Node.js on the target system.
Classify text files (each file is one item):
file-classifier document.txt report.txt
Classify CSV rows as individual items:
file-classifier --csv data.csv
Show only specific columns from CSV files:
file-classifier --csv --columns "title,content,priority" data.csv
By default, the classifier uses three categories: good (1), bad (2), and review (3). When you define your own, the number keys follow the order in which you list the categories, not alphabetical order.
Define your own classification categories:
file-classifier --categories "urgent,normal,low" emails.txt
file-classifier --csv --categories "spam,ham,unsure" messages.csv
Usage: file-classifier [options] <files...>
Options:
--csv, -c CSV mode (otherwise file mode)
--categories "cat1,cat2" Custom categories (comma-separated, 1-9 max)
--columns "col1,col2" CSV column subset (default: all columns)
--port <number> Specific port (default: random)
--no-browser Don't auto-launch browser
--reset Clear previous classifications and comments for specified files
Examples:
file-classifier file1.txt file2.txt
file-classifier --csv data.csv
file-classifier --csv --columns "Detection Name,uuid,message" data.csv
file-classifier --categories "spam,ham,unsure" *.txt
file-classifier --csv --categories "bug,feature,question" data.csv
file-classifier --reset file1.txt file2.txt
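The documented rules for --categories (comma-separated, 1 to 9 entries, number keys assigned in the order given, defaults of good/bad/review) could be implemented along these lines. This is an illustrative sketch, not the tool's source: the names `parseCategories` and `keyForCategory`, the whitespace trimming, and the duplicate-name check are all assumptions.

```typescript
// Illustrative sketch of --categories handling; function names are hypothetical.
const DEFAULT_CATEGORIES = ["good", "bad", "review"];

function parseCategories(raw?: string): string[] {
  if (raw === undefined) return [...DEFAULT_CATEGORIES];

  // Split on commas and trim surrounding whitespace (trimming is an assumption).
  const categories = raw
    .split(",")
    .map((c) => c.trim())
    .filter((c) => c.length > 0);

  if (categories.length < 1 || categories.length > 9) {
    throw new Error(`Expected 1-9 categories, got ${categories.length}`);
  }
  if (new Set(categories).size !== categories.length) {
    throw new Error("Category names must be unique");
  }
  return categories;
}

// Keys follow command-line order, not alphabetical order:
// key "1" -> first category, key "2" -> second, and so on.
function keyForCategory(categories: string[], name: string): string | undefined {
  const index = categories.indexOf(name);
  return index === -1 ? undefined : String(index + 1);
}
```

With `--categories "urgent,normal,low"`, this maps urgent to 1, normal to 2, and low to 3, even though "low" sorts first alphabetically.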
To clear previous classifications and comments for specific files:
# Clear classification data for specific files
file-classifier --reset file1.txt file2.txt
# Clear CSV data classifications
file-classifier --reset --csv data.csv
# Clear with specific categories (must match original session)
file-classifier --reset --categories "spam,ham,unsure" *.txt
Classification data is stored in ~/.config/file-classifier/ as session files (following the XDG Base Directory Specification). Each unique combination of files, categories, and mode creates a separate session. The --reset option clears data only for the session that exactly matches the files and options you specify, leaving other sessions intact.
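One way to get "one session per unique combination of files, categories, and mode" is to hash those inputs into a stable ID and keep one file per ID in the config directory. The sketch below is an assumption, not a description of the tool's internals: it assumes a truncated SHA-256 hash, a `session-<id>.json` file name, order-insensitive file lists, and that `$XDG_CONFIG_HOME` overrides `~/.config`. The names `sessionId` and `sessionPath` are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join, resolve } from "node:path";

type Mode = "file" | "csv";

// Derive a stable session ID from the inputs that define a session (assumed scheme).
function sessionId(files: string[], categories: string[], mode: Mode): string {
  const key = JSON.stringify({
    // Resolve and sort paths so "a.txt b.txt" and "b.txt a.txt" share a session
    // (an assumption; the real tool may treat file order as significant).
    files: files.map((f) => resolve(f)).sort(),
    // Category order is kept because it determines the 1-9 key mapping.
    categories,
    mode,
  });
  return createHash("sha256").update(key).digest("hex").slice(0, 16);
}

// XDG convention: use $XDG_CONFIG_HOME if set, otherwise ~/.config.
function sessionPath(id: string, env: Record<string, string | undefined> = process.env): string {
  const configHome = env.XDG_CONFIG_HOME || join(homedir(), ".config");
  return join(configHome, "file-classifier", `session-${id}.json`);
}
```

Under a scheme like this, a reset only needs to delete the single file at `sessionPath(sessionId(files, categories, mode))`, which is why other sessions are left untouched.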
- 1-9: Classify current item with category 1-9
- ←/→ or J/K: Navigate between items (or click Prev/Next)
- Shift+←/→ or Shift+J/K: Jump to the previous/next unclassified item
- Space: Skip to next item without classifying
- Cmd+M (or Ctrl+M): Toggle markdown rendering mode
- Cmd+E (or Ctrl+E): Export results to JSON file
- M: Add/edit comment for current item
- Shift+Enter: Save comment (when in comment modal)
- ? or /: Show/hide keyboard shortcut help
- Escape: Close any open modal
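The bindings above amount to a small lookup from a key event to an action. The sketch below keeps that lookup as a pure function so it can be unit-tested without a browser. The `Action` union, the `resolveKey` name, and the assumption that J moves back and K moves forward (pairing them with ←/→ in that order) are illustrative choices, not taken from the tool's source.

```typescript
// Illustrative pure key-to-action mapping; names and structure are hypothetical.
type Action =
  | { kind: "classify"; category: number }
  | { kind: "navigate"; delta: -1 | 1; unratedOnly: boolean }
  | { kind: "skip" }
  | { kind: "toggleMarkdown" }
  | { kind: "export" }
  | { kind: "editComment" }
  | { kind: "saveComment" }
  | { kind: "toggleHelp" }
  | { kind: "closeModal" }
  | { kind: "none" };

// The subset of KeyboardEvent fields used, so this runs outside a browser.
interface KeyInput {
  key: string;
  shiftKey?: boolean;
  metaKey?: boolean;
  ctrlKey?: boolean;
}

function resolveKey(e: KeyInput, categoryCount: number, modalOpen = false): Action {
  const key = e.key.length === 1 ? e.key.toLowerCase() : e.key;
  const cmd = Boolean(e.metaKey || e.ctrlKey); // Cmd on macOS, Ctrl elsewhere

  if (key === "Escape") return { kind: "closeModal" };
  if (modalOpen) {
    // In the comment modal only Shift+Enter is intercepted; other keys type text.
    return key === "Enter" && e.shiftKey ? { kind: "saveComment" } : { kind: "none" };
  }
  if (cmd && key === "m") return { kind: "toggleMarkdown" };
  if (cmd && key === "e") return { kind: "export" };
  if (cmd) return { kind: "none" }; // leave other browser shortcuts alone

  // Number keys map to categories in configured order, up to the category count.
  if (/^[1-9]$/.test(key) && Number(key) <= categoryCount) {
    return { kind: "classify", category: Number(key) };
  }
  const unratedOnly = Boolean(e.shiftKey);
  if (key === "ArrowLeft" || key === "j") return { kind: "navigate", delta: -1, unratedOnly };
  if (key === "ArrowRight" || key === "k") return { kind: "navigate", delta: 1, unratedOnly };
  if (key === " ") return { kind: "skip" };
  if (key === "m") return { kind: "editComment" };
  if (key === "?" || key === "/") return { kind: "toggleHelp" };
  return { kind: "none" };
}
```

In the page, a `keydown` listener would pass the event to `resolveKey` and call `preventDefault()` whenever the result is not `none`, so Cmd+M and Cmd+E don't also trigger the browser's own shortcuts.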
The classifier excels at reviewing security alerts and incidents with rich markdown formatting:
[Screenshot: reviewing a suspicious login alert with a markdown-formatted risk assessment]
Toggle between raw and rendered markdown views for better readability:
[Screenshot: markdown rendering mode shows formatted content with proper styling]
Comprehensive keyboard shortcuts for efficient classification:
[Screenshot: press ? to view all available keyboard shortcuts]
The tool provides a clean, keyboard-driven interface optimized for rapid classification.
Classifications are automatically saved and restored when you restart with the same files and categories. This allows you to:
- Stop and resume work anytime
- Refresh the browser without losing progress
- Recover from crashes or interruptions
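Recovering cleanly from a crash depends largely on how the session file is written. A common approach, shown here as an assumption rather than a description of this tool's internals, is to write to a temporary file and then `rename` it over the old one, so an interrupted write never leaves a truncated session behind. The `SavedSession` shape and the `saveSession`/`loadSession` names are hypothetical.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Hypothetical on-disk shape for a session.
interface SavedSession {
  classifications: Record<string, number>; // itemId -> category number (1-9)
  comments: Record<string, string>; // itemId -> comment text
}

function saveSession(path: string, session: SavedSession): void {
  mkdirSync(dirname(path), { recursive: true });
  const tmp = `${path}.tmp-${process.pid}`;
  writeFileSync(tmp, JSON.stringify(session, null, 2));
  // On POSIX file systems rename() replaces the target in one step, so a reader
  // sees either the old file or the new one, never a half-written file.
  renameSync(tmp, path);
}

function loadSession(path: string): SavedSession {
  // No file yet means a fresh session with nothing classified.
  if (!existsSync(path)) return { classifications: {}, comments: {} };
  return JSON.parse(readFileSync(path, "utf8")) as SavedSession;
}
```

Calling `saveSession` after every classification or comment is what makes "refresh without losing progress" work; surviving a power loss as well would additionally require an fsync before the rename.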
Results are exported as JSON with this structure:
{
"sessionId": "session-1234567890",
"config": {
"mode": "csv",
"categories": ["spam", "ham", "unsure"],
"sources": ["messages.csv"]
},
"classifications": [
{
"itemId": "messages.csv:row:1",
"category": 2,
"categoryName": "ham",
"timestamp": "2024-01-15T10:30:00.000Z"
}
],
"summary": {
"totalItems": 100,
"classifiedItems": 75,
"unclassifiedItems": 25,
"categoryCounts": {
"spam": 20,
"ham": 50,
"unsure": 5
}
},
"exportedAt": "2024-01-15T11:00:00.000Z"
}
File mode:
- Each file is treated as one classification item
- Displayed with syntax highlighting and line numbers
- Supports any text-based format (.txt, .md, .log, etc.)
CSV mode:
- Each row becomes a classification item
- Displayed as key-value pairs for easy reading
- Automatic parsing with quote handling
- Header row used for column names
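The CSV mode behaviour (header row supplies column names, each data row becomes one item shown as key-value pairs, quoted fields handled, optional --columns subset) can be sketched with a small RFC 4180-style parser. This is a hypothetical illustration and the tool's actual parser may differ. The `source:row:N` item ID format is borrowed from the export example above, assuming data rows are numbered from 1 after the header and blank lines are skipped. `parseCsv` and `csvItems` are invented names.

```typescript
// Minimal RFC 4180-style parser: quoted fields, "" escapes, embedded commas/newlines.
function parseCsv(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one line break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the final row when the file doesn't end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

interface CsvItem {
  itemId: string;
  fields: Record<string, string>; // column name -> value, shown as key-value pairs
}

// Turn data rows into items, keeping only the requested columns (in the order given).
function csvItems(source: string, text: string, columns?: string[]): CsvItem[] {
  const [header, ...data] = parseCsv(text);
  if (!header) return [];
  const keep = columns ?? header;
  const missing = keep.filter((c) => !header.includes(c));
  if (missing.length > 0) throw new Error(`Unknown column(s): ${missing.join(", ")}`);

  return data
    .filter((r) => !(r.length === 1 && r[0] === "")) // skip blank lines
    .map((r, i) => ({
      itemId: `${source}:row:${i + 1}`,
      fields: Object.fromEntries(keep.map((c) => [c, r[header.indexOf(c)] ?? ""])),
    }));
}
```

Rejecting unknown column names up front gives a clear error for a typo in --columns instead of silently showing empty fields.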
Built with modern web technologies:
- Runtime: Bun (blazing fast JavaScript runtime)
- Language: TypeScript (type safety)
- Frontend: Vanilla HTML/CSS/JS (no framework bloat)
- Testing: Bun test + Playwright E2E
- Packaging: Single executable binaries
# Clone and install dependencies
git clone <repository>
cd file-classifier
bun install
Run directly from the TypeScript sources (the same commands work on Apple Silicon and Intel Macs):
bun src/index.ts file1.txt file2.txt
bun src/index.ts --csv data.csv
bun src/index.ts --categories "spam,ham,unsure" --csv messages.csv
bun src/index.ts --csv --columns "title,content" data.csv
# Using the dev script (opens with sample data)
bun run dev file.txt
# Run tests
bun run test # Unit tests only (*.test.ts)
bun run test:e2e # E2E tests with Playwright (*.spec.ts)
bun run test:all # Run both test suites
# Note: Use 'bun run test' instead of 'bun test' directly
# 'bun test' auto-discovers all files including E2E tests which require Playwright
# Build standalone binaries for macOS
bun run build # Apple Silicon (ARM64)
bun run build:intel # Intel Mac
# Cross-compile for Linux (from macOS)
bun build --compile --target=bun-linux-x64 src/cli.ts --outfile=file-classifier-linux
bun build --compile --target=bun-linux-arm64 src/cli.ts --outfile=file-classifier-linux-arm64
# Cross-compile with optimizations
bun build --compile --target=bun-linux-x64 --minify --bytecode src/cli.ts --outfile=file-classifier-linux
- Email Classification: Sort emails into spam/ham/unsure
- Document Triage: Categorize documents by priority/topic
- Content Moderation: Review and classify user-generated content
- Data Labeling: Create training datasets for ML models
- Bug Triage: Classify bug reports by severity/type
- Research Data: Categorize survey responses or research data
- Startup Time: < 100ms
- Memory Usage: ~20MB base + data size
- File Handling: Tested with files up to 10MB
- Browser Compatibility: Modern browsers (Chrome, Safari, Firefox)
- Keyboard Responsiveness: Sub-100ms classification
- Operating System:
  - macOS 10.15+ (Catalina or newer)
  - Linux (x64 or ARM64)
- Architecture: Intel x64 or ARM64
- Browser: Any modern browser for the web interface
- Disk Space: 100MB for binaries + session storage
- Report issues: [GitHub Issues]
- Documentation: This README
- Examples: See tests/fixtures/ for sample data formats
Built with ❤️ for rapid manual data classification