A professional solution for real-time audio translation in streaming and webinar environments. Stream in any language with live captions in another language, powered by GPU-accelerated Whisper ASR.
Perfect for:
- 🎥 International streaming and webinars
- 🎬 Video production with multilingual audiences
- 📺 Live events with real-time translation
- 🎓 Educational content with language barriers
Platforms: Windows (CUDA), macOS (Apple Silicon), Linux
Windows:

```
INSTALLER.bat      # One-click setup
SubAI_Start.bat    # Double-click to run
```

macOS (Apple Silicon):

```
# Install with MLX support for 3-5x faster performance
pip install -e './backend[mlx]'
chmod +x start.sh && ./start.sh
python3 set_model.py mlx large-v3-turbo   # Use MLX acceleration
```

- Open the Control Panel at http://127.0.0.1:3000
- Go to ⚙️ Settings → set the pipeline to `whisper`
- Select your model and configure languages
- 🎙️ Stream tab → connect your microphone
- Add the OBS overlay from the 📊 Monitor tab
📖 Full Guides: Windows Setup | Mac Setup | MLX Setup (Apple Silicon) | Production Deployment
- ✅ GPU-accelerated Whisper ASR (CUDA on Windows/Linux, MLX on Apple Silicon, CPU fallback)
- ✅ MLX optimization - 3-5x faster on M1/M2/M3/M4 Macs
- ✅ 99 languages supported - Real-time translation between any Whisper-supported languages
- ✅ 6+ model sizes - From tiny (40MB) to large-v3 (1.5GB) with live model switching
- ✅ OBS Browser Source overlay - Transparent captions for streaming
- ✅ Modern React control UI - Visual model/language switching, real-time monitoring
- ✅ Production launchers - Double-click `.bat` (Windows) or `.sh` (Mac) to start everything
- ✅ System tray application - Professional background operation with menu
- ✅ One-click installer - Automated setup for Windows
- ✅ Session logging - Record all transcriptions to JSON with timestamps
- ✅ Live audio monitoring - Visual audio levels and caption preview
- ✅ WebSocket streaming - Low-latency audio and caption delivery
- ✅ Model switching - Change models on-the-fly via UI, script, or API
- ✅ Cross-platform - Windows (CUDA), macOS (Apple Silicon), Linux
- ✅ Pluggable pipeline architecture - Easy to add new ASR/translation backends
- ✅ REST API - Full programmatic control
```
┌─────────────────┐                    ┌──────────────────────┐
│   Computer 1    │                    │   Computer 2 (GPU)   │
│    (OBS PC)     │                    │                      │
│                 │                    │                      │
│  ┌───────────┐  │    Audio (WS)      │  ┌────────────────┐  │
│  │  Browser  │──┼──────────────────┼─→│ FastAPI Server │  │
│  │  Sender   │  │                    │  │                │  │
│  └───────────┘  │                    │  │  ┌──────────┐  │  │
│                 │                    │  │  │ Whisper  │  │  │
│  ┌───────────┐  │   Captions (WS)    │  │  │ Pipeline │  │  │
│  │    OBS    │←─┼──────────────────┼──│  └──────────┘  │  │
│  │  Overlay  │  │                    │  │                │  │
│  └───────────┘  │                    │  └────────────────┘  │
└─────────────────┘                    └──────────────────────┘
```
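Any WebSocket client can stand in for the browser sender or the overlay. A minimal headless sketch, assuming the third-party `websockets` package (`pip install websockets`) and the endpoints documented in the API section (`/ws/audio` expects PCM16LE mono 48 kHz; `/ws/captions` broadcasts JSON):

```python
# Headless stand-in for the browser sender and the overlay (sketch).
# Streams PCM16LE mono 48 kHz silence to /ws/audio and prints whatever
# arrives on /ws/captions. Replace <GPU_PC_IP> with your server address.
import asyncio
import json

import websockets  # pip install websockets

SERVER = "ws://<GPU_PC_IP>:8080"

async def send_audio() -> None:
    async with websockets.connect(f"{SERVER}/ws/audio") as ws:
        chunk = b"\x00\x00" * 48000  # 1 s of silence: 48k samples x 2 bytes
        while True:
            await ws.send(chunk)
            await asyncio.sleep(1.0)

async def print_captions() -> None:
    async with websockets.connect(f"{SERVER}/ws/captions") as ws:
        async for message in ws:
            print(json.loads(message)["caption"])

async def main() -> None:
    await asyncio.gather(send_audio(), print_captions())

if __name__ == "__main__":
    asyncio.run(main())
```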
- GPU: NVIDIA RTX 20xx+ with 8GB+ VRAM (tested on RTX 5090)
- OS: Windows 10/11, Linux, or macOS with Apple Silicon (M1/M2/M3/M4)
- Python: 3.10 or 3.11
- CUDA: 12.1+ drivers (Windows/Linux with NVIDIA GPU)
- RAM: 16GB+ recommended
- Storage: 5GB for models (can use smaller models on Mac)
- OBS Studio 28+
- Modern browser (Chrome/Edge/Firefox)
Running on macOS with Apple Silicon? See MAC_SETUP.md for Mac-specific instructions.
```
# Quick start for Mac:
chmod +x start.sh
./start.sh

# Switch to a smaller model for better performance:
python3 set_model.py small
```
- Clone and install dependencies:
  ```
  git clone https://github.com/arvindjuneja/subai.git SubAI
  cd SubAI
  python -m venv .venv
  .venv\Scripts\Activate.ps1        # Windows PowerShell
  # or: source .venv/bin/activate   # Linux
  pip install -e ./backend
  pip install torch --index-url https://download.pytorch.org/whl/cu121
  ```
- Pre-download Whisper models (optional but recommended, ~3GB):
  ```
  python download_models.py
  ```
- Start the backend server:
  ```
  uvicorn app.main:app --app-dir backend --host 0.0.0.0 --port 8080
  ```
  The server will be accessible at `http://<GPU_PC_IP>:8080`.
- Start the control UI (optional, for config management):
  ```
  cd frontend
  npm install
  npm run dev
  ```
  The UI is accessible at `http://localhost:3000`.
- Allow the port through the firewall (Windows):
  ```
  New-NetFirewallRule -DisplayName "SubAI 8080" -Direction Inbound -Protocol TCP -LocalPort 8080 -Action Allow
  ```
Close your browser and launch with flags:
```
# Chrome
"C:\Program Files\Google\Chrome\Application\chrome.exe" --unsafely-treat-insecure-origin-as-secure=http://<GPU_PC_IP>:8080 --user-data-dir="C:\temp\chrome-subai"

# Edge
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --unsafely-treat-insecure-origin-as-secure=http://<GPU_PC_IP>:8080 --user-data-dir="C:\temp\edge-subai"
```

Then open: `http://<GPU_PC_IP>:8080/static/sender.html`
- Download `sender.html` from `http://<GPU_PC_IP>:8080/static/sender.html`
- Serve it locally:
  ```
  # Python
  python -m http.server 9000
  # Node
  npx serve -p 9000 .
  ```
- Open: `http://localhost:9000/sender.html?ws=ws://<GPU_PC_IP>:8080/ws/audio`
- In OBS: Sources → Add → Browser Source
- URL: `http://<GPU_PC_IP>:8080/overlay`
- Width: 1920, Height: 1080
- ✅ Enable Shutdown source when not visible
- ✅ Enable Refresh browser when scene becomes active
For video production environments with non-technical users, SubAI includes easy-to-use launchers:
- Run the installer (one-time setup):
  ```
  INSTALLER.bat
  ```
  This automatically:
  - Creates a Python environment
  - Installs all dependencies
  - Downloads the AI models
  - Creates a desktop shortcut
- Users just double-click:
  ```
  SubAI_Start.bat
  ```
  - Starts backend + frontend automatically
  - Opens browser windows
  - System tray icon appears
  - Ready to use!
Professional launcher with tray icon and menu:
```
python launcher_app.py
```

Features:
- Runs in background (no console window)
- Right-click tray menu for quick access
- Auto-opens browser windows
- Clean shutdown
Build a single .exe file for distribution:
```
python build_launcher.py
```

This creates `dist/SubAI.exe` - no Python knowledge required for end users!
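`build_launcher.py` is not reproduced here; as a rough sketch, assuming it wraps PyInstaller (the actual script may use different tooling or options), a one-file windowed build amounts to:

```python
# Hypothetical sketch of a one-file tray-launcher build via PyInstaller.
# The real build_launcher.py may differ.
import PyInstaller.__main__

PyInstaller.__main__.run([
    "launcher_app.py",
    "--onefile",    # bundle everything into a single SubAI.exe
    "--noconsole",  # no console window; the app lives in the system tray
    "--name", "SubAI",
])
```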
See DEPLOYMENT.md for complete production deployment guide.
- Configure translation (via the Control UI at `http://localhost:3000` or the REST API):
  ```
  curl -X POST http://<GPU_PC_IP>:8080/api/config \
    -H "Content-Type: application/json" \
    -d '{"source_language":"pl","target_language":"en","pipeline_name":"whisper"}'
  ```
- On Computer 1: open the audio sender, select your microphone, and click Connect
- On Computer 1: start OBS with the overlay configured
- Speak Polish → captions appear in English after ~1-2 seconds
Record your transcriptions for later review. Use the Monitor tab in the Control UI to start/stop recording sessions, or use the API directly:
```
# Start a logging session
curl -X POST http://<GPU_PC_IP>:8080/api/session/start

# Check session status
curl http://<GPU_PC_IP>:8080/api/session/status

# Stop the session
curl -X POST http://<GPU_PC_IP>:8080/api/session/stop
```

Control UI: Open the React control panel at `http://localhost:3000`, go to the Monitor tab, and use the "Start Recording Session" button.
Session logs are saved as JSON files in the `sessions/` directory with this format:

```json
{
  "session_id": "abc123",
  "start_time": "2025-10-22T14:30:15Z",
  "end_time": "2025-10-22T15:45:20Z",
  "config": {
    "source_language": "pl",
    "target_language": "en",
    "pipeline": "whisper",
    "model": "large-v3"
  },
  "entries": [
    {
      "timestamp": "2025-10-22T14:30:20Z",
      "text": "Hello everyone",
      "processing_time_ms": 234
    }
  ]
}
```

- `GET /health` - Health check
- `GET /api/config` - Get current config
- `POST /api/config` - Update config (JSON body: `{source_language, target_language, pipeline_name}`)
- `GET /api/pipelines` - List available pipelines
- `POST /api/session/start` - Start logging session
- `POST /api/session/stop` - Stop logging session
- `GET /api/session/status` - Get session status
- `WS /ws/audio` - Audio ingest (PCM16LE mono 48kHz)
- `WS /ws/captions` - Caption broadcast (JSON: `{caption: string}`)
- `GET /overlay` - OBS overlay HTML page
- `GET /static/sender.html` - Browser-based audio sender
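Because the session-log schema shown earlier is plain JSON, sessions can be post-processed with a few lines of Python. A small hypothetical helper (not part of the project) that summarizes every saved session:

```python
# Summarize SubAI session logs (JSON schema as documented above).
import json
from pathlib import Path

for path in sorted(Path("sessions").glob("*.json")):
    data = json.loads(path.read_text(encoding="utf-8"))
    cfg = data["config"]
    entries = data["entries"]
    print(f"{path.name}: {cfg['source_language']} -> {cfg['target_language']}, "
          f"model={cfg['model']}, {len(entries)} captions")
    for entry in entries:
        print(f"  [{entry['timestamp']}] "
              f"({entry['processing_time_ms']} ms) {entry['text']}")
```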
SubAI supports multiple Whisper models with different speed/quality tradeoffs:
Using the Helper Script (Easiest):

```
# List available models
python set_model.py list

# Switch to a specific model
python set_model.py small      # Fast, good for Mac/testing
python set_model.py medium     # Balanced
python set_model.py large-v3   # Best quality (default)
```

Via API:

```
curl -X POST http://<GPU_PC_IP>:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{"model_name":"small"}'
```

Available Models:
- `tiny` - Fastest, 40MB (testing only)
- `small` - Fast, 245MB (recommended for Mac)
- `medium` - Balanced, 770MB (production ready)
- `large-v3` - Best quality, 1.5GB (default for GPU PC)
See QUICK_REFERENCE.md for detailed model comparison.
Whisper supports 99 languages. Common codes:
- `pl` - Polish
- `en` - English
- `de` - German
- `fr` - French
- `es` - Spanish
- `it` - Italian
- `ja` - Japanese
- `zh` - Chinese

For translation, set `source_language` and `target_language` to different codes.
SubAI offers three pipelines optimized for different platforms and use cases:
| Feature | MLX (Apple Silicon) | Faster-Whisper (CUDA) | Faster-Whisper (CPU) |
|---|---|---|---|
| Platform | macOS (M1/M2/M3/M4) | Windows/Linux (NVIDIA GPU) | All platforms |
| Speed | ⚡⚡⚡ Very Fast (2-4s) | ⚡⚡ Fast (3-5s) | 🐢 Slow (8-10s) |
| Quality | ⭐⭐⭐ Good | ⭐⭐⭐⭐⭐ Best | ⭐⭐⭐⭐⭐ Best |
| Decoding | Greedy (no beam search) | Beam search (size=5) | Beam search (size=5) |
| Power Usage | 🔋 Low (30-40% less) | 🔥 High | 🔥 Medium-High |
| Setup | `pip install -e './backend[mlx]'` | Standard install + CUDA | Standard install |
| Best For | Development, testing on Mac | Production (best quality) | Intel Macs, no GPU |
| Quality Note | May have occasional odd translations | Most coherent & accurate | Most coherent & accurate |
Production / Important Translations:
- Windows/Linux with NVIDIA GPU: Use `whisper` (CUDA) → Best quality
- Mac with Apple Silicon: Use `whisper` (CPU) → Best quality, slower but worth it
- Intel Mac / No GPU: Use `whisper` (CPU) → Only option for quality

Development / Testing / Quick Iteration:
- Mac with Apple Silicon: Use `mlx` → 3-5x faster, good enough for testing
Why Quality Differs:
- Beam Search (Faster-Whisper): Explores 5 hypotheses simultaneously → better translations
- Greedy Decoding (MLX): Picks single best choice at each step → faster but less coherent
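For reference, here is roughly what that knob looks like when calling faster-whisper directly (an illustrative sketch, not the project's pipeline code; the input file name is a placeholder):

```python
# Illustrative only - not SubAI's pipeline code.
# beam_size=5 keeps five candidate transcripts alive at each step;
# beam_size=1 is plain greedy decoding, similar to the MLX pipeline.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "speech_pl.wav",   # placeholder input file
    language="pl",
    task="translate",  # Whisper's built-in translate-to-English task
    beam_size=5,       # set to 1 to compare against greedy decoding
)
for seg in segments:
    print(f"[{seg.start:5.1f}s -> {seg.end:5.1f}s] {seg.text}")
```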
Placeholder pipeline that emits debug messages every ~1s of audio. Use for testing connectivity.
GPU-accelerated Faster-Whisper pipeline with beam search:
- Quality: Best (beam search with size=5)
- Model: `large-v3` (best quality, ~3GB VRAM/RAM)
- Alternative models: `medium`, `small`, `large-v2`
- Acceleration: CTranslate2 with CUDA (NVIDIA GPU) or CPU
- Best for: Production use, Windows with NVIDIA GPU, all platforms when quality matters
MLX-accelerated Whisper pipeline with greedy decoding:
- Speed: 3-5x faster than CPU on M1/M2/M3/M4 Macs
- Quality: Good (greedy decoding, no beam search yet)
- Power: 30-40% lower consumption vs CPU
- Recommended models: `large-v3-mlx`, `medium-mlx`, `small-mlx`
- Best for: Development and testing on Mac, quick iterations
- Limitation: No beam search support yet (lower quality than faster-whisper)
- See: MLX_SETUP.md for detailed setup and comparison
```
# Install MLX support
pip install -e './backend[mlx]'

# Download model
python3 download_mlx_models.py --model large-v3-mlx

# Use it
python3 set_model.py mlx large-v3-mlx
```

```
# Already included in standard install
python3 set_model.py whisper large-v3
```
- GPU not detected?
  - Verify NVIDIA drivers: `nvidia-smi`
  - Check that the CUDA version matches PyTorch
  - The pipeline will fall back to CPU (slower but functional)
- High latency?
  - Use the `medium` model instead of `large-v3` for 2x speed
  - Reduce the chunk size in `backend/app/pipeline/whisper_pipeline.py`
- Model downloads slow?
  - Pre-download with `python download_models.py`
  - Models are cached in `~/.cache/huggingface/`
- OBS overlay not updating?
  - Check the firewall on the GPU PC
  - Verify the overlay URL in a browser first
  - Restart the OBS browser source (right-click → Refresh)
- Microphone access blocked?
  - Use HTTPS or localhost for the sender
  - Or launch the browser with the `--unsafely-treat-insecure-origin-as-secure` flag
- Can't reach the server?
  - Backend not running or firewall blocking port 8080
  - Check: `curl http://<GPU_PC_IP>:8080/health`
- No captions appearing?
  - Check the sender is connected (green status)
  - Verify the pipeline is `whisper`, not `null`
  - Check the backend logs for errors
- GPU out of memory or overloaded?
  - Switch to the `medium` model
  - Close other GPU applications
  - Fall back to CPU: edit the device in `whisper_pipeline.py` to `"cpu"`
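As a sketch of that last fix, assuming `whisper_pipeline.py` constructs a faster-whisper `WhisperModel` (the actual constructor call in the file may differ):

```python
# Sketch of the CPU fallback for a faster-whisper based pipeline.
from faster_whisper import WhisperModel

model = WhisperModel(
    "medium",             # smaller model keeps CPU latency tolerable
    device="cpu",         # instead of "cuda"
    compute_type="int8",  # quantized weights speed up CPU inference
)
```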
```
SubAI/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI app
│   │   ├── config.py            # Runtime config
│   │   ├── state.py             # WebSocket clients manager
│   │   ├── routers/
│   │   │   ├── ingest.py        # Audio WS endpoint
│   │   │   ├── overlay.py       # Overlay page
│   │   │   └── control.py       # Config API
│   │   ├── pipeline/
│   │   │   ├── base.py          # Pipeline interface
│   │   │   ├── null_pipeline.py
│   │   │   ├── whisper_pipeline.py
│   │   │   └── registry.py      # Pipeline registry
│   │   └── static/
│   │       └── sender.html      # Browser audio sender
│   └── pyproject.toml
├── frontend/
│   ├── src/
│   │   ├── App.tsx              # Control UI
│   │   └── App.css
│   ├── package.json
│   └── vite.config.ts
├── test_gpu.py                  # GPU verification script
├── download_models.py           # Model pre-download
└── README.md
```
- Create `backend/app/pipeline/my_pipeline.py`:
  ```python
  from .base import TranslationPipeline

  class MyPipeline(TranslationPipeline):
      def name(self) -> str:
          return "my_pipeline"

      def available_models(self):
          return ["model1", "model2"]

      def select_model(self, model_name: str):
          pass

      def reset(self):
          pass

      def process_chunk(self, pcm16le: bytes, sample_rate_hz: int):
          # Process audio; return a caption string or None
          return "translated text"
  ```
- Register it in `backend/app/pipeline/registry.py`:
  ```python
  from .my_pipeline import MyPipeline

  _pipelines["my_pipeline"] = MyPipeline()
  ```
- Restart the backend
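Before restarting, a new pipeline can be sanity-checked in isolation. A hypothetical quick test, assuming `backend/` is on `PYTHONPATH` so the `app` package imports resolve:

```python
# Quick offline check of a pipeline implementation (hypothetical test).
# Run from the repo root, e.g.: PYTHONPATH=backend python test_my_pipeline.py
from app.pipeline.my_pipeline import MyPipeline

pipeline = MyPipeline()
pipeline.select_model("model1")
pipeline.reset()

sample_rate_hz = 48000
pcm = b"\x00\x00" * sample_rate_hz  # 1 s of PCM16LE mono silence

caption = pipeline.process_chunk(pcm, sample_rate_hz)
print(caption)  # the stub above returns "translated text"
```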
- Requires two computers (or VM setup)
- GPU highly recommended for real-time performance
- WebSocket connections require secure context or browser flags for microphone access
- Translation quality depends on Whisper model size (larger = better but slower)
- MLX support for Apple Silicon - 3-5x faster on M-series Macs ✅
- Single-PC mode with virtual audio routing
- Custom model fine-tuning support
- Real-time voice cloning/dubbing
- Multi-speaker detection
- Cloud deployment option
- Whisper Turbo integration for even faster processing
SubAI includes comprehensive documentation for all users:
- README_USER.txt - Simple guide for non-technical users (5-minute setup)
- MODEL_SWITCHING_GUIDE.txt - How to choose and switch models
- QUICK_REFERENCE.md - Command cheat sheet and quick lookup
- DEPLOYMENT.md - Production deployment guide
- INSTALLER.bat - Windows one-click installer
- build_launcher.py - Build standalone .exe
- MAC_SETUP.md - Complete Mac setup and optimization guide
- MLX_SETUP.md - MLX acceleration guide for Apple Silicon (3-5x faster!)
- start.sh - Mac/Linux startup script
- download_mlx_models.py - Download MLX-optimized models
- FILES_GUIDE.md - Explains all files in the project
- README.md - This file (technical overview)
- API Documentation - See API Endpoints section
- set_model.py - Easy model switching: `python set_model.py small`
- download_models.py - Pre-download Whisper models
- launcher_app.py - System tray launcher application
If you encounter any issues or have questions:
- Check the Troubleshooting section
- Read the QUICK_REFERENCE.md for common tasks
- Review MAC_SETUP.md for Mac-specific issues
- Open an issue
- Review existing issues for solutions
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
This project is licensed under CC BY-NC-SA 4.0 - see the LICENSE file for details.
You are free to:
- ✅ Use for personal, educational, and non-profit purposes
- ✅ Modify and adapt the code
- ✅ Share and redistribute
Under these conditions:
- 📝 Must give appropriate credit
- 🚫 No commercial use without permission
- 🔄 Must share modifications under the same license
- 🔓 Must keep derivatives open source
Commercial Use: If you're interested in using SubAI commercially, please contact the maintainers to discuss licensing options.
Built with:
- Faster-Whisper - CUDA-accelerated Whisper implementation
- MLX Whisper - Apple Silicon optimized Whisper
- CTranslate2 - Fast neural machine translation inference
- FastAPI - Modern Python web framework
- React - UI control panel
- Vite - Frontend build tool
Made with ❤️ for the streaming community