SubAI - Realtime Audio Translation for OBS


A professional solution for realtime audio translation in streaming/webinar environments. Stream in any language with live captions in another language, powered by GPU-accelerated Whisper ASR.

Perfect for:

  • 🎥 International streaming and webinars
  • 🎬 Video production with multilingual audiences
  • 📺 Live events with real-time translation
  • 🎓 Educational content with language barriers

Platforms: Windows (CUDA), macOS (Apple Silicon), Linux


🚀 Quick Start

Windows (Production Ready)

INSTALLER.bat              # One-click setup
SubAI_Start.bat            # Double-click to run

Mac (Apple Silicon)

# Install with MLX support for 3-5x faster performance
pip install -e './backend[mlx]'
chmod +x start.sh && ./start.sh
python3 set_model.py mlx large-v3-turbo  # Use MLX acceleration

Then:

  1. Open Control Panel at http://127.0.0.1:3000
  2. Go to ⚙️ Settings → Set pipeline to whisper
  3. Select your model, configure languages
  4. 🎙️ Stream tab → Connect microphone
  5. Add OBS overlay from 📊 Monitor tab

📖 Full Guides: Windows Setup | Mac Setup | MLX Setup (Apple Silicon) | Production Deployment


Features

Core Translation

  • GPU-accelerated Whisper ASR (CUDA on Windows/Linux, MLX on Apple Silicon, CPU fallback)
  • MLX optimization - 3-5x faster on M1/M2/M3/M4 Macs
  • 99 languages supported - Real-time translation between any Whisper-supported languages
  • 6+ model sizes - From tiny (40MB) to large-v3 (1.5GB) with live model switching
  • OBS Browser Source overlay - Transparent captions for streaming

User Experience

  • Modern React control UI - Visual model/language switching, real-time monitoring
  • Production launchers - Double-click .bat (Windows) or .sh (Mac) to start everything
  • System tray application - Professional background operation with menu
  • One-click installer - Automated setup for Windows

Professional Features

  • Session logging - Record all transcriptions to JSON with timestamps
  • Live audio monitoring - Visual audio levels and caption preview
  • WebSocket streaming - Low-latency audio and caption delivery
  • Model switching - Change models on-the-fly via UI, script, or API
  • Cross-platform - Windows (CUDA), macOS (Apple Silicon), Linux

Developer Friendly

  • Pluggable pipeline architecture - Easy to add new ASR/translation backends
  • REST API - Full programmatic control
  • MLX support - 3-5x faster on Apple Silicon (M1/M2/M3/M4)

Architecture

┌─────────────────┐                  ┌──────────────────────┐
│  Computer 1     │                  │   Computer 2 (GPU)   │
│  (OBS PC)       │                  │                      │
│                 │                  │                      │
│  ┌───────────┐  │   Audio (WS)     │  ┌────────────────┐  │
│  │ Browser   │──┼──────────────────┼─→│ FastAPI Server │  │
│  │ Sender    │  │                  │  │                │  │
│  └───────────┘  │                  │  │  ┌──────────┐  │  │
│                 │                  │  │  │ Whisper  │  │  │
│  ┌───────────┐  │   Captions (WS)  │  │  │ Pipeline │  │  │
│  │   OBS     │←─┼──────────────────┼──│  └──────────┘  │  │
│  │  Overlay  │  │                  │  │                │  │
│  └───────────┘  │                  │  └────────────────┘  │
└─────────────────┘                  └──────────────────────┘

System Requirements

Computer 2 (GPU PC)

  • GPU: NVIDIA RTX 20xx+ with 8GB+ VRAM (tested on RTX 5090)
  • OS: Windows 10/11, Linux, or macOS with Apple Silicon (M1/M2/M3/M4)
  • Python: 3.10 or 3.11
  • CUDA: 12.1+ drivers (Windows/Linux with NVIDIA GPU)
  • RAM: 16GB+ recommended
  • Storage: 5GB for models (can use smaller models on Mac)

Computer 1 (OBS PC)

  • OBS Studio 28+
  • Modern browser (Chrome/Edge/Firefox)

Setup

Quick Start for Mac Users

Running on macOS with Apple Silicon? See MAC_SETUP.md for Mac-specific instructions.

# Quick start for Mac:
chmod +x start.sh
./start.sh

# Switch to a smaller model for better performance:
python3 set_model.py small

On Computer 2 (GPU PC)

  1. Clone and install dependencies

    git clone https://github.com/arvindjuneja/subai.git SubAI
    cd SubAI
    python -m venv .venv
    .venv\Scripts\Activate.ps1  # Windows PowerShell
    # or: source .venv/bin/activate  # Linux
    
    pip install -e ./backend
    pip install torch --index-url https://download.pytorch.org/whl/cu121
  2. Pre-download Whisper models (optional but recommended, ~3GB)

    python download_models.py
  3. Start the backend server

    uvicorn app.main:app --app-dir backend --host 0.0.0.0 --port 8080

    Server will be accessible at http://<GPU_PC_IP>:8080

  4. Start the control UI (optional, for config management)

    cd frontend
    npm install
    npm run dev

    UI accessible at http://localhost:3000

  5. Allow firewall (Windows)

    New-NetFirewallRule -DisplayName "SubAI 8080" -Direction Inbound -Protocol TCP -LocalPort 8080 -Action Allow

On Computer 1 (OBS PC)

Option A: Use Browser Flags (Quick Test)

Close your browser and launch with flags:

# Chrome
"C:\Program Files\Google\Chrome\Application\chrome.exe" --unsafely-treat-insecure-origin-as-secure=http://<GPU_PC_IP>:8080 --user-data-dir="C:\temp\chrome-subai"

# Edge
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --unsafely-treat-insecure-origin-as-secure=http://<GPU_PC_IP>:8080 --user-data-dir="C:\temp\edge-subai"

Then open: http://<GPU_PC_IP>:8080/static/sender.html

Option B: Serve Locally (Secure Context)

  1. Download sender.html from http://<GPU_PC_IP>:8080/static/sender.html
  2. Serve locally:
    # Python
    python -m http.server 9000
    
    # Node
    npx serve -p 9000 .
  3. Open: http://localhost:9000/sender.html?ws=ws://<GPU_PC_IP>:8080/ws/audio

OBS Overlay Setup

  1. In OBS: Sources → Add → Browser Source
  2. URL: http://<GPU_PC_IP>:8080/overlay
  3. Width: 1920, Height: 1080
  4. ✅ Enable Shutdown source when not visible
  5. ✅ Enable Refresh browser when scene becomes active

Production Deployment (Non-Technical Users)

For video production environments with non-technical users, SubAI includes easy-to-use launchers:

Option 1: One-Click Installer (Recommended)

  1. Run the installer (one-time setup):

    INSTALLER.bat

    This automatically:

    • Creates Python environment
    • Installs all dependencies
    • Downloads AI models
    • Creates desktop shortcut
  2. Users just double-click:

    SubAI_Start.bat
    • Starts backend + frontend automatically
    • Opens browser windows
    • System tray icon appears
    • Ready to use!

Option 2: System Tray Application

Professional launcher with tray icon and menu:

python launcher_app.py

Features:

  • Runs in background (no console window)
  • Right-click tray menu for quick access
  • Auto-opens browser windows
  • Clean shutdown

Option 3: Standalone Executable

Build a single .exe file for distribution:

python build_launcher.py

Creates dist/SubAI.exe - no Python knowledge required for end users!

See DEPLOYMENT.md for complete production deployment guide.

Usage

Quick Start

  1. Configure translation (via Control UI at http://localhost:3000 or REST API):

    curl -X POST http://<GPU_PC_IP>:8080/api/config \
      -H "Content-Type: application/json" \
      -d '{"source_language":"pl","target_language":"en","pipeline_name":"whisper"}'
  2. On Computer 1: Open the audio sender, select your microphone, click Connect

  3. On Computer 1: Start OBS with the overlay configured

  4. Speak Polish → captions appear in English after ~1-2 seconds

Session Logging

Record your transcriptions for later review. Use the Monitor tab in the Control UI to start/stop recording sessions, or use the API directly:

# Start a logging session
curl -X POST http://<GPU_PC_IP>:8080/api/session/start

# Check session status
curl http://<GPU_PC_IP>:8080/api/session/status

# Stop the session
curl -X POST http://<GPU_PC_IP>:8080/api/session/stop

Control UI: Open the React control panel at http://localhost:3000, go to the Monitor tab, and use the "Start Recording Session" button.

Session logs are saved as JSON files in the sessions/ directory in the following format:

{
  "session_id": "abc123",
  "start_time": "2025-10-22T14:30:15Z",
  "end_time": "2025-10-22T15:45:20Z",
  "config": {
    "source_language": "pl",
    "target_language": "en",
    "pipeline": "whisper",
    "model": "large-v3"
  },
  "entries": [
    {
      "timestamp": "2025-10-22T14:30:20Z",
      "text": "Hello everyone",
      "processing_time_ms": 234
    }
  ]
}
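
Because each session is a plain JSON file with the schema above, post-processing is straightforward. A minimal sketch, assuming logs land in a local sessions/ directory:

import json
from pathlib import Path

for path in sorted(Path("sessions").glob("*.json")):
    session = json.loads(path.read_text(encoding="utf-8"))
    entries = session.get("entries", [])
    avg_ms = sum(e["processing_time_ms"] for e in entries) / max(len(entries), 1)
    cfg = session["config"]
    print(f"{session['session_id']}: {len(entries)} captions, "
          f"avg {avg_ms:.0f} ms, {cfg['source_language']} -> {cfg['target_language']}")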

API Endpoints

  • GET /health - Health check
  • GET /api/config - Get current config
  • POST /api/config - Update config (JSON body: {source_language, target_language, pipeline_name})
  • GET /api/pipelines - List available pipelines
  • POST /api/session/start - Start logging session
  • POST /api/session/stop - Stop logging session
  • GET /api/session/status - Get session status
  • WS /ws/audio - Audio ingest (PCM16LE mono 48kHz)
  • WS /ws/captions - Caption broadcast (JSON: {caption: string}); see the client sketch below
  • GET /overlay - OBS overlay HTML page
  • GET /static/sender.html - Browser-based audio sender
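
A quick way to smoke-test both WebSocket endpoints without a browser is a small Python client. This is a minimal sketch, assuming the third-party websockets package (pip install websockets); the host below is an example, and a real sender would stream microphone audio continuously rather than one chunk of silence.

import asyncio
import json

import websockets  # third-party: pip install websockets

HOST = "127.0.0.1:8080"  # example; use <GPU_PC_IP>:8080

async def main():
    async with websockets.connect(f"ws://{HOST}/ws/captions") as captions:
        async with websockets.connect(f"ws://{HOST}/ws/audio") as audio:
            # One second of 48 kHz mono PCM16LE audio = 48000 samples * 2 bytes.
            # Silence here just exercises the plumbing; real audio goes the same way.
            await audio.send(bytes(48000 * 2))
            message = await asyncio.wait_for(captions.recv(), timeout=30)
            print(json.loads(message)["caption"])  # {"caption": "..."} per the endpoint list

asyncio.run(main())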

Switching Models

SubAI supports multiple Whisper models with different speed/quality tradeoffs:

Using the Helper Script (Easiest):

# List available models
python set_model.py list

# Switch to a specific model
python set_model.py small    # Fast, good for Mac/testing
python set_model.py medium   # Balanced
python set_model.py large-v3 # Best quality (default)

Via API:

curl -X POST http://<GPU_PC_IP>:8080/api/config \
  -H "Content-Type: application/json" \
  -d '{"model_name":"small"}'

Available Models:

  • tiny - Fastest, 40MB (testing only)
  • small - Fast, 245MB (recommended for Mac)
  • medium - Balanced, 770MB (production ready)
  • large-v3 - Best quality, 1.5GB (default for GPU PC)

See QUICK_REFERENCE.md for detailed model comparison.

Supported Languages

Whisper supports 99 languages. Common codes:

  • pl - Polish
  • en - English
  • de - German
  • fr - French
  • es - Spanish
  • it - Italian
  • ja - Japanese
  • zh - Chinese

For translation, set source_language and target_language to different codes.

Pipeline Comparison: Choose the Right One for You

SubAI offers three pipelines optimized for different platforms and use cases:

MLX (Apple Silicon)

  • Platform: macOS (M1/M2/M3/M4)
  • Speed: ⚡⚡⚡ Very fast (2-4s)
  • Quality: ⭐⭐⭐ Good
  • Decoding: Greedy (no beam search)
  • Power usage: 🔋 Low (30-40% less than CPU)
  • Setup: pip install -e './backend[mlx]'
  • Best for: Development and testing on Mac
  • Quality note: May produce occasional odd translations

Faster-Whisper (CUDA)

  • Platform: Windows/Linux (NVIDIA GPU)
  • Speed: ⚡⚡ Fast (3-5s)
  • Quality: ⭐⭐⭐⭐⭐ Best
  • Decoding: Beam search (size=5)
  • Power usage: 🔥 High
  • Setup: Standard install + CUDA
  • Best for: Production (best quality)
  • Quality note: Most coherent and accurate

Faster-Whisper (CPU)

  • Platform: All platforms
  • Speed: 🐢 Slow (8-10s)
  • Quality: ⭐⭐⭐⭐⭐ Best
  • Decoding: Beam search (size=5)
  • Power usage: 🔥 Medium-high
  • Setup: Standard install
  • Best for: Intel Macs, systems without a GPU
  • Quality note: Most coherent and accurate

🎯 Recommendation by Use Case

Production / Important Translations:

  • Windows/Linux with NVIDIA GPU: Use whisper (CUDA) → Best quality
  • Mac with Apple Silicon: Use whisper (CPU) → Best quality, slower but worth it
  • Intel Mac / No GPU: Use whisper (CPU) → Only option for quality

Development / Testing / Quick Iteration:

  • Mac with Apple Silicon: Use mlx → 3-5x faster, good enough for testing

Why Quality Differs:

  • Beam Search (Faster-Whisper): Explores 5 hypotheses simultaneously → better translations
  • Greedy Decoding (MLX): Picks the single best choice at each step → faster but less coherent (see the toy sketch below)
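
To make the difference concrete, here is a toy sketch (illustrative only, not Whisper's actual decoder): greedy decoding commits to one token per step, while beam search carries the five best-scoring hypotheses forward and returns the best complete one. step_probs stands in for the model's next-token distribution.

import heapq
import math

def decode_greedy(step_probs, steps):
    prefix = []
    for _ in range(steps):
        probs = step_probs(prefix)                # dict: token -> probability
        prefix.append(max(probs, key=probs.get))  # commit to the single best token
    return prefix

def decode_beam(step_probs, steps, beam_size=5):
    beams = [(0.0, [])]  # (cumulative negative log-prob, prefix)
    for _ in range(steps):
        candidates = [
            (score - math.log(p), prefix + [token])
            for score, prefix in beams
            for token, p in step_probs(prefix).items()
        ]
        beams = heapq.nsmallest(beam_size, candidates)  # keep the 5 best hypotheses
    return beams[0][1]  # lowest cost = most probable hypothesis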

Pipeline Details

null (Debug)

Placeholder pipeline that emits debug messages every ~1s of audio. Use for testing connectivity.

whisper (Production - Recommended)

GPU-accelerated Faster-Whisper pipeline with beam search:

  • Quality: Best (beam search with size=5)
  • Model: large-v3 (best quality, ~3GB VRAM/RAM)
  • Alternative models: medium, small, large-v2
  • Acceleration: CTranslate2 with CUDA (NVIDIA GPU) or CPU
  • Best for: Production use, Windows with NVIDIA GPU, all platforms when quality matters

mlx (Development - Apple Silicon Only)

MLX-accelerated Whisper pipeline with greedy decoding:

  • Speed: 3-5x faster than CPU on M1/M2/M3/M4 Macs
  • Quality: Good (greedy decoding, no beam search yet)
  • Power: 30-40% lower consumption vs CPU
  • Recommended models: large-v3-mlx, medium-mlx, small-mlx
  • Best for: Development and testing on Mac, quick iterations
  • Limitation: No beam search support yet (lower quality than faster-whisper)
  • See: MLX_SETUP.md for detailed setup and comparison

Quick Setup

For MLX (Mac only):

# Install MLX support
pip install -e './backend[mlx]'

# Download model
python3 download_mlx_models.py --model large-v3-mlx

# Use it
python3 set_model.py mlx large-v3-mlx

For Faster-Whisper (All platforms):

# Already included in standard install
python3 set_model.py whisper large-v3

Performance Tips

  1. GPU not detected?

    • Verify NVIDIA drivers: nvidia-smi
    • Check CUDA version matches PyTorch
    • The pipeline will fall back to CPU (slower but functional)
  2. High latency?

    • Use medium model instead of large-v3 for 2x speed
    • Reduce chunk size in backend/app/pipeline/whisper_pipeline.py
  3. Model downloads slow?

    • Pre-download with python download_models.py
    • Models cached in ~/.cache/huggingface/
  4. OBS overlay not updating?

    • Check firewall on GPU PC
    • Verify overlay URL in browser first
    • Restart OBS browser source (right-click → Refresh)
  5. Microphone access blocked?

    • Use HTTPS or localhost for sender
    • Or launch browser with --unsafely-treat-insecure-origin-as-secure flag

Troubleshooting

"Failed to load config"

  • Backend not running or firewall blocking port 8080
  • Check: curl http://<GPU_PC_IP>:8080/health

"No captions appearing"

  • Check sender is connected (green status)
  • Verify pipeline is whisper not null
  • Check backend logs for errors

"Out of memory" on GPU

  • Switch to medium model
  • Close other GPU applications
  • Fall back to CPU: in backend/app/pipeline/whisper_pipeline.py, set the device to "cpu"

Development

File Structure

SubAI/
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI app
│   │   ├── config.py          # Runtime config
│   │   ├── state.py           # WebSocket clients manager
│   │   ├── routers/
│   │   │   ├── ingest.py      # Audio WS endpoint
│   │   │   ├── overlay.py     # Overlay page
│   │   │   └── control.py     # Config API
│   │   ├── pipeline/
│   │   │   ├── base.py        # Pipeline interface
│   │   │   ├── null_pipeline.py
│   │   │   ├── whisper_pipeline.py
│   │   │   └── registry.py    # Pipeline registry
│   │   └── static/
│   │       └── sender.html    # Browser audio sender
│   └── pyproject.toml
├── frontend/
│   ├── src/
│   │   ├── App.tsx            # Control UI
│   │   └── App.css
│   ├── package.json
│   └── vite.config.ts
├── test_gpu.py                # GPU verification script
├── download_models.py         # Model pre-download
└── README.md

Adding a New Pipeline

  1. Create backend/app/pipeline/my_pipeline.py:

    from .base import TranslationPipeline
    
    class MyPipeline(TranslationPipeline):
        def name(self) -> str:
            return "my_pipeline"
        
        def available_models(self):
            return ["model1", "model2"]
        
        def select_model(self, model_name: str):
            # Load or switch to the requested model here
            pass
        
        def reset(self):
            # Clear any buffered audio/state between sessions
            pass
        
        def process_chunk(self, pcm16le: bytes, sample_rate_hz: int):
            # Process audio; return a caption string, or None if no
            # caption is ready for this chunk yet
            return "translated text"
  2. Register in backend/app/pipeline/registry.py:

    from .my_pipeline import MyPipeline
    
    _pipelines["my_pipeline"] = MyPipeline()
  3. Restart backend
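
After the restart, you can confirm the new pipeline is registered via the GET /api/pipelines endpoint listed above. A stdlib-only sketch (host is an example; the exact response shape depends on the backend):

import json
from urllib.request import urlopen

with urlopen("http://127.0.0.1:8080/api/pipelines") as resp:
    pipelines = json.load(resp)
print(pipelines)  # should now include "my_pipeline"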

Known Limitations

  • Requires two computers (or VM setup)
  • GPU highly recommended for real-time performance
  • WebSocket connections require secure context or browser flags for microphone access
  • Translation quality depends on Whisper model size (larger = better but slower)

Roadmap

  • MLX support for Apple Silicon - 3-5x faster on M-series Macs ✅
  • Single-PC mode with virtual audio routing
  • Custom model fine-tuning support
  • Real-time voice cloning/dubbing
  • Multi-speaker detection
  • Cloud deployment option
  • Whisper Turbo integration for even faster processing

📚 Documentation

SubAI includes comprehensive documentation for all users:

  • MAC_SETUP.md - Mac-specific setup instructions (Apple Silicon)
  • MLX_SETUP.md - MLX acceleration setup and pipeline comparison
  • DEPLOYMENT.md - complete production deployment guide
  • QUICK_REFERENCE.md - detailed model comparison
  • Helper scripts: set_model.py, download_models.py, download_mlx_models.py

Support

If you encounter any issues or have questions, please open an issue on the GitHub repository (arvindjuneja/subai).

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

This project is licensed under CC BY-NC-SA 4.0 - see the LICENSE file for details.

You are free to:

  • ✅ Use for personal, educational, and non-profit purposes
  • ✅ Modify and adapt the code
  • ✅ Share and redistribute

Under these conditions:

  • 📝 Must give appropriate credit
  • 🚫 No commercial use without permission
  • 🔄 Must share modifications under the same license
  • 🔓 Must keep derivatives open source

Commercial Use: If you're interested in using SubAI commercially, please contact the maintainers to discuss licensing options.

Credits

Built with OpenAI Whisper, Faster-Whisper (CTranslate2), MLX, FastAPI, and React.

Made with ❤️ for the streaming community
