A completely offline, air-gapped video-to-text transcription system with AI-powered summarization. Designed for sensitive data that requires maximum security with zero internet access.
- 🔒 Completely Air-Gapped - Runs with `--network none`, zero internet access at runtime
- 🌐 Web Interface - Drag-and-drop video uploads with real-time progress
- ✨ AI Summarization - Generate bullet points and paragraph summaries using Llama 2 7B
- 🚀 Dual Engines - Choose between faster-whisper (Python) or whisper.cpp (C++)
- 📦 All Models Bundled - Whisper models + Llama 2 7B (~27 GB total)
- 💾 Settings Persistence - Your preferences are remembered across sessions
- ⚙️ Performance Tuning - Adjust model size, compute type, threads, and more
- 📝 Multiple Formats - Outputs TXT, SRT, VTT, and AI-generated summaries
- 💻 Cross-Platform - Works on Mac (Intel & Apple Silicon) and Linux
- Backend: FastAPI + Uvicorn (single-worker for resource control)
- Transcription Engines:
- faster-whisper (CTranslate2, CPU-optimized)
- whisper.cpp (GGML, optimized for Apple Silicon)
- Summarization: Llama 2 7B Chat via llama.cpp (CPU-only)
- Models:
- Whisper: 5 sizes × 2 engines = 10 variants
- Llama: 1 quantized model (Q4_K_M, ~4 GB)
- Security: No network access, non-root user, read-only filesystem (except /data)
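Concretely, that hardening corresponds to Docker flags along these lines (a sketch only; the project's run script may differ, and the `--read-only`/`--tmpfs` options here are assumptions):

```bash
# Illustrative hardened launch; --read-only and --tmpfs are assumed,
# not confirmed flags from the project's own scripts.
docker run -d \
  --network none \
  --read-only \
  --tmpfs /tmp \
  -v "$(pwd)/data:/data" \
  -p 7860:7860 \
  --name silent-scribe \
  silent-scribe:latest
```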
- 🎬 Demo - START HERE! Simple walkthrough for beginners
- 🚀 Quick Start - Get started in 3 steps
- 🔧 Troubleshooting - Solutions to common problems
- ⚠️ Edge Cases - How scripts handle edge cases automatically
- 🔗 User Flow - Visual diagrams and flowcharts
- 📋 Implementation - Architecture and technical details
- ✅ Test Checklist - Comprehensive testing guide
- Docker (or Docker Desktop for Mac)
- Disk Space: ~30 GB free (final image is ~27 GB)
- RAM:
- 8 GB minimum for transcription only
- 16 GB recommended for transcription + summarization
- CPU: Multi-core recommended (4+ cores ideal)
- Time: First build takes 30-60 minutes (one-time setup)
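Before building, you can verify free space with standard commands:

```bash
df -h .            # free space on the current filesystem
docker system df   # disk used by Docker images, containers, and volumes
```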
1. Clone or navigate to the project:

   ```bash
   cd silent-scribe
   ```

2. Build the Docker image (this downloads and bundles all models):

   ```bash
   ./build
   ```

   ⚠️ Note: Building takes 30-60 minutes and requires ~15-20 GB disk space. This only needs to be done once.

3. Start the application:

   ```bash
   ./start
   ```

4. Open the web UI:
   - Navigate to http://localhost:7860 in your browser

5. Stop the application:

   ```bash
   ./stop  # or just press Ctrl+C
   ```
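To confirm the container is up after starting (a standard Docker check):

```bash
docker ps --filter "name=silent-scribe"
```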
1. Upload a Video
   - Drag and drop a video file (MP4, MOV, MKV, AVI, etc.) or click to browse
   - Supports audio files too (WAV, MP3, etc.)

2. Configure Settings
   - Engine: faster-whisper (recommended) or whisper.cpp
   - Model Size:
     - `tiny` - Fastest, lowest quality
     - `base` - Fast, decent quality
     - `small` - Recommended - Good balance
     - `medium` - Slower, better quality
     - `large-v3` - Slowest, best quality
   - Compute Type (faster-whisper only):
     - `int8` - Fastest, lower quality
     - `int8_float16` - Recommended - Good balance
     - `int16` - Higher quality
     - `float32` - Highest quality, slowest
   - Threads: Number of CPU cores to use
   - Language: Auto-detect or specify (en, es, fr, de, etc.)

3. Start Transcription
   - Click "Start Transcription"
   - Watch the progress bar
   - When complete, view the transcript and download TXT/SRT/VTT files

4. Generate AI Summary (✨ NEW!)
   - After transcription completes, click "✨ Generate Summary"
   - Wait 30-90 seconds (depending on transcript length)
   - View bullet points + paragraph summary in the Summary tab
   - Download summary files separately

5. Stop the Container:

   ```bash
   make stop  # or ./scripts/stop.sh, or just press Ctrl+C
   ```
Silent Scribe now includes local AI-powered summarization using Llama 2 7B Chat.
- Complete a transcription first
- Click the "✨ Generate Summary" button
- Wait while the AI processes your transcript (30-90 seconds)
- Get two summary formats:
- Bullet Points: 5-10 key facts, decisions, and action items
- Paragraph Summary: 150-250 word overview
- 🔒 Completely Offline: Uses local Llama 2 model, no API calls
- 🚀 On-Demand: Only generated when you click the button
- 🧠 Smart Chunking: Handles long transcripts with a map-reduce pass (sketched below)
- 💾 Persistent: Summaries are saved and reload with the page
- 🔒 Concurrent-Safe: Only one task (transcription OR summarization) runs at a time
- Short transcripts (<1000 words): 15-30 seconds
- Medium transcripts (1000-5000 words): 30-60 seconds
- Long transcripts (>5000 words): 60-120 seconds
- Uses CPU only (works on all platforms)
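The map-reduce pass summarizes the transcript chunk by chunk, then summarizes the partial summaries. A conceptual sketch follows; the app does this internally via llama.cpp, so the CLI binary, flags, and chunk size below are illustrative assumptions, not the project's actual code:

```bash
# Illustrative map-reduce summarization; Silent Scribe implements this
# internally. llama-cli, its flags, and the 200-line chunk size are
# assumptions for the sketch.

# Map: split the transcript into chunks and summarize each one.
split -l 200 transcript.txt chunk_
for f in chunk_*; do
  ./llama-cli -m llama-2-7b-chat.Q4_K_M.gguf \
    -p "Summarize the key points of this transcript excerpt: $(cat "$f")" \
    >> partial_summaries.txt
done

# Reduce: merge the partial summaries into one final summary.
./llama-cli -m llama-2-7b-chat.Q4_K_M.gguf \
  -p "Combine these partial summaries into a single overview: $(cat partial_summaries.txt)"
```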
All your settings are now automatically saved:
- Engine preference (faster-whisper/whisper.cpp)
- Model size (tiny/base/small/medium/large-v3)
- Compute type
- Thread count
- Language selection
- Speaker detection preference
The default language is now English instead of auto-detect.
Run detached (background):

```bash
docker run -d \
  -p 7860:7860 \
  -v "$(pwd)/data:/data" \
  --name silent-scribe \
  --network none \
  silent-scribe:latest
```

Run interactively on an alternate port:

```bash
docker run --rm -it \
  -p 8080:7860 \
  -v "$(pwd)/data:/data" \
  --name silent-scribe \
  --network none \
  silent-scribe:latest
```

Inspect the output:

```bash
# List results
ls -la data/results/

# View a transcript
cat data/results/<job-id>/transcript.txt
```

This application is designed for maximum security with sensitive data:
- ✅ No Network Access: Container runs with `--network none`
- ✅ All Models Bundled: No runtime downloads, all models pre-downloaded at build time
- ✅ Non-Root User: Application runs as an unprivileged user
- ✅ Offline-First: `HF_HUB_OFFLINE=1` and `TRANSFORMERS_OFFLINE=1` environment variables set
- ✅ No External Resources: Web UI has no CDN dependencies, fonts, or external scripts
- ✅ Local Processing Only: All data stays on your machine
On Linux:

```bash
# Check network namespaces while container is running
docker inspect silent-scribe | grep NetworkMode
# Should show: "NetworkMode": "none"
```

On Mac:

```bash
# Container should not be able to resolve DNS
docker exec silent-scribe ping -c 1 google.com
# Should fail with "network unreachable"
```
- faster-whisper (CTranslate2 format):
  - tiny (~75 MB)
  - base (~150 MB)
  - small (~500 MB)
  - medium (~1.5 GB)
  - large-v3 (~3 GB)
- whisper.cpp (GGML format):
  - ggml-tiny.bin (~75 MB)
  - ggml-base.bin (~150 MB)
  - ggml-small.bin (~500 MB)
  - ggml-medium.bin (~1.5 GB)
  - ggml-large-v3.bin (~3 GB)
- Llama 2 7B Chat (✨ NEW! for summarization):
  - llama-2-7b-chat.Q4_K_M.gguf (~4 GB, quantized)
  - CPU-optimized with OpenBLAS
  - Runs via llama.cpp (same as whisper.cpp)
- TXT: Plain text transcript
- SRT: SubRip subtitle format (for video players)
- VTT: WebVTT subtitle format (for web players)
- Summary (Bullets): AI-generated key points (✨ NEW!)
- Summary (Paragraph): AI-generated overview (✨ NEW!)
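For reference, SRT output consists of numbered cues with start/end timestamps. An illustrative sample (not real output):

```
1
00:00:00,000 --> 00:00:04,200
Welcome to the quarterly planning meeting.

2
00:00:04,200 --> 00:00:08,500
Let's start with a review of last month's numbers.
```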
Problem: Build fails with "network timeout"

```bash
# Solution: Retry the build
make build
```

Problem: Out of disk space

```bash
# Solution: Clean up Docker
docker system prune -a
```

Problem: Container won't start

```bash
# Check if port is in use
lsof -i :7860

# Use a different port
docker run -p 8080:7860 ... silent-scribe:latest
```

Problem: Transcription fails
- Check that the video file is valid (try playing it first)
- Try a smaller model (tiny or base)
- Check available RAM (`docker stats silent-scribe`)
Problem: Out of memory
- Use a smaller model (tiny, base, or small)
- Reduce threads setting
- Close other applications
Problem: Apple Silicon performance
- The image includes optimizations for both Intel and ARM
- whisper.cpp is particularly fast on Apple Silicon
- faster-whisper works well but is CPU-only
Problem: Docker Desktop memory limit
- Go to Docker Desktop → Settings → Resources
- Increase Memory to at least 8 GB (16 GB recommended)
Problem: Permission denied on /data

```bash
# Fix permissions
sudo chown -R $(id -u):$(id -g) data/
```
1. Choose the Right Model:
   - For speed: `tiny` or `base`
   - For quality: `medium` or `large-v3`
   - For balance: `small` ✅

2. Optimize Threads:
   - Set to the number of CPU cores (see the commands below)
   - Don't exceed physical cores

3. Compute Type (faster-whisper):
   - `int8_float16` is the best balance ✅
   - `int8` for maximum speed
   - `float32` for maximum quality

4. Engine Choice:
   - faster-whisper: Better progress tracking, more tunable
   - whisper.cpp: Faster on Apple Silicon
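To check your core count for the threads setting (standard commands on the platforms this project supports):

```bash
nproc                      # Linux: logical CPU count
sysctl -n hw.ncpu          # macOS: logical CPU count
sysctl -n hw.physicalcpu   # macOS: physical core count
```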
```
silent-scribe/
├── app/
│   ├── backend/
│   │   ├── main.py                # FastAPI application
│   │   ├── config.py              # Configuration
│   │   ├── job_manager.py         # Job management
│   │   └── engines/
│   │       ├── faster_whisper_engine.py
│   │       └── whisper_cpp_engine.py
│   └── templates/
│       └── index.html             # Web UI
├── docker/
│   └── Dockerfile                 # Multi-stage build
├── scripts/
│   ├── build.sh                   # Build script
│   ├── run.sh                     # Run script
│   ├── stop.sh                    # Stop script
│   └── fetch_models.py            # Model fetcher (build-time)
├── data/                          # Mounted volume
│   ├── uploads/                   # Uploaded videos
│   └── results/                   # Transcription results
├── requirements.txt               # Python dependencies
├── Makefile                       # Make shortcuts
└── README.md                      # This file
```
This is a self-contained, air-gapped application. Modifications should maintain:
- Zero runtime network access
- All dependencies bundled at build time
- Security-first design
This project uses the following open-source components:
- faster-whisper (MIT License)
- whisper.cpp (MIT License)
- llama.cpp (MIT License)
- FastAPI (MIT License)
- OpenAI Whisper models (MIT License)
- Llama 2 (Meta's Llama 2 Community License Agreement)
Note on Llama 2 Usage: This project uses Llama 2 for offline summarization. The model is downloaded at build time and runs completely locally. Usage complies with Meta's license for internal tooling and offline applications.
Built on top of excellent open-source projects:
- faster-whisper by Guillaume Klein
- whisper.cpp by Georgi Gerganov
- llama.cpp by Georgi Gerganov
- OpenAI Whisper by OpenAI
- Llama 2 by Meta AI
Silent Scribe - Transcribing in silence, secure and offline. 🤫🔒