Transform audio into beautifully structured insights with lightning-fast precision.
Insightron is a next-generation transcription application powered by faster-whisper (CTranslate2), featuring a stunning dark-themed GUI, batch processing capabilities, and seamless Obsidian integration. Experience up to 6x faster transcription with Distil-Whisper models, instant model reuse, and optimized realtime performance.
- ⚡ faster-whisper Engine: Up to 4x faster transcription using CTranslate2 optimization
- Distil-Whisper Support: Up to 6x faster inference with `distil-medium.en` and `distil-large-v2`
- Instant Model Reuse: Zero-delay start for subsequent transcriptions
- Lower Memory Usage: INT8 quantization for efficient CPU processing
- 🎯 GPU Acceleration: Automatic CUDA detection for maximum speed
- Real-time Progress: Segment-level progress updates for smooth UX
- 💾 Smart File Operations: Atomic writes prevent data corruption
- Cross-Platform: Seamless Windows, macOS, and Linux support
- 🔴 Realtime Transcription: Low-latency live audio capture with automatic Obsidian note saving
- 🛡️ Robust Error Handling: Intelligent retry mechanism with automatic parameter adjustment for difficult audio
- Adaptive VAD: Dynamic voice activity detection that adapts to changing background noise levels
- ✨ Adaptive Segment Merging: Machine-learned gap thresholds that adapt to speaker cadence and natural pauses
- Enhanced Quality Metrics: Weighted confidence scoring with degradation detection and quality tiers
- Batch Resume & Recovery: Resume failed batches from where they left off with state persistence
- Event-Driven Progress: Milestone-based progress tracking with segment-level events
- 💾 Memory Monitoring: Real-time memory tracking to prevent OOM conditions during batch processing
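Atomic writes and batch resume fit together naturally: a small state file records which inputs are finished, and it is rewritten atomically after each one. The sketch below is a minimal illustration under assumptions, not Insightron's implementation: it assumes a JSON state file with a `completed` list, written to a temporary file and then swapped in with `os.replace`; the helpers `atomic_write_json`, `load_done`, `mark_done`, and `run_batch` are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON atomically: write a temp file in the same folder, then rename it."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, ensure_ascii=False, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic on the same filesystem: readers see old or new, never partial
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_done(state_path: Path) -> set:
    """Return the set of already-completed inputs (empty if no state file exists yet)."""
    try:
        with open(state_path, encoding="utf-8") as f:
            return set(json.load(f).get("completed", []))
    except FileNotFoundError:
        return set()


def mark_done(state_path: Path, done: set, item: str) -> None:
    """Record one finished input and persist the whole state atomically."""
    done.add(item)
    atomic_write_json(state_path, {"completed": sorted(done)})


def run_batch(files, state_path: Path, process) -> list:
    """Process files in order, skipping any that a previous run already finished."""
    done = load_done(state_path)
    processed = []
    for f in files:
        if f in done:
            continue  # resume: finished in an earlier run
        process(f)
        mark_done(state_path, done, f)
        processed.append(f)
    return processed
```

If a run crashes midway, calling `run_batch` again with the same state file only processes the files that are not yet recorded.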
- Pure Black Background: Material Dark theme perfect for OLED screens
- Premium Color Palette:
  - 🔵 Bright Blue for Model selection
  - 🟣 Purple for Language selection
  - 🟢 Emerald for Formatting options
- Tabbed Interface: Dedicated tabs for Single File, Batch Mode, and Realtime
- Settings Persistence: Your preferences are saved automatically
- Compact Timestamped Logs: Terminal-style output with `[HH:MM:SS]` timestamps
- Smooth Hover Effects: Premium animations throughout the UI
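The `[HH:MM:SS]` log style can be reproduced with Python's standard `logging` module. This is a minimal sketch; the helper `make_logger` and the logger name `insightron` are assumptions, not taken from the project.

```python
import logging
import sys


def make_logger(name: str = "insightron", stream=sys.stdout) -> logging.Logger:
    """Create a logger that prefixes each message with a [HH:MM:SS] timestamp."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    # asctime is rendered with datefmt, so lines look like "[14:30:25] Model loaded"
    handler.setFormatter(logging.Formatter("[%(asctime)s] %(message)s", datefmt="%H:%M:%S"))
    logger.handlers = [handler]  # replace rather than add, so repeated calls don't duplicate output
    logger.propagate = False
    return logger


log = make_logger()
log.info("Transcription started")  # prints e.g. [14:30:25] Transcription started
```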
- Universal Format Support: MP3, WAV, M4A, FLAC, MP4, OGG, AAC, WMA
- Smart Format Detection: Automatic audio format recognition
- Quality Optimization: Model-specific parameters for best results
- File Size Validation: Automatic 500MB limit checking
- Enhanced Audio Processing: Improved librosa and soundfile integration
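Format support and the 500 MB limit can be checked before any model work starts. The sketch below assumes detection by file extension only (Insightron's automatic format recognition may do more) and treats 1 MB as 1024 × 1024 bytes, which is also an assumption; `validate_audio_file` is a hypothetical helper.

```python
from pathlib import Path

# Extensions taken from the supported-format list above
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".ogg", ".aac", ".wma"}
MAX_FILE_SIZE_MB = 500


def validate_audio_file(path) -> float:
    """Raise FileNotFoundError/ValueError for unusable input; return the size in MB."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Audio file not found: {p}")
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(
            f"Unsupported format '{p.suffix}'; expected one of "
            f"{', '.join(sorted(SUPPORTED_EXTENSIONS))}"
        )
    size_mb = p.stat().st_size / (1024 * 1024)
    if size_mb > MAX_FILE_SIZE_MB:
        raise ValueError(f"File is {size_mb:.1f} MB; the limit is {MAX_FILE_SIZE_MB} MB")
    return size_mb
```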
- 100+ Languages: Support for all Whisper-supported languages including English, Spanish, French, German, Chinese, Japanese, Arabic, Hindi, and many more
- Auto-Detection: Intelligent language detection for multilingual content
- Manual Selection: Choose specific languages for optimal accuracy
- UTF-8 Encoding: Perfect support for non-Latin scripts and special characters
- Language-Aware Processing: Optimized transcription parameters for each language
- Smart Formatting: Auto-detects paragraph breaks and sentence structure
- Filler Word Removal: Cleans up "um", "uh", and repetitive phrases
- Transcription Fixes: Corrects common Whisper AI errors
- Multiple Styles: Auto, paragraph, bullets, and minimal formatting options
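Filler-word removal and the bullets style can be approximated with regular expressions. This is a rough sketch under stated assumptions: the filler list, collapsing a word repeated back-to-back, and splitting sentences on `.`, `!`, or `?` are illustrative choices rather than Insightron's actual rules, and `remove_fillers` and `to_bullets` are hypothetical names.

```python
import re

# Illustrative filler list (an assumption, not Insightron's actual list)
FILLERS = ("um", "uh", "erm")
# Optional comma before, the filler as a whole word, optional comma after
_FILLER_RE = re.compile(r",?\s*\b(?:" + "|".join(FILLERS) + r")\b,?", re.IGNORECASE)
# A word repeated back-to-back ("we we") collapses to a single occurrence
_REPEAT_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def remove_fillers(text: str) -> str:
    """Drop filler words and immediate word repeats, then tidy whitespace."""
    text = _FILLER_RE.sub("", text)
    text = _REPEAT_RE.sub(r"\1", text)
    return re.sub(r"\s{2,}", " ", text).strip()


def to_bullets(text: str) -> str:
    """Turn each sentence into a Markdown bullet (split after '.', '!' or '?')."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return "\n".join(f"- {s}" for s in sentences if s)


cleaned = remove_fillers("Um, so we we need to, uh, ship the release. Then we test it.")
print(to_bullets(cleaned))
```

The repeat rule is deliberately naive: it would also collapse legitimate repeats such as "that that", so a production version would need more care.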
- Seamless Workflow: Direct save to your Obsidian vault
- Rich Metadata: Duration, file size, language, processing time
- Timestamp Support: Optional segment-by-segment timestamps
- Tag System: Automatic tagging for easy organization
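The metadata and tags can be written as YAML frontmatter at the top of each note, matching the example output shown later in this README. This minimal sketch builds the header with plain string formatting (no YAML library); `format_duration` and `render_frontmatter` are hypothetical helpers, and truncating to whole seconds is an assumption.

```python
from datetime import datetime


def format_duration(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS for recordings of an hour or more."""
    total = int(seconds)  # truncate fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"


def render_frontmatter(title: str, duration_s: float, size_mb: float, model: str,
                       language: str, tags: list, when: datetime) -> str:
    """Build an Obsidian-style YAML frontmatter block."""
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    lines = [
        "---",
        f"title: {title}",
        f"date: {stamp}",
        f"duration: {format_duration(duration_s)}",
        f"duration_seconds: {duration_s:.1f}",
        f"file_size: {size_mb:.1f} MB",
        f"model: {model}",
        f"language: {language}",
        f"tags: [{', '.join(tags)}]",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Values containing `:` or `#` would need quoting to stay valid YAML; a real implementation might serialize with a YAML library instead.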
# Download and setup Insightron
git clone https://github.com/ved-3e/Insightron.git
cd Insightron
# Universal installer (recommended - works on all platforms)
python install.py
# Platform-specific installers
install_windows.bat # Windows
./install_unix.sh # Linux/macOS (chmod +x install_unix.sh first)
# Alternative installers
python setup/install_dependencies.py # Cross-platform Python installer
python setup/setup.py # Enhanced setup script
# Or manual installation
pip install -r setup/requirements.txt

Insightron uses a `config.yaml` file for configuration. The file is created automatically on first run if it doesn't exist.
Edit config.yaml to set your paths and preferences:
runtime:
  # Where to save transcription files
  transcription_folder: "D:\\2. Areas\\Ideaverse\\Areas\\Insights"
  # Where to save audio recordings
  recordings_folder: "D:\\2. Areas\\Ideaverse\\Areas\\Recordings"
model:
  name: "medium"
  device: "auto"
  compute_type: "int8"

GUI Mode (Recommended):
python insightron.py

⚡ Command Line Mode:
# Basic transcription with auto-detection
python cli.py audio.mp3
# Advanced options with language selection
python cli.py audio.wav -m large -v -f paragraphs -l es
# Multi-language examples
python cli.py spanish_audio.mp3 -l es -m medium
python cli.py french_audio.wav -l fr -f auto
python cli.py chinese_audio.m4a -l zh -m large
python cli.py arabic_audio.mp3 -l ar -v

- Launch: Run `python insightron.py`
- Select Tab: Use the "Single File" tab (default)
- Choose Audio: Click "Choose Audio File"
- Configure Settings: Select Model (try `distil-medium.en` for fast English transcription), Language, and Formatting
- Transcribe: Click "⚡ Start Transcription"
- Monitor: Watch real-time progress in the status bar and timestamped log
- Review: Open output folder when complete
- Switch Tab: Click "Batch Mode" tab
- Select Files:
  - Click "Choose Files" to select multiple audio files
  - Or click "Choose Folder" to process an entire folder
- Process: Click "⚡ Process All Files"
- Monitor: Track progress as each file is completed in the log
- Review: Check summary statistics when finished
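Batch mode and the CLI's `-w` and `--use-processes` options suggest a worker pool. The following is a minimal sketch with `concurrent.futures`, assuming each file is handled by a caller-supplied `transcribe_one(path)` function; `transcribe_batch` and its summary keys are hypothetical, not Insightron's API. Threads are used so that, in a real version, one shared model could serve every worker inside a single process; a process pool would instead need a model per process.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def transcribe_batch(files, transcribe_one, workers: int = 2) -> dict:
    """Run transcribe_one(path) over files with a thread pool and summarise the results."""
    start = time.perf_counter()
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(transcribe_one, f): f for f in files}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                fut.result()
                succeeded.append(path)
            except Exception as exc:  # one bad file should not stop the batch
                failed[path] = str(exc)
    return {
        "total": len(files),
        "succeeded": sorted(succeeded),
        "failed": failed,
        "elapsed_s": time.perf_counter() - start,
    }
```

Swapping in `ProcessPoolExecutor` would roughly correspond to `--use-processes`, but `transcribe_one` would then have to be a picklable, module-level function.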
- Switch Tab: Click "Realtime" tab
- Configure: Select Model and Language
- Start: Click "🔴 Start Recording"
- Speak: Speak into your microphone
- Visualize: See real-time audio levels and text generation
- Stop: Click "⏹️ Stop Recording" to save the audio and transcript
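Low-latency realtime capture (the changelog mentions `deque`-based buffering) can be illustrated with a fixed-size rolling buffer. In this sketch, which is assumed rather than taken from Insightron's code, audio arrives as small chunks of float samples, the deque keeps only the last few seconds, and `ready_window()` hands the newest window to the transcriber; `RollingAudioBuffer`, the 16 kHz default rate, and the window lengths are illustrative.

```python
from collections import deque


class RollingAudioBuffer:
    """Keep the most recent max_seconds of mono audio; older samples drop off automatically."""

    def __init__(self, sample_rate: int = 16_000, max_seconds: float = 5.0):
        self.sample_rate = sample_rate
        # deque(maxlen=...) discards from the left as new samples are appended
        self.samples = deque(maxlen=int(sample_rate * max_seconds))

    def push(self, chunk) -> None:
        """Append one chunk of samples (e.g. from a microphone callback)."""
        self.samples.extend(chunk)

    def seconds_buffered(self) -> float:
        return len(self.samples) / self.sample_rate

    def ready_window(self, min_seconds: float = 1.0):
        """Return a copy of the buffered audio once min_seconds is available, else None."""
        if self.seconds_buffered() < min_seconds:
            return None
        return list(self.samples)

    def level(self) -> float:
        """Peak absolute amplitude, usable for a simple audio level meter."""
        return max((abs(s) for s in self.samples), default=0.0)
```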
# Basic usage
python cli.py audio.mp3
# With specific model
python cli.py audio.mp3 -m large
# Custom formatting
python cli.py audio.mp3 -f paragraphs
# Create bulleted lists from speech
python cli.py meeting_notes.wav -f bullets
# Batch processing (multiple files)
python cli.py audio1.mp3 audio2.mp3
python cli.py *.mp3 -b
# Batch with specific worker count
python cli.py *.wav -b -w 4
# Use process pool (better for CPU-bound tasks)
python cli.py *.mp3 -b --use-processes
# Custom output location
python cli.py audio.mp3 -o "D:\Output\transcript.md"

Insightron supports 100+ languages, including all major world languages:
- English (en) - Default, highest accuracy
- Spanish (es) - Español
- French (fr) - Français
- German (de) - Deutsch
- Chinese (zh) - 中文 (Mandarin)
- Japanese (ja) - 日本語
- Korean (ko) - 한국어
- Arabic (ar) - العربية
- Hindi (hi) - हिन्दी
- Russian (ru) - Русский
- Portuguese (pt) - Português
- Italian (it) - Italiano
- European: Dutch, Swedish, Norwegian, Danish, Finnish, Polish, Czech, Hungarian, Romanian, Bulgarian, Croatian, Slovak, Slovenian, Estonian, Latvian, Lithuanian, Greek, Welsh, Irish, Maltese, Albanian, Basque, Catalan, Galician
- Asian: Thai, Vietnamese, Burmese, Khmer, Lao, Mongolian, Tamil, Telugu, Malayalam, Kannada, Gujarati, Punjabi, Marathi, Nepali, Sinhala, Bengali
- Middle Eastern/African: Persian, Urdu, Hebrew, Amharic, Swahili, Zulu, Afrikaans
- And many more...
# Auto-detection (recommended for most cases)
python cli.py audio.mp3 -l auto
# Specific language selection for better accuracy
python cli.py spanish_meeting.mp3 -l es -m medium
python cli.py french_podcast.wav -l fr -m large
python cli.py chinese_lecture.m4a -l zh -v
python cli.py arabic_news.mp3 -l ar -f paragraphs
python cli.py hindi_interview.wav -l hi -m medium

- Auto-detection: Use `auto` or leave the language blank in most cases; Whisper is generally reliable at detecting the spoken language
- Manual selection: Specify a language for better accuracy, especially with:
  - Background noise or poor audio quality
  - Mixed-language content where you want to prioritize one language
  - Less common languages where auto-detection might be uncertain
- UTF-8 Support: All non-Latin scripts (Chinese, Arabic, Hindi, etc.) are fully supported with proper UTF-8 encoding
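Choosing between auto-detection and a fixed language comes down to one value passed to the model: faster-whisper's `WhisperModel.transcribe()` accepts `language=None` to detect the language automatically. The sketch below maps a `-l` style option onto that, checking codes against a small illustrative subset rather than the full list of 100+ languages; `normalize_language` and `KNOWN_CODES` are hypothetical.

```python
from typing import Optional

# Illustrative subset of Whisper language codes (the full set is much larger)
KNOWN_CODES = {"en", "es", "fr", "de", "zh", "ja", "ko", "ar", "hi", "ru", "pt", "it"}


def normalize_language(value: Optional[str]) -> Optional[str]:
    """Map a user-supplied language option to a code, or None for auto-detection."""
    if value is None:
        return None
    code = value.strip().lower()
    if code in ("", "auto"):
        return None  # None lets the model detect the language itself
    if code not in KNOWN_CODES:
        raise ValueError(f"Unknown language code: {value!r}")
    return code
```

The result could then be passed along as `model.transcribe(path, language=normalize_language(arg))`.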
| Model | Parameters | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | ~39M | ⚡⚡⚡ | ⭐⭐ | Quick drafts, testing |
| base | ~74M | ⚡⚡ | ⭐⭐⭐ | Balanced performance |
| small | ~244M | ⚡ | ⭐⭐⭐⭐ | High quality, good speed |
| medium | ~769M | ⚡ | ⭐⭐⭐⭐⭐ | Recommended |
| large-v2 | ~1550M | ⚡ | ⭐⭐⭐⭐⭐ | Maximum accuracy |
| distil-medium.en | ~394M | ⚡⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Best speed/accuracy (English only) |
| distil-large-v2 | ~756M | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | High accuracy, faster than large |
- RAM: 4GB+ recommended for medium model, 8GB+ for large
- Storage: ~2GB for all models combined
- CPU: Multi-core processor recommended
- GPU: CUDA support available for 3-5x speedup
Transcripts are saved as beautifully formatted Markdown files with rich metadata:
---
title: my_audio_file
date: 2024-01-15 14:30:25
duration: 5:23
duration_seconds: 323.4
file_size: 12.5 MB
model: medium
language: en
formatting: auto
tags: [transcription, audio-note, whisper]
created: 2024-01-15 14:30:25
---
# 🎤 Transcription: my_audio_file
## Metadata
- **Duration:** 5:23 (323.4 seconds)
- **File Size:** 12.5 MB
- **Model:** medium
- **Language:** en
- **Formatting:** Auto
- **Transcribed:** 2024-01-15 14:30:25
## Transcript
Your beautifully formatted transcript here with intelligent paragraph breaks...
## Timestamps
**00:00 - 00:15:** First segment of speech
**00:15 - 00:30:** Second segment of speech
...
---
*Transcribed using Insightron*
*Generated on 2024-01-15 14:30:25*

# Enhanced troubleshooting
python scripts/troubleshoot.py
# Cross-platform installer
python install_dependencies.py
# Windows-specific installer
install_windows.bat
# Manual dependency installation
pip install --upgrade pip
pip install -r requirements-minimal.txt

- File too large: Insightron enforces a 500 MB file-size limit (raised from 25 MB); compress larger files before transcribing
- Unsupported format: Insightron supports MP3, WAV, M4A, FLAC, MP4, OGG, AAC, WMA
- Corrupted file: Try a different audio file or run diagnostics
- Slow transcription: Use `distil-medium.en` (English only) or a smaller model (tiny, base, small)
- Memory errors: Close other applications or use the tiny model
- GPU not detected: Install CUDA toolkit for GPU acceleration
- Path not found: Update `transcription_folder` in `config.yaml`
- Permission denied: Run as administrator or check folder permissions
- Files not appearing: Refresh Obsidian or check the correct folder
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.10+ | 3.11+ |
def create_markdown(filename, text, date, duration, model, ...):
    # Your custom template here
    return custom_template

Update `gui.py` to modify colors and styling:
self.colors = {
    'accent': '#your_color',       # Primary accent color
    'bg_primary': '#your_bg',      # Background color
    'text_primary': '#your_text',  # Text color
    # ... more colors
}

- Use `distil-medium.en` for lightning-fast English transcription
- Use `tiny` or `base` models for faster multilingual transcription
- Enable GPU acceleration with CUDA
- Close unnecessary applications during transcription
- Use SSD storage for better I/O performance
- Process shorter audio files
- Use smaller models for large files
- Enable memory-efficient processing in config
- Use `medium` or `large` models for better accuracy
- Ensure good audio quality (clear speech, minimal background noise)
- Use appropriate formatting style for your content type
Insightron includes a built-in benchmarking tool to test performance on your system.
# Run standard benchmark
python benchmark_insightron.py
# The benchmark will:
# 1. Generate a test audio file
# 2. Run single-file transcription
# 3. Run batch transcription
# 4. Test realtime simulation
# 5. Save results to benchmark_results.json

We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes and test thoroughly
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
# Clone your fork (replace <your-username> with your GitHub username)
git clone https://github.com/<your-username>/Insightron.git
cd Insightron
# Install development dependencies
pip install -r requirements.txt
pip install pytest black flake8
# Run enhanced diagnostics
python troubleshoot.py
# Run tests
python -m pytest
# Format code
black *.py

- OpenAI for the incredible Whisper AI model
- The open-source community for audio processing libraries
- Obsidian for the excellent note-taking platform
- Contributors who help improve Insightron
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Happy Transcribing! 🎤✨
Transform audio into structured wisdom: locally, beautifully, intelligently.
Insightron v2.2.0 - Enterprise-Grade Intelligence
- ✅ Adaptive Segment Merging: Machine-learned gap thresholds that adapt to speaker cadence (fast/slow/normal speech patterns)
- ✅ Enhanced Quality Metrics: Weighted confidence scoring, percentile analysis, and quality degradation detection
- ✅ Batch Resume & Recovery: Resume failed batches from where they left off with JSON state persistence
- ✅ Event-Driven Progress: Milestone-based progress tracking (25%, 50%, 75%, 100%) with segment-level events
- ✅ Memory Monitoring: Real-time memory tracking with OOM prevention for large batch operations
- ✅ Code Quality: Reduced duplication by centralizing quality metrics calculation
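Milestone-based progress (25%, 50%, 75%, 100%) can be derived from segment-level progress by emitting an event only the first time each threshold is crossed. This is a sketch under assumptions: progress is measured as the end time of the latest transcribed segment over the total audio duration, and the `MilestoneTracker` class and its callback shape are hypothetical.

```python
from typing import Callable, List


class MilestoneTracker:
    """Fire a callback once for each milestone percentage as progress passes it."""

    def __init__(self, total_seconds: float, on_milestone: Callable[[int], None],
                 milestones=(25, 50, 75, 100)):
        self.total = total_seconds
        self.on_milestone = on_milestone
        self.pending: List[int] = sorted(milestones)

    def update(self, segment_end: float) -> float:
        """Report audio transcribed up to segment_end seconds; return percent done."""
        if self.total > 0:
            percent = min(100.0, 100.0 * segment_end / self.total)
        else:
            percent = 100.0
        # Emit every milestone reached so far, in order, exactly once
        while self.pending and percent >= self.pending[0]:
            self.on_milestone(self.pending.pop(0))
        return percent
```

Because the last segment can end slightly before the full duration, calling `update(total_seconds)` on completion guarantees the 100% event fires.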
- ✅ Bullets Formatting: New "bullets" formatting style to automatically create bulleted lists from speech
- ✅ Smart Segment Merging: Intelligent algorithm to merge fragmented speech segments based on confidence and timing
- ✅ Adaptive VAD: Dynamic Voice Activity Detection that adjusts thresholds based on audio characteristics
- ✅ Retry Mechanism: Robust error handling that automatically retries failed transcriptions with fallback parameters
- ✅ Quality Metrics: Detailed quality reporting including confidence scores and low-confidence segment counts
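Segment merging can be sketched as joining consecutive segments whose silent gap is short relative to the speaker's usual pauses. This is an illustration under assumptions, not the project's machine-learned thresholds, and it ignores the confidence signal mentioned above: the gap threshold is a multiple of the median gap (so a fast speaker with short pauses gets a tighter threshold), clamped to a fixed range, and segments are plain `(start, end, text)` tuples. `merge_segments` and its parameters are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, text)


def merge_segments(segments: List[Segment], factor: float = 1.5,
                   min_gap: float = 0.2, max_gap: float = 1.5) -> List[Segment]:
    """Merge neighbours whose gap is below an adaptive threshold derived from the median gap."""
    if len(segments) < 2:
        return list(segments)
    gaps = [max(0.0, b[0] - a[1]) for a, b in zip(segments, segments[1:])]
    # Scale this speaker's typical pause, clamped to a sane range
    threshold = min(max_gap, max(min_gap, factor * median(gaps)))
    merged = [segments[0]]
    for seg in segments[1:]:
        start, end, text = merged[-1]
        if seg[0] - end <= threshold:
            merged[-1] = (start, max(end, seg[1]), f"{text} {seg[2]}".strip())
        else:
            merged.append(seg)
    return merged
```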
- ✅ Instant Model Reuse: Implemented smart caching to eliminate model loading delays (10-20s saved per file)
- ✅ Distil-Whisper Support: Added `distil-medium.en` and `distil-large-v2` for up to 6x faster transcription
- ✅ Optimized Realtime: New `deque`-based buffering for lower latency and CPU usage
- ✅ Enhanced Batch Mode: Shared model instance across batch files for maximum throughput
- ✅ Smart Beam Search: Dynamic beam size adjustment (1 for speed, 5 for accuracy)
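Instant model reuse amounts to loading each model configuration once and returning the same object on later calls. A minimal sketch with `functools.lru_cache` follows; `load_model` is a stand-in that only counts loads, where a real version would likely construct a faster-whisper `WhisperModel(name, device=device, compute_type=compute_type)`. The `beam_size_for` helper encodes the 1-for-speed, 5-for-accuracy rule from the entry above; both names are hypothetical.

```python
from functools import lru_cache

LOAD_COUNT = {"n": 0}  # demo counter so the caching effect is observable


@lru_cache(maxsize=4)
def load_model(name: str, device: str = "auto", compute_type: str = "int8"):
    """Load (or reuse) a model for this exact configuration.

    Stand-in body: a real implementation might return
    faster_whisper.WhisperModel(name, device=device, compute_type=compute_type).
    """
    LOAD_COUNT["n"] += 1
    return {"name": name, "device": device, "compute_type": compute_type}


def beam_size_for(prefer_speed: bool) -> int:
    """Greedy decoding (beam 1) for speed, beam search of width 5 for accuracy."""
    return 1 if prefer_speed else 5
```

Note that `lru_cache` keys on the exact call arguments, so `load_model("medium")` and `load_model("medium", device="auto")` are cached separately; calling it consistently avoids a second load.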
- ✅ Live Audio Capture: Record and transcribe microphone input in real-time
- ✅ Instant Feedback: See text appear as you speak
- ✅ Audio Visualization: Dynamic audio level meter
- ✅ Dual Saving: Saves both audio recording (WAV) and transcription (MD)
- ✅ Obsidian Integration: Auto-saves directly to your vault
- ✅ faster-whisper Integration: Migrated from `openai-whisper` to `faster-whisper` (CTranslate2)
- ✅ 4x Speed Boost: Up to 4x faster transcription on both CPU and GPU
- ✅ Lower Memory Usage: Significantly reduced RAM consumption
- ✅ INT8 Quantization: Optimized for CPU with minimal accuracy loss
- ✅ GPU Auto-Detection: Automatic CUDA acceleration when available
- ✅ Real-time Progress: Segment-level progress tracking for smooth UX
- ✅ Pure Black Background: Material Dark theme (`#000000`) perfect for OLED
- ✅ Premium Color Palette: Bright Blue, Purple, and Emerald accents
- ✅ Polished Cards: Subtle borders and improved spacing
- ✅ Enhanced Typography: Larger icons and better font hierarchy
- ✅ Smooth Animations: Premium hover effects throughout
- ✅ Batch Processing: Dedicated tab for multi-file processing
- ✅ Settings Persistence: Preferences saved automatically
- ✅ Compact Logs: Cleaner terminal output
- ✅ 100+ Languages: Support for all Whisper-supported languages
- ✅ UTF-8 Encoding: Perfect support for non-Latin scripts