πŸŽ™οΈ Powerful GUI tool to transcribe and translate audio/video files using Whisper and OpenAI β€” fast, simple, and GPU-optimized.

jjaruna/autoTranscriptGUI

AutoTranscript GUI 🎙️

AutoTranscript is a powerful, GPU-accelerated subtitle generator built on top of OpenAI's Whisper model. It features both a command-line interface (CLI) and a beautiful CustomTkinter-based GUI for users who prefer a graphical workflow.

Supports:

  • Local audio/video files
  • Subtitle translation to English
  • OpenAI API (for higher quality translations)

✨ Features

  • 🖥️ Full-featured GUI with progress tracking, real-time logs, and OpenAI config
  • 📜 Generate .srt subtitle files from media files
  • 🌍 Supports multilingual transcription and optional translation to English
  • 🧠 Uses Faster-Whisper for fast GPU-accelerated transcription
  • 🔍 Automatic model selection based on VRAM (e.g. large-v3, medium, etc.)
  • 🔐 API key manager for OpenAI GPT models

📸 GUI Preview

[Screenshot of the AutoTranscript GUI]


🧩 Requirements

  • Python 3.8+
  • ffmpeg (must be installed)
  • NVIDIA GPU with CUDA (recommended)
  • Whisper models (via Faster-Whisper)
  • PyTorch with CUDA
  • .env file for OpenAI (optional)
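
Since ffmpeg must be installed before anything else works, a quick pre-flight check can save a confusing error later. The snippet below is only a sketch and is not part of the app; the helper names `ffmpeg_available` and `ffmpeg_version` are made up for illustration.

```python
import shutil
import subprocess


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


def ffmpeg_version(executable: str = "ffmpeg") -> str:
    """Return the first line of `<executable> -version`, or "" if it is missing."""
    if not ffmpeg_available(executable):
        return ""
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    if ffmpeg_available():
        print(ffmpeg_version())
    else:
        print("ffmpeg was not found on PATH; install it before transcribing.")
```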

📦 Installation

git clone https://github.com/jjaruna/autoTranscriptGUI.git
cd autoTranscriptGUI
pip install -r requirements.txt

🚀 Launch the GUI

python AutoTranscriptGUI.py

Whisper Model Comparison

| Model | Recommended VRAM | Performance | Use Case |
|---|---|---|---|
| tiny | ≥ 1 GB | Very fast, low accuracy | Quick tests, very low-resource machines |
| base | ≥ 2 GB | Fast, low-medium accuracy | Basic transcriptions, short files |
| small | ≥ 4 GB | Balanced speed/accuracy | Good for medium-length files, better accuracy |
| medium | ≥ 8 GB | Slower, higher accuracy | Longer files, good balance of quality and performance |
| large-v1 | ≥ 10 GB | High accuracy | Older large model, still very capable |
| large-v2 | ≥ 10 GB | Improved accuracy | More robust than v1, slower on limited VRAM |
| large-v3 | ≥ 12 GB | Latest model, high accuracy | Best offline model for quality transcription |
| large-v3-turbo | ≥ 12 GB | Fastest large model | High speed with high accuracy, better multi-language support |
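
To give an idea of how VRAM-based model selection could work, here is a minimal sketch that maps available VRAM to the largest model whose recommended VRAM (from the table above) is satisfied. It is not the app's actual selection logic; `pick_model` and `detect_vram_gb` are hypothetical names, and the tie-breaking order between models with the same threshold is an assumption.

```python
# Recommended VRAM in GB, copied from the comparison table and ordered from
# most to least demanding. The order among equal thresholds is an assumption.
MODEL_MIN_VRAM_GB = [
    ("large-v3-turbo", 12),
    ("large-v3", 12),
    ("large-v2", 10),
    ("large-v1", 10),
    ("medium", 8),
    ("small", 4),
    ("base", 2),
    ("tiny", 1),
]


def pick_model(vram_gb: float) -> str:
    """Return the first (largest) model whose recommended VRAM fits."""
    for name, min_gb in MODEL_MIN_VRAM_GB:
        if vram_gb >= min_gb:
            return name
    return "tiny"  # fall back to the smallest model


def detect_vram_gb() -> float:
    """Total memory of CUDA device 0 in GiB, or 0.0 without PyTorch/CUDA."""
    try:
        import torch  # imported lazily so pick_model works without PyTorch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)


if __name__ == "__main__":
    vram = detect_vram_gb()
    print(f"Detected {vram:.1f} GB of VRAM, suggested model: {pick_model(vram)}")
```

Keep in mind that a card sold as "8 GB" usually reports slightly less than 8 GiB, so a real implementation would probably want some tolerance around each threshold.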

🧠 Recommendation

After running the large-v3-turbo model more than 10 times, I found it to be the fastest and most accurate of the Whisper models included in this app.

🖥️ My system has 4 GB of VRAM. Although that is below the recommended VRAM for large models, large-v3-turbo still performed exceptionally well.

⚠️ Note: Your experience may vary depending on your GPU and available VRAM. Use this recommendation as a reference, not a guarantee. If you encounter performance issues, try smaller models like medium or small.


βš™οΈ OpenAI API Setup (Optional)

To enable OpenAI-powered translation:

  1. Click "Add API Key" in the GUI
  2. Enter your OpenAI key and model (gpt-4, gpt-3.5-turbo, etc.)
  3. The key and model are saved to a .env file automatically
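
For reference, a .env file is just a list of KEY=VALUE lines. The sketch below shows one way such a file could be written and read back using only the standard library. The variable name `OPENAI_MODEL` and the helpers `load_env` and `save_env` are assumptions for illustration; the names the app actually writes may differ.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_env(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def save_env(path: Path, updates: Dict[str, str]) -> None:
    """Merge updates into the file. Existing keys are kept; comments are not."""
    merged = load_env(path)
    merged.update(updates)
    body = "".join(f"{key}={value}\n" for key, value in merged.items())
    path.write_text(body, encoding="utf-8")


if __name__ == "__main__":
    # Demo in a temporary directory so no real .env file is touched.
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / ".env"
        # OPENAI_MODEL is a hypothetical variable name used for this example.
        save_env(env_file, {"OPENAI_API_KEY": "sk-your-key", "OPENAI_MODEL": "gpt-4"})
        print(env_file.read_text(encoding="utf-8"))
```

Remember to keep the .env file out of version control, since it contains your API key.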

🖥️ CLI Mode (Optional)

You can still use the command-line version via autosub.py:

python autosub.py myvideo.mp4 -l ja --translate --model base

CLI Options

| Option | Description |
|---|---|
| filename | File path |
| -l, --language | Force language (e.g. en, es, zh) |
| -t, --translate | Translate to English |
| -o, --openai | Use OpenAI API |
| --model | Whisper model to use |
| --debug | Enable debug mode |
| --keep | Keep intermediate WAV file |
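
If you want to wrap or script the CLI, the options above map naturally onto an argparse parser. The following is a sketch that mirrors the table, not the actual contents of autosub.py; the defaults (for example `None` for `--model` and `--language`) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI options."""
    parser = argparse.ArgumentParser(
        prog="autosub.py",
        description="Generate .srt subtitles from an audio or video file.",
    )
    parser.add_argument("filename", help="Path to the audio or video file")
    parser.add_argument("-l", "--language", default=None,
                        help="Force language (e.g. en, es, zh)")
    parser.add_argument("-t", "--translate", action="store_true",
                        help="Translate to English")
    parser.add_argument("-o", "--openai", action="store_true",
                        help="Use OpenAI API")
    # Assumption: leaving --model unset means "choose a model automatically".
    parser.add_argument("--model", default=None, help="Whisper model to use")
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    parser.add_argument("--keep", action="store_true",
                        help="Keep intermediate WAV file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this parser, the example command above yields `language="ja"`, `translate=True`, and `model="base"`.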

πŸ“ Output

  • Subtitles are saved as .srt files in the same folder as your media.
  • If translation is enabled, both the original and the translated text are preserved.
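
For anyone unfamiliar with the .srt format, each subtitle is a numbered block with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the text. The sketch below shows how timed segments could be turned into that format. It is illustrative only: the helper names are made up, and naming the output after the media file (same folder, `.srt` extension) is an assumption about how the app names its files.

```python
from pathlib import Path
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text.strip()}\n")
    return "\n".join(blocks)


def srt_path_for(media_path: str) -> Path:
    """Assumed naming: same folder and base name as the media, .srt extension."""
    return Path(media_path).with_suffix(".srt")


if __name__ == "__main__":
    demo = [(0.0, 1.5, "Hello"), (1.5, 4.0, "This is a test.")]
    print(to_srt(demo))
    print(srt_path_for("clips/myvideo.mp4"))
```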

🧪 Example GUI Workflow

  1. Open GUI
  2. Select video/audio file
  3. Choose language and Whisper model
  4. (Optional) Enable "Translate to English"
  5. (Optional) Enable "Use OpenAI"
  6. Click Start Transcription
  7. Wait for progress bar and logs to finish
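
Under the hood, a workflow like the one above boils down to a Faster-Whisper transcription call. The sketch below shows roughly what that could look like with the `faster_whisper` package. It is not the app's actual code: the function names are hypothetical, and using Whisper's built-in `translate` task for the non-OpenAI translation path is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def build_transcribe_options(language: Optional[str], translate: bool) -> Dict[str, object]:
    """Map the GUI choices onto keyword arguments for WhisperModel.transcribe."""
    return {
        "language": language,  # None lets Whisper detect the language
        "task": "translate" if translate else "transcribe",
    }


def transcribe(
    media_path: str,
    model_name: str = "base",
    language: Optional[str] = None,
    translate: bool = False,
) -> List[Tuple[float, float, str]]:
    """Run Faster-Whisper and return (start, end, text) tuples."""
    # Imported lazily so the helper above works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_name, device="auto", compute_type="default")
    segments, info = model.transcribe(
        media_path, **build_transcribe_options(language, translate)
    )
    print(f"Detected language: {info.language}")
    # segments is a generator; transcription happens as it is consumed.
    return [(seg.start, seg.end, seg.text) for seg in segments]


if __name__ == "__main__":
    for start, end, text in transcribe("myvideo.mp4", language="ja", translate=True):
        print(f"[{start:.2f} -> {end:.2f}] {text}")
```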

πŸ™ Credits


📄 License

MIT License: free for personal and commercial use.
