Learn accents from YouTube videos by extracting word pronunciations from native speakers.
- Web Interface - Easy-to-use form with real-time progress tracking
- Downloads audio and transcripts from YouTube videos
- Extracts common words with their pronunciations
- Phoneme mode - Organize by IPA phonetic sounds
- Creates an interactive web interface to practice your accent
- Configurable audio context (6s before, 2s after by default)
- Search and filter words or phonemes
- Play audio clips of each word
- Phoneme coverage tracking
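Phoneme mode and coverage tracking could work roughly as follows. This is a minimal sketch that assumes each word already has an IPA transcription split into individual phonemes. The `SAMPLE_IPA` dictionary and the helpers `group_by_phoneme` and `phoneme_coverage` are hypothetical illustrations, not the tool's actual code.

```python
from collections import defaultdict

# Hypothetical sample data: words already transcribed into IPA phoneme lists.
# A real pipeline would get these from a pronunciation dictionary or G2P tool.
SAMPLE_IPA = {
    "think": ["θ", "ɪ", "ŋ", "k"],
    "this": ["ð", "ɪ", "s"],
    "ship": ["ʃ", "ɪ", "p"],
    "wash": ["w", "ɒ", "ʃ"],
}


def group_by_phoneme(ipa_by_word: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each phoneme to the sorted list of words that contain it."""
    groups: dict[str, set[str]] = defaultdict(set)
    for word, phonemes in ipa_by_word.items():
        for phoneme in phonemes:
            groups[phoneme].add(word)
    return {phoneme: sorted(words) for phoneme, words in groups.items()}


def phoneme_coverage(groups: dict[str, list[str]], inventory: set[str]) -> float:
    """Fraction of a target phoneme inventory that has at least one example word."""
    if not inventory:
        return 0.0
    return len(inventory & set(groups)) / len(inventory)
```

For example, `group_by_phoneme(SAMPLE_IPA)["ɪ"]` gives `['ship', 'think', 'this']`, and checking coverage against the inventory `{"θ", "ð", "ʃ", "ʒ"}` gives `0.75` because no sample word contains /ʒ/.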
- Install dependencies:
    pip install -r requirements.txt
- (Optional) Test that everything is set up correctly:
    python test_app.py
- Start the web app:
    python app.py
- Your browser will automatically open to the app (if it doesn't, use the displayed URL)
- Enter a YouTube URL and configure settings
- Click "Generate Accent Lesson" and wait for processing
- Practice with your personalized accent tutor!
- Install Python dependencies:
    pip install -r requirements.txt
- Install ffmpeg (required for audio processing):
    macOS:
        brew install ffmpeg
    Ubuntu/Debian:
        sudo apt-get install ffmpeg
    Windows: Download from https://ffmpeg.org/download.html
Start the web application:
    python app.py
The app will automatically:
- Find an available port (starting from 5000)
- Start the server
- Open your default browser to the app URL
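The "find an available port" step can be approximated by trying to bind successive ports until one succeeds. This is a sketch under the assumption that the server listens on `127.0.0.1`; `find_available_port` is a hypothetical helper, not taken from `app.py`.

```python
import socket


def find_available_port(start: int = 5000, max_tries: int = 100,
                        host: str = "127.0.0.1") -> int:
    """Return the first port at or above `start` that can be bound on `host`."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port in use (or not permitted); try the next one
            return port  # the probe socket is closed when the `with` block exits
    raise RuntimeError(f"No free port found in {start}-{start + max_tries - 1}")
```

A caller could then start the server on the returned port and open it with `webbrowser.open(f"http://127.0.0.1:{port}")`. Note the small race: another process can grab the port between the probe and the server's own bind, so a robust app would still handle a bind failure at startup.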
Use the interactive form to:
- Enter a YouTube URL
- Configure the number of words
- Set audio context timing
- Enable phoneme mode
- Track processing progress in real-time
Basic usage:
    python main.py "https://www.youtube.com/watch?v=VIDEO_ID"
Options:
    python main.py [URL] [OPTIONS]

    Options:
      -o, --output DIR        Output directory (default: output)
      -n, --num-words N       Number of words to extract (default: 100)
      --common-words          Filter by most common English words
      --phoneme-mode          Organize by phonetic sounds (IPA)
      --context-before N      Seconds of context before word (default: 6.0)
      --context-after N       Seconds of context after word (default: 2.0)
Extract 100 most frequent words from a video:
    python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -n 100
Extract only common English words:
    python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --common-words
Custom output directory:
    python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -o my_accent_lesson
Learn by phonetic sounds (recommended for accent training):
    python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --phoneme-mode
Customize context duration (e.g., 10 seconds before, 3 seconds after):
    python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --context-before 10 --context-after 3
The tool creates:
- output/index.html - Interactive web interface
- output/audio/ - Downloaded video audio
- output/clips/ - Individual word audio clips
Open output/index.html in your browser to use the accent tutor!
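The command-line options documented for `main.py` map naturally onto Python's standard `argparse` module. The parser below is a hypothetical reconstruction from the option list only (names, defaults, and help text copied from it); `build_parser` is an illustrative name, and the real `main.py` may declare its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented for main.py."""
    parser = argparse.ArgumentParser(description="Learn accents from YouTube videos.")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("-o", "--output", metavar="DIR", default="output",
                        help="Output directory (default: output)")
    parser.add_argument("-n", "--num-words", metavar="N", type=int, default=100,
                        help="Number of words to extract (default: 100)")
    parser.add_argument("--common-words", action="store_true",
                        help="Filter by most common English words")
    parser.add_argument("--phoneme-mode", action="store_true",
                        help="Organize by phonetic sounds (IPA)")
    parser.add_argument("--context-before", metavar="N", type=float, default=6.0,
                        help="Seconds of context before word (default: 6.0)")
    parser.add_argument("--context-after", metavar="N", type=float, default=2.0,
                        help="Seconds of context after word (default: 2.0)")
    return parser
```

Calling `build_parser().parse_args([...])` with the custom-context example yields `args.context_before == 10.0` and `args.context_after == 3.0`; argparse converts dashes in option names to underscores in attribute names.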
- Download: Uses yt-dlp to download the video audio and transcript
- Process: Extracts words and their timestamps from the transcript
- Align: Uses Whisper AI if needed for better word-level timestamps
- Extract: Clips out individual word pronunciations
- Generate: Creates an interactive webpage to practice
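The Process step (words plus timestamps, then the most frequent ones) can be sketched as below. It assumes word timings arrive in the shape that the openai-whisper package returns from `model.transcribe(path, word_timestamps=True)`, i.e. `{"segments": [{"words": [{"word", "start", "end"}, ...]}]}`; a YouTube transcript would need its own adapter. `WordHit`, `words_from_whisper_result`, and `top_words` are hypothetical names.

```python
import re
from collections import Counter
from typing import NamedTuple

# Producing `result` (requires the openai-whisper package; file name is hypothetical):
#   import whisper
#   result = whisper.load_model("base").transcribe("audio.mp3", word_timestamps=True)


class WordHit(NamedTuple):
    word: str     # normalized, lowercase word
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def words_from_whisper_result(result: dict) -> list[WordHit]:
    """Flatten a Whisper-style result into a list of normalized word hits."""
    hits = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            # Whisper words carry leading spaces and punctuation, e.g. " Hello,"
            token = re.sub(r"[^\w']", "", w["word"]).lower()
            if token:
                hits.append(WordHit(token, float(w["start"]), float(w["end"])))
    return hits


def top_words(hits: list[WordHit], n: int = 100) -> list[tuple[str, int, WordHit]]:
    """Return the n most frequent words with their count and first occurrence."""
    counts = Counter(h.word for h in hits)
    first_seen: dict[str, WordHit] = {}
    for h in hits:
        first_seen.setdefault(h.word, h)
    # most_common() keeps first-encountered order among equal counts
    return [(word, count, first_seen[word]) for word, count in counts.most_common(n)]
```

Keeping the first occurrence is one simple choice for which pronunciation to clip; picking the clearest or longest occurrence would be a reasonable alternative.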
- Choose videos with clear speech for best results
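- Videos with subtitles/captions work better (and process faster, since Whisper transcription is only needed when no transcript is available)
- Try different accents: British English, Australian English, etc.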
- Use --common-words to focus on frequently used words
- Use --phoneme-mode to learn specific sounds (vowels, consonants, diphthongs)
- Phoneme mode shows IPA (International Phonetic Alphabet) transcriptions
- Audio clips include 6 seconds of context before the word and 2 seconds after it by default (this helps with natural intonation and rhythm)
- Adjust --context-before and --context-after to control clip length
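The context settings translate into a clip window around each word. The sketch below clamps that window to the audio's bounds and builds an ffmpeg argument list using the standard `-ss` (seek) and `-t` (duration) options. `clip_window` and `ffmpeg_clip_command` are hypothetical helpers; the tool's actual clipping code and file naming may differ.

```python
def clip_window(word_start: float, word_end: float, duration: float,
                context_before: float = 6.0,
                context_after: float = 2.0) -> tuple[float, float]:
    """Return (start, end) in seconds for a clip, clamped to [0, duration]."""
    start = max(0.0, word_start - context_before)
    end = min(duration, word_end + context_after)
    return start, end


def ffmpeg_clip_command(src: str, dst: str, start: float, end: float) -> list[str]:
    """Build an ffmpeg argument list that cuts [start, end) from src into dst."""
    return [
        "ffmpeg", "-y",                # overwrite dst if it already exists
        "-ss", f"{start:.3f}",         # seek to the clip start (seconds)
        "-i", src,
        "-t", f"{end - start:.3f}",    # clip duration (seconds)
        dst,
    ]
```

With the defaults, a word spoken at 3.0-3.5 s yields the window `(0.0, 5.5)`, because the 6-second lead-in cannot start before the beginning of the audio. The argument list can be run with `subprocess.run(cmd, check=True)`.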
Port confusion / Multiple instances running: If you see messages about different ports or the app won't start, you may have old instances running.
Stop all instances:
    ./stop_app.sh
    # or manually:
    pkill -f "python.*app.py"
"No transcript available": Not all videos have transcripts. The tool will fall back to Whisper AI transcription (slower, but it works).
ffmpeg errors: Make sure ffmpeg is installed and in your PATH.
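A quick way to confirm that ffmpeg is reachable from Python is to look it up on PATH with `shutil.which` and ask it for its version. `ffmpeg_version` is a hypothetical helper for diagnosis, not part of the tool.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

Running `print(ffmpeg_version() or "ffmpeg not found on PATH")` prints either the installed version line or a clear hint that the PATH needs fixing.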
Slow processing: First run downloads Whisper AI models. Subsequent runs are faster.
- ✅ Phoneme-based extraction (IPA sounds) - NOW AVAILABLE!
- Multiple video comparison
- Custom word lists
- Export to Anki flashcards
- Pronunciation scoring
- Side-by-side accent comparison
MIT License - feel free to modify and share!