ShriSamson/AccentTutor

Accent Tutor

Learn accents from YouTube videos by extracting word pronunciations from native speakers.

Features

  • Web Interface - Easy-to-use form with real-time progress tracking
  • Downloads audio and transcripts from YouTube videos
  • Extracts common words with their pronunciations
  • Phoneme mode - Organize by IPA phonetic sounds
  • Generates an interactive web page for practicing the accent
  • Configurable audio context (6s before, 2s after by default)
  • Search and filter words or phonemes
  • Play audio clips of each word
  • Phoneme coverage tracking

Quick Start

  1. Install dependencies:
pip install -r requirements.txt
  2. (Optional) Test that everything is set up correctly:
python test_app.py
  3. Start the web app:
python app.py
  4. Your browser will automatically open to the app (if it doesn't, open the displayed URL manually)

  5. Enter a YouTube URL and configure settings

  6. Click "Generate Accent Lesson" and wait for processing

  7. Practice with your personalized accent tutor!

Installation

  1. Install Python dependencies:
pip install -r requirements.txt
  2. Install ffmpeg (required for audio processing):

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get install ffmpeg

Windows: Download from https://ffmpeg.org/download.html

Usage

Web Interface (Recommended)

Start the web application:

python app.py

The app will automatically:

  • Find an available port (starting from 5000)
  • Start the server
  • Open your default browser to the app URL
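
The port search described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the function name `find_free_port`, the `max_tries` limit, and the choice to probe by binding on 127.0.0.1 are all assumptions.

```python
import socket


def find_free_port(start: int = 5000, max_tries: int = 100) -> int:
    """Return the first port at or above `start` that can be bound on localhost.

    Hypothetical helper: the real app may probe ports differently.
    """
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is already in use, try the next one
            return port  # socket is closed on exit, freeing the port for the server
    raise RuntimeError(f"no free port in {start}..{start + max_tries - 1}")
```

Once a port is chosen, the standard-library call `webbrowser.open(f"http://127.0.0.1:{port}")` is one way to open the default browser at that address. Note that a small window exists between probing a port and starting the server on it, which is usually acceptable for a local tool.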

Use the interactive form to:

  • Enter YouTube URL
  • Configure number of words
  • Set audio context timing
  • Enable phoneme mode
  • Track processing progress in real-time

Command Line Interface

Basic usage:

python main.py "https://www.youtube.com/watch?v=VIDEO_ID"

Options:

python main.py [URL] [OPTIONS]

Options:
  -o, --output DIR          Output directory (default: output)
  -n, --num-words N         Number of words to extract (default: 100)
  --common-words            Filter by most common English words
  --phoneme-mode            Organize by phonetic sounds (IPA)
  --context-before N        Seconds of context before word (default: 6.0)
  --context-after N         Seconds of context after word (default: 2.0)
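
For readers who want to script around the CLI, the option table above maps naturally onto `argparse`. The sketch below mirrors the documented flags and defaults; it is an assumption about how such a parser could look, not a copy of main.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("-o", "--output", default="output", metavar="DIR",
                        help="Output directory")
    parser.add_argument("-n", "--num-words", type=int, default=100, metavar="N",
                        help="Number of words to extract")
    parser.add_argument("--common-words", action="store_true",
                        help="Filter by most common English words")
    parser.add_argument("--phoneme-mode", action="store_true",
                        help="Organize by phonetic sounds (IPA)")
    parser.add_argument("--context-before", type=float, default=6.0, metavar="N",
                        help="Seconds of context before word")
    parser.add_argument("--context-after", type=float, default=2.0, metavar="N",
                        help="Seconds of context after word")
    return parser


if __name__ == "__main__":
    # Parse the real command line and show the resulting settings.
    print(build_parser().parse_args())
```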

Examples

Extract 100 most frequent words from a video:

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -n 100

Extract only common English words:

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --common-words

Custom output directory:

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -o my_accent_lesson

Learn by phonetic sounds (recommended for accent training):

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --phoneme-mode

Customize context duration (e.g., 10 seconds before, 3 seconds after):

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --context-before 10 --context-after 3

Output

The tool creates:

  • output/index.html - Interactive web interface
  • output/audio/ - Downloaded video audio
  • output/clips/ - Individual word audio clips

Open output/index.html in your browser to use the accent tutor!

How It Works

  1. Download: Uses yt-dlp to download the video audio and transcript
  2. Process: Extracts words and their timestamps from the transcript
  3. Align: Uses Whisper AI if needed for better word-level timestamps
  4. Extract: Clips out individual word pronunciations
  5. Generate: Creates an interactive webpage to practice
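
As a rough illustration of the Process step, the sketch below picks the most frequent words from a list of timestamped words and keeps the first occurrence of each. The `TimedWord` type and the `top_words` function are hypothetical, and the choice to use the first occurrence is an assumption; the actual tool may select occurrences differently.

```python
from collections import Counter
from typing import NamedTuple


class TimedWord(NamedTuple):
    """A transcript word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def top_words(words: list[TimedWord], n: int) -> list[TimedWord]:
    """Return the first occurrence of each of the n most frequent words."""
    # Count case-insensitively so "The" and "the" are the same word.
    counts = Counter(w.text.lower() for w in words)
    first_seen: dict[str, TimedWord] = {}
    for w in words:
        first_seen.setdefault(w.text.lower(), w)
    return [first_seen[text] for text, _ in counts.most_common(n)]
```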

Tips

  • Choose videos with clear speech for best results
  • Videos with subtitles/captions work better
  • Try different accents: British English, Australian, etc.
  • Use --common-words to focus on frequently used words
  • Use --phoneme-mode to learn specific sounds (vowels, consonants, diphthongs)
  • Phoneme mode shows IPA (International Phonetic Alphabet) transcriptions
  • Audio clips include 6 seconds of context before the word by default (helps with natural intonation and rhythm)
  • Adjust --context-before and --context-after to control clip length
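
To make the effect of the two context settings concrete, here is a small sketch of how a clip window could be computed and turned into an ffmpeg command. The function names, the clamping to the audio bounds, and the exact ffmpeg invocation are assumptions for illustration; the tool's own clipping code may differ.

```python
from typing import Optional


def clip_window(word_start: float, word_end: float,
                before: float = 6.0, after: float = 2.0,
                audio_duration: Optional[float] = None) -> tuple[float, float]:
    """Return (start, end) in seconds for a clip around one word."""
    start = max(0.0, word_start - before)  # never seek before the audio begins
    end = word_end + after
    if audio_duration is not None:
        end = min(end, audio_duration)  # never run past the end of the audio
    return start, end


def ffmpeg_clip_command(src: str, dst: str, start: float, end: float) -> list[str]:
    """Build an ffmpeg argument list that cuts [start, end] from src into dst."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", src,
            "-t", f"{end - start:.3f}", dst]
```

With the defaults, a word spoken from 10.0 s to 10.5 s yields a clip from 4.0 s to 12.5 s, so most of the clip is lead-in speech. The resulting list can be passed to `subprocess.run` as-is.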

Troubleshooting

Port confusion / Multiple instances running: If you see messages about different ports or the app won't start, you may have old instances running.

Stop all instances:

./stop_app.sh
# or manually:
pkill -f "python.*app.py"

"No transcript available": Not all videos have transcripts. When a video has none, the tool falls back to transcribing the audio with Whisper AI, which is slower but still works.

ffmpeg errors: Make sure ffmpeg is installed and in your PATH.
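
A quick way to confirm that ffmpeg is reachable from Python is to look it up on the PATH and ask for its version. This is a generic diagnostic sketch using only the standard library; `ffmpeg_version` is a hypothetical helper, not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True,
                            text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```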

Slow processing: The first run downloads the Whisper AI models, so it takes longer. Subsequent runs are faster.

Future Enhancements

  • Phoneme-based extraction (IPA sounds) - now available via --phoneme-mode
  • Multiple video comparison
  • Custom word lists
  • Export to Anki flashcards
  • Pronunciation scoring
  • Side-by-side accent comparison

License

MIT License - feel free to modify and share!
