Accent Tutor

Learn accents from YouTube videos by extracting word pronunciations from native speakers.

Features

Web Interface - Easy-to-use form with real-time progress tracking
Downloads audio and transcripts from YouTube videos
Extracts common words with their pronunciations
Phoneme mode - Organize by IPA phonetic sounds
Creates an interactive web page for practicing the accent
Configurable audio context (6s before, 2s after by default)
Search and filter words or phonemes
Play audio clips of each word
Phoneme coverage tracking

Quick Start

Install dependencies:

pip install -r requirements.txt

(Optional) Test that everything is set up correctly:

python test_app.py

Start the web app:

python app.py

Your browser opens the app automatically (if it doesn't, open the URL printed in the terminal)
Enter a YouTube URL and configure settings
Click "Generate Accent Lesson" and wait for processing
Practice with your personalized accent tutor!

Installation

Install Python dependencies:

pip install -r requirements.txt

Install ffmpeg (required for audio processing):

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt-get install ffmpeg

Windows: Download from https://ffmpeg.org/download.html and make sure the folder containing ffmpeg.exe is on your PATH

Usage

Web Interface (Recommended)

Start the web application:

python app.py

The app will automatically:

Find an available port (starting from 5000)
Start the server
Open your default browser to the app URL
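The port search above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code in app.py: the function name `find_free_port`, the 100-port search limit, and binding to 127.0.0.1 are all assumptions.

```python
import socket


def find_free_port(start: int = 5000, max_tries: int = 100, host: str = "127.0.0.1") -> int:
    """Return the first port >= start that can be bound on host.

    Hypothetical helper; the real app may use a different host or range.
    """
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port already in use (or not permitted): try the next one
            return port  # the probe socket is closed when the with-block exits
    raise RuntimeError(f"No free port in range {start}-{start + max_tries - 1}")
```

A server could then be started on `find_free_port()` and the browser pointed at `http://127.0.0.1:<port>` with the standard-library `webbrowser.open(url)`. Note that probing and then binding is not atomic, so another process can still grab the port in between.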

Use the interactive form to:

Enter a YouTube URL
Configure the number of words
Set the audio context timing
Enable phoneme mode
Track processing progress in real time

Command Line Interface

Basic usage:

python main.py "https://www.youtube.com/watch?v=VIDEO_ID"

Full syntax:

python main.py [URL] [OPTIONS]

Options:
  -o, --output DIR          Output directory (default: output)
  -n, --num-words N         Number of words to extract (default: 100)
  --common-words            Filter by most common English words
  --phoneme-mode            Organize by phonetic sounds (IPA)
  --context-before N        Seconds of context before word (default: 6.0)
  --context-after N         Seconds of context after word (default: 2.0)
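For reference, the documented flags map naturally onto Python's `argparse`. This sketch only mirrors the option list above (names and defaults are taken from it); it is not the actual parser in main.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented main.py options."""
    parser = argparse.ArgumentParser(
        description="Extract word pronunciations from a YouTube video."
    )
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("-o", "--output", default="output", metavar="DIR",
                        help="Output directory")
    parser.add_argument("-n", "--num-words", type=int, default=100, metavar="N",
                        help="Number of words to extract")
    parser.add_argument("--common-words", action="store_true",
                        help="Filter by most common English words")
    parser.add_argument("--phoneme-mode", action="store_true",
                        help="Organize by phonetic sounds (IPA)")
    parser.add_argument("--context-before", type=float, default=6.0, metavar="N",
                        help="Seconds of context before word")
    parser.add_argument("--context-after", type=float, default=2.0, metavar="N",
                        help="Seconds of context after word")
    return parser
```

`argparse` converts `--num-words` to the attribute `num_words` (and likewise for the other dashed flags), so the parsed values are read as `args.num_words`, `args.context_before`, and so on.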

Examples

Extract the 100 most frequent words from a video (100 is also the default):

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -n 100

Extract only common English words:

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --common-words

Custom output directory:

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" -o my_accent_lesson

Learn by phonetic sounds (recommended for accent training):

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --phoneme-mode

Customize context duration (e.g., 10 seconds before, 3 seconds after):

python main.py "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --context-before 10 --context-after 3
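One plausible way the context settings turn into clip boundaries is to widen the word's timestamps and clamp them to the length of the audio. The sketch below assumes that behaviour and builds an ffmpeg command for the resulting window; `clip_window` and `ffmpeg_clip_command` are hypothetical names, not functions from audio_clipper.py.

```python
from typing import List, Tuple


def clip_window(word_start: float, word_end: float, audio_duration: float,
                context_before: float = 6.0,
                context_after: float = 2.0) -> Tuple[float, float]:
    """Widen a word's [start, end] by the context settings, clamped to the audio."""
    if word_end < word_start:
        raise ValueError("word_end must not precede word_start")
    start = max(0.0, word_start - context_before)       # never seek before 0 s
    end = min(audio_duration, word_end + context_after)  # never run past the end
    return start, end


def ffmpeg_clip_command(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build (but do not run) an ffmpeg command that cuts [start, end] from src."""
    return [
        "ffmpeg", "-y",               # overwrite dst if it already exists
        "-ss", f"{start:.3f}",        # input seek: jump to start before decoding
        "-i", src,
        "-t", f"{end - start:.3f}",   # clip duration in seconds
        "-vn",                        # drop any video stream
        dst,
    ]
```

For example, a word spoken at 20.0 to 20.5 s in a 21 s recording gives the window (14.0, 21.0) with the defaults: six seconds before, but only 0.5 s after because the audio ends. The command list can be passed to `subprocess.run(cmd, check=True)`.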

Output

The tool creates (under output/ by default, or the directory given with -o):

output/index.html - Interactive web interface
output/audio/ - Downloaded video audio
output/clips/ - Individual word audio clips

Open output/index.html in your browser to use the accent tutor!
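Because index.html is a static page that refers to the clips by relative path, it can be opened straight from disk. The sketch below shows the general idea with a deliberately minimal page (one `<audio>` player per word). It is an assumption-laden illustration, not the output of web_generator.py, which also provides search, filtering, and phoneme views; `render_index` and `write_index` are hypothetical names.

```python
import html
from pathlib import Path
from typing import Iterable, Tuple


def render_index(words: Iterable[Tuple[str, str]]) -> str:
    """Render (word, clip_path) pairs as a bare-bones HTML practice page.

    clip_path is assumed to be relative to index.html, e.g. "clips/water.mp3".
    """
    items = "\n".join(
        f'  <li>{html.escape(word)} '
        f'<audio controls src="{html.escape(path, quote=True)}"></audio></li>'
        for word, path in words
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Accent Tutor</title></head>\n'
        f"<body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


def write_index(output_dir: str, words: Iterable[Tuple[str, str]]) -> Path:
    """Write index.html into output_dir (created if missing) and return its path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "index.html"
    path.write_text(render_index(words), encoding="utf-8")
    return path
```

Escaping the words with `html.escape` keeps stray transcript characters such as `<` or `&` from breaking the page.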

How It Works

Download: Uses yt-dlp to download the video audio and transcript
Process: Extracts words and their timestamps from the transcript
Align: Uses Whisper AI if needed for better word-level timestamps
Extract: Clips out individual word pronunciations
Generate: Creates an interactive webpage to practice
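The "Process" step can be illustrated by counting words and remembering where each one first occurs, which gives both a frequency ranking and a timestamp to clip. This sketch assumes the transcript has already been flattened into `(text, start_seconds, end_seconds)` tuples; that input shape and the name `top_words` are assumptions, not the interface of transcript_processor.py.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds): assumed shape


def top_words(words: List[Word], n: int = 100) -> List[Tuple[str, int, float, float]]:
    """Return up to n (word, count, first_start, first_end) tuples, most frequent first."""
    normalized = []
    for text, start, end in words:
        # Lower-case and strip leading/trailing punctuation; keep inner apostrophes ("don't").
        token = re.sub(r"^\W+|\W+$", "", text.lower())
        if token:
            normalized.append((token, start, end))

    counts = Counter(token for token, _, _ in normalized)

    # Keep the timestamps of each word's first occurrence as the clip to extract.
    first_seen: Dict[str, Tuple[float, float]] = {}
    for token, start, end in normalized:
        first_seen.setdefault(token, (start, end))

    # most_common orders ties by first appearance (Counter preserves insertion order).
    return [(token, count, *first_seen[token]) for token, count in counts.most_common(n)]
```

Picking the first occurrence is only one option; a real implementation might prefer the clearest or longest occurrence instead.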

Tips

Choose videos with clear speech for best results
Videos with subtitles/captions work better
Try different accents, such as British or Australian English
Use --common-words to focus on frequently used words
Use --phoneme-mode to learn specific sounds (vowels, consonants, diphthongs)
Phoneme mode shows IPA (International Phonetic Alphabet) transcriptions
By default, audio clips include 6 seconds of context before the word and 2 seconds after it; the surrounding speech helps you hear natural intonation and rhythm
Adjust --context-before and --context-after to control clip length
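Phoneme mode and phoneme coverage can be pictured as inverting a word-to-IPA lookup: each phoneme collects the words that contain it, and coverage is the share of a target phoneme inventory that has at least one example. The tiny lexicon below is hand-written and hypothetical (it is not the data phoneme_extractor.py uses), and phonemes are stored as lists so that multi-character symbols such as the diphthong "eɪ" stay intact.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical, hand-written IPA lookup for illustration only.
IPA: Dict[str, List[str]] = {
    "think": ["θ", "ɪ", "ŋ", "k"],
    "this": ["ð", "ɪ", "s"],
    "day": ["d", "eɪ"],
    "water": ["w", "ɔː", "t", "ə"],
}


def group_by_phoneme(words: List[str], lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each phoneme to the words (in input order) that contain it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        # dict.fromkeys removes repeated phonemes within a word while keeping order.
        for phoneme in dict.fromkeys(lexicon.get(word, [])):
            groups[phoneme].append(word)
    return dict(groups)


def coverage(groups: Dict[str, List[str]], inventory: Set[str]) -> float:
    """Fraction of the target phoneme inventory that has at least one example word."""
    if not inventory:
        return 0.0
    return len(inventory & set(groups)) / len(inventory)
```

Words missing from the lookup are simply skipped here; a fuller version might report them so they can be transcribed another way.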

Troubleshooting

Port confusion / multiple instances running: If you see messages about different ports, or the app won't start, old instances may still be running.

Stop all instances:

./stop_app.sh
# or manually:
pkill -f "python.*app.py"

"No transcript available": Not all videos have transcripts. In that case the tool falls back to transcribing the audio with Whisper AI (slower, but it works).

ffmpeg errors: Make sure ffmpeg is installed and in your PATH.
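A quick way to confirm that Python can actually see ffmpeg is to look it up on PATH with the standard library and ask it for its version. This is a small diagnostic sketch; `ffmpeg_version` is a hypothetical helper, not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(binary: str = "ffmpeg") -> Optional[str]:
    """Return ffmpeg's first version line (or its path), or None if it is not on PATH."""
    path = shutil.which(binary)  # same PATH lookup that subprocess will rely on
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    first_line = result.stdout.splitlines()[0] if result.stdout else ""
    return first_line or path


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```

If this reports "not found" but ffmpeg works in your terminal, the app is probably being launched from an environment with a different PATH (for example, a different shell or IDE).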

Slow processing: The first run downloads the Whisper AI model, so subsequent runs are faster.

Future Enhancements

✅ Phoneme-based extraction (IPA sounds) - now available via --phoneme-mode
Multiple video comparison
Custom word lists
Export to Anki flashcards
Pronunciation scoring
Side-by-side accent comparison

License

MIT License - feel free to modify and share!

ShriSamson/AccentTutor

Folders and files

Latest commit

History

Repository files navigation

Accent Tutor

Features

Quick Start

Installation

Usage

Web Interface (Recommended)

Command Line Interface

Examples

Output

How It Works

Tips

Troubleshooting

Future Enhancements

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages