A standalone audio transcription library built on OpenAI's Whisper model, providing both a Node.js API and a command-line interface for transcribing audio files.
(Screenshots: main interface with drag-and-drop file selection; real-time transcription progress with status updates; results display with save and copy options; transcription settings and configuration panel; Whisper model selection with size and accuracy details.)
- 🎵 Multiple Audio Formats: Supports WebM, WAV, MP3, M4A, MP4, OGG, FLAC, AAC
- ⏱️ Optional Timestamps: Include timestamps in your transcripts
- 🎯 Multiple Whisper Models: Choose from tiny, base, small, medium, large
  - Tiny: Fastest, least accurate (39M parameters)
  - Base: Fast, good accuracy (74M parameters)
  - Small: Balanced speed/accuracy (244M parameters)
  - Medium: Recommended (769M parameters)
  - Large: Slowest, most accurate (1550M parameters)
- 🔧 Flexible Output Formats: TXT, SRT, JSON
- 🛡️ Error Handling: Graceful fallback to mock mode if Whisper isn't available
- 🧹 Auto Cleanup: Automatically cleans up temporary files
- 📱 Cross-Platform: Works on macOS, Linux, and Windows
- FFmpeg: Required for audio conversion

  ```bash
  # macOS
  brew install ffmpeg

  # Ubuntu/Debian
  sudo apt update && sudo apt install ffmpeg

  # Windows
  # Download from https://ffmpeg.org/download.html
  ```

- Whisper (Optional): For actual transcription (a quick sanity check follows this list)

  ```bash
  # Install Whisper
  pip install openai-whisper

  # Or use a virtual environment
  python -m venv whisper-env
  source whisper-env/bin/activate  # On Windows: whisper-env\Scripts\activate
  pip install openai-whisper
  ```
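To confirm that both prerequisites are reachable before running a transcription, a quick sanity check from Node.js looks like this (a minimal sketch; it only probes for `ffmpeg` and `whisper` on your PATH):

```js
// sanity-check.js — probe for ffmpeg and whisper on the PATH.
const { spawnSync } = require('child_process');

for (const [cmd, args] of [['ffmpeg', ['-version']], ['whisper', ['--help']]]) {
  const { error } = spawnSync(cmd, args, { encoding: 'utf8' });
  console.log(`${cmd}: ${error ? 'not found' : 'OK'}`);
}
```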
```bash
# Clone the repository
git clone https://github.com/yourusername/audio-transcriber.git
cd audio-transcriber

# Install dependencies
npm install

# Make CLI globally available (optional)
npm link
```

```bash
# Transcribe an audio file
transcribe audio.mp3
# With timestamps
transcribe audio.m4a --timestamps
# Specify output format
transcribe audio.wav --format srt
# Use a specific Whisper model
transcribe audio.webm --model large
```

```bash
# Full command with all options
transcribe "My Audio File.m4a" \
--timestamps \
--model medium \
--format txt \
--output transcript.txt \
  --whisper-path /path/to/whisper
```

| Option | Short | Description |
|---|---|---|
| `--timestamps` | `-t` | Include timestamps in the transcript |
| `--format <format>` | `-f` | Output format: txt, srt, json (default: txt) |
| `--model <model>` | `-m` | Whisper model: tiny, base, small, medium, large (default: medium) |
| `--whisper-path <path>` | `-w` | Custom path to Whisper executable |
| `--output <file>` | `-o` | Output file path (default: auto-generated) |
| `--help` | `-h` | Show help message |
Node.js API usage:

```js
const AudioTranscriber = require('audio-transcriber');
async function transcribeAudio() {
// Initialize transcriber
const transcriber = new AudioTranscriber({
whisperModel: 'medium', // Optional: 'tiny', 'base', 'small', 'medium', 'large'
whisperPath: '/path/to/whisper', // Optional: custom Whisper path
tmpDir: './tmp', // Optional: temporary directory
cleanupTempFiles: true // Optional: auto-cleanup (default: true)
});
try {
// Transcribe with timestamps
const result = await transcriber.transcribe('audio.mp3', {
includeTimestamps: true,
outputFormat: 'txt'
});
console.log('Transcript:', result.text);
console.log('File info:', result.file_info);
} catch (error) {
console.error('Transcription failed:', error.message);
}
}
transcribeAudio();
```

`new AudioTranscriber(options)` creates a new AudioTranscriber instance.
Options:
- `whisperModel` (string): Whisper model size ('tiny', 'base', 'small', 'medium', 'large')
- `whisperPath` (string): Custom path to Whisper executable
- `tmpDir` (string): Directory for temporary files
- `cleanupTempFiles` (boolean): Whether to auto-cleanup temp files
`transcriber.transcribe(inputFilePath, options)` transcribes an audio file.
Parameters:
- `inputFilePath` (string): Path to the audio file
- `options` (object):
  - `includeTimestamps` (boolean): Include timestamps in output
  - `outputFormat` (string): Output format ('txt', 'srt', 'json')
Returns:
```js
{
text: string, // The transcribed text
file_info: {
filename: string, // Original filename
size: number, // File size in bytes
content_type: string, // MIME type
size_kb: string, // File size in KB
processing_time_ms: number, // Processing time
include_timestamps: boolean, // Whether timestamps were included
output_format: string, // Output format used
model: string, // Whisper model used
method: string, // 'whisper' or 'mock'
whisper_path: string // Path to Whisper executable
}
}
```

The library can also check whether Whisper is available on the system.
Returns:
```js
{
available: boolean, // Whether Whisper is available
path: string, // Path to Whisper executable
error: string // Error message if not available
}
```
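The documentation above does not name the method that performs this check, so `checkWhisperAvailability()` in the sketch below is purely a placeholder for whatever the library actually exports; only the return shape is taken from the docs:

```js
const AudioTranscriber = require('audio-transcriber');

async function checkAvailability() {
  const transcriber = new AudioTranscriber();

  // Hypothetical method name — substitute the library's real export.
  // Only the return shape ({ available, path, error }) is documented.
  const status = await transcriber.checkWhisperAvailability();

  if (status.available) {
    console.log(`Whisper found at ${status.path}`);
  } else {
    console.warn(`Whisper unavailable (${status.error}); mock mode will be used.`);
  }
}

checkAvailability();
```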
Basic usage:

```js
const AudioTranscriber = require('audio-transcriber');
const transcriber = new AudioTranscriber();
transcriber.transcribe('meeting-recording.m4a')
.then(result => {
console.log('Transcript:', result.text);
})
.catch(error => {
console.error('Error:', error.message);
});
```

With timestamps:

```js
const AudioTranscriber = require('audio-transcriber');
const transcriber = new AudioTranscriber();
transcriber.transcribe('podcast.mp3', { includeTimestamps: true })
.then(result => {
console.log('Transcript with timestamps:', result.text);
// Output: [00:00:00,000] Hello, this is the beginning...
});
```

Advanced configuration:

```js
const AudioTranscriber = require('audio-transcriber');
const transcriber = new AudioTranscriber({
whisperModel: 'large',
whisperPath: '/Users/me/venvs/whisper/bin/whisper',
tmpDir: '/tmp/audio-transcription',
cleanupTempFiles: false
});
transcriber.transcribe('lecture.wav', {
includeTimestamps: true,
outputFormat: 'srt'
});
```

Sample TXT output (no timestamps):

```
Hello, this is a transcription of the audio file.
It contains the spoken words without any timestamps.
```

TXT with timestamps:

```
[00:00:00,000] Hello, this is a transcription of the audio file.
[00:00:03,500] It contains the spoken words with timestamps.
[00:00:07,200] Each segment shows when it was spoken.
```

SRT:

```
1
00:00:00,000 --> 00:00:03,500
Hello, this is a transcription of the audio file.
2
00:00:03,500 --> 00:00:07,200
It contains the spoken words with timestamps.
```
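No JSON sample appears above. Whisper's own JSON output typically carries the full text plus per-segment timing, so a trimmed result might look like the following (illustrative only; the exact fields this library emits are not documented):

```json
{
  "text": "Hello, this is a transcription of the audio file. It contains the spoken words with timestamps.",
  "segments": [
    { "id": 0, "start": 0.0, "end": 3.5, "text": " Hello, this is a transcription of the audio file." },
    { "id": 1, "start": 3.5, "end": 7.2, "text": " It contains the spoken words with timestamps." }
  ],
  "language": "en"
}
```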
The library handles various error scenarios gracefully:
- Whisper not available: Falls back to mock transcription (see the sketch after this list)
- Invalid audio file: Throws descriptive error
- FFmpeg conversion failure: Attempts alternative approaches
- File system errors: Provides helpful error messages
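Because of the mock fallback, callers may want to verify which method actually produced a transcript. The `file_info.method` field documented above distinguishes the two; a minimal sketch:

```js
const AudioTranscriber = require('audio-transcriber');

const transcriber = new AudioTranscriber();

transcriber.transcribe('audio.mp3')
  .then(result => {
    // file_info.method is 'whisper' for a real transcription, 'mock' for the fallback.
    if (result.file_info.method === 'mock') {
      console.warn('Whisper is not installed; this transcript is placeholder output.');
    }
    console.log(result.text);
  })
  .catch(err => console.error('Transcription failed:', err.message));
```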
- "FFmpeg not found"
  - Install FFmpeg: `brew install ffmpeg` (macOS) or `sudo apt install ffmpeg` (Ubuntu)
- "Whisper not available"
  - Install Whisper: `pip install openai-whisper`
  - Or specify a custom path: `--whisper-path /path/to/whisper`
- "Permission denied"
  - Check file permissions
  - Ensure write access to the output directory
- "Empty transcript"
  - Audio file might be too quiet or corrupted
  - Try a different Whisper model
  - Check audio file format compatibility
Enable verbose logging by setting the environment variable:
```bash
DEBUG=audio-transcriber transcribe audio.mp3
```
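The same switch should work from the Node.js API if the library follows the common `debug` package convention (an assumption, not confirmed above), as long as the variable is set before the module loads:

```js
// Assumes the 'debug' package convention (not confirmed by these docs).
// DEBUG must be set before the library is required.
process.env.DEBUG = 'audio-transcriber';

const AudioTranscriber = require('audio-transcriber');

new AudioTranscriber()
  .transcribe('audio.mp3')
  .then(result => console.log(result.text))
  .catch(err => console.error(err.message));
```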
To contribute:

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit your changes: `git commit -am 'Add feature'`
- Push to the branch: `git push origin feature-name`
- Submit a pull request
MIT License - see LICENSE file for details.
- OpenAI for the Whisper model
- FFmpeg for audio processing capabilities
- The open-source community for inspiration and tools