Audio Transcriber

A standalone audio transcription library built on OpenAI's Whisper model, providing both a Node.js API and a command-line interface for transcribing audio files.

📸 Screenshots

Desktop Application Interface

  • Opening a file: Main interface with drag-and-drop file selection
  • Transcribing a file: Real-time transcription progress with status updates
  • Saving the document: Results display with save and copy options
  • File options: Transcription settings and configuration panel
  • Model options: Whisper model selection with size and accuracy details

Features

  • 🎵 Multiple Audio Formats: Supports WebM, WAV, MP3, M4A, MP4, OGG, FLAC, AAC
  • ⏱️ Optional Timestamps: Include timestamps in your transcripts
  • 🎯 Multiple Whisper Models: Choose from tiny, base, small, medium, large
    • Tiny: Fastest, least accurate (39M parameters)
    • Base: Fast, good accuracy (74M parameters)
    • Small: Balanced speed/accuracy (244M parameters)
    • Medium: Recommended (769M parameters)
    • Large: Slowest, most accurate (1550M parameters)
  • 🔧 Flexible Output Formats: TXT, SRT, JSON
  • 🛡️ Error Handling: Graceful fallback to mock mode if Whisper isn't available
  • 🧹 Auto Cleanup: Automatically cleans up temporary files
  • 📱 Cross-Platform: Works on macOS, Linux, and Windows

Installation

Prerequisites

  1. FFmpeg: Required for audio conversion

    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt update && sudo apt install ffmpeg
    
    # Windows
    # Download from https://ffmpeg.org/download.html
  2. Whisper (Optional): Required for real transcription; without it the library falls back to mock mode

    # Install Whisper
    pip install openai-whisper
    
    # Or use a virtual environment
    python -m venv whisper-env
    source whisper-env/bin/activate  # On Windows: whisper-env\Scripts\activate
    pip install openai-whisper

Install the Library

# Clone the repository
git clone https://github.com/LM-Towner/AudioTranscriber.git
cd AudioTranscriber

# Install dependencies
npm install

# Make CLI globally available (optional)
npm link

Usage

Command Line Interface

Basic Usage

# Transcribe an audio file
transcribe audio.mp3

# With timestamps
transcribe audio.m4a --timestamps

# Specify output format
transcribe audio.wav --format srt

# Use a specific Whisper model
transcribe audio.webm --model large

Advanced Options

# Full command with all options
transcribe "My Audio File.m4a" \
  --timestamps \
  --model medium \
  --format txt \
  --output transcript.txt \
  --whisper-path /path/to/whisper

Command Line Options

Option                  Short  Description
--timestamps            -t     Include timestamps in the transcript
--format <format>       -f     Output format: txt, srt, json (default: txt)
--model <model>         -m     Whisper model: tiny, base, small, medium, large (default: medium)
--whisper-path <path>   -w     Custom path to Whisper executable
--output <file>         -o     Output file path (default: auto-generated)
--help                  -h     Show help message
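
For example, a run using the short flags (file names here are only illustrative):

transcribe interview.mp3 -t -m small -f json -o interview.json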

Node.js API

Basic Usage

const AudioTranscriber = require('audio-transcriber');

async function transcribeAudio() {
  // Initialize transcriber
  const transcriber = new AudioTranscriber({
    whisperModel: 'medium',        // Optional: 'tiny', 'base', 'small', 'medium', 'large'
    whisperPath: '/path/to/whisper', // Optional: custom Whisper path
    tmpDir: './tmp',               // Optional: temporary directory
    cleanupTempFiles: true         // Optional: auto-cleanup (default: true)
  });

  try {
    // Transcribe with timestamps
    const result = await transcriber.transcribe('audio.mp3', {
      includeTimestamps: true,
      outputFormat: 'txt'
    });

    console.log('Transcript:', result.text);
    console.log('File info:', result.file_info);
    
  } catch (error) {
    console.error('Transcription failed:', error.message);
  }
}

transcribeAudio();

API Reference

new AudioTranscriber(options)

Creates a new AudioTranscriber instance.

Options:

  • whisperModel (string): Whisper model size ('tiny', 'base', 'small', 'medium', 'large')
  • whisperPath (string): Custom path to Whisper executable
  • tmpDir (string): Directory for temporary files
  • cleanupTempFiles (boolean): Whether to auto-cleanup temp files

transcriber.transcribe(inputFilePath, options)

Transcribes an audio file.

Parameters:

  • inputFilePath (string): Path to the audio file
  • options (object):
    • includeTimestamps (boolean): Include timestamps in output
    • outputFormat (string): Output format ('txt', 'srt', 'json')

Returns:

{
  text: string,           // The transcribed text
  file_info: {
    filename: string,     // Original filename
    size: number,         // File size in bytes
    content_type: string, // MIME type
    size_kb: string,      // File size in KB
    processing_time_ms: number, // Processing time
    include_timestamps: boolean, // Whether timestamps were included
    output_format: string,      // Output format used
    model: string,        // Whisper model used
    method: string,       // 'whisper' or 'mock'
    whisper_path: string  // Path to Whisper executable
  }
}
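
For example, the metadata in file_info can be used to report how the transcription ran; a minimal sketch (inside an async function) using only the fields documented above:

const result = await transcriber.transcribe('audio.mp3', { outputFormat: 'txt' });

console.log(result.text);
console.log(`Model used: ${result.file_info.model}`);
console.log(`File size: ${result.file_info.size_kb} KB`);
console.log(`Processing time: ${result.file_info.processing_time_ms} ms`);
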
transcriber.checkWhisperAvailability()

Checks if Whisper is available in the system.

Returns:

{
  available: boolean,     // Whether Whisper is available
  path: string,          // Path to Whisper executable
  error: string          // Error message if not available
}
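
A minimal sketch of using this check before transcribing; the README does not say whether the method is synchronous or returns a promise, so the await below is an assumption:

const AudioTranscriber = require('audio-transcriber');

const transcriber = new AudioTranscriber();

// Assumption: the check is async; drop `await` if it turns out to be synchronous
const status = await transcriber.checkWhisperAvailability();

if (status.available) {
  console.log('Whisper found at:', status.path);
} else {
  console.warn('Whisper unavailable, mock mode will be used:', status.error);
}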

Examples

Example 1: Basic Transcription

const AudioTranscriber = require('audio-transcriber');

const transcriber = new AudioTranscriber();

transcriber.transcribe('meeting-recording.m4a')
  .then(result => {
    console.log('Transcript:', result.text);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

Example 2: With Timestamps

const AudioTranscriber = require('audio-transcriber');

const transcriber = new AudioTranscriber();

transcriber.transcribe('podcast.mp3', { includeTimestamps: true })
  .then(result => {
    console.log('Transcript with timestamps:', result.text);
    // Output: [00:00:00,000] Hello, this is the beginning...
  });

Example 3: Custom Configuration

const AudioTranscriber = require('audio-transcriber');

const transcriber = new AudioTranscriber({
  whisperModel: 'large',
  whisperPath: '/Users/me/venvs/whisper/bin/whisper',
  tmpDir: '/tmp/audio-transcription',
  cleanupTempFiles: false
});

transcriber.transcribe('lecture.wav', {
  includeTimestamps: true,
  outputFormat: 'srt'
});

Output Formats

Plain Text (default)

Hello, this is a transcription of the audio file.
It contains the spoken words without any timestamps.

With Timestamps

[00:00:00,000] Hello, this is a transcription of the audio file.

[00:00:03,500] It contains the spoken words with timestamps.

[00:00:07,200] Each segment shows when it was spoken.

SRT Format

1
00:00:00,000 --> 00:00:03,500
Hello, this is a transcription of the audio file.

2
00:00:03,500 --> 00:00:07,200
It contains the spoken words with timestamps.
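
JSON Format

This README does not include a JSON sample; based on the object returned by transcribe(), the JSON output plausibly contains the transcript text together with the file_info metadata, roughly like this hedged sketch (exact fields may differ):

{
  "text": "Hello, this is a transcription of the audio file.",
  "file_info": {
    "filename": "audio.mp3",
    "size_kb": "512.3",
    "model": "medium",
    "method": "whisper",
    "output_format": "json"
  }
}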

Error Handling

The library handles various error scenarios gracefully:

  • Whisper not available: Falls back to mock transcription
  • Invalid audio file: Throws descriptive error
  • FFmpeg conversion failure: Attempts alternative approaches
  • File system errors: Provides helpful error messages
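
In practice this means wrapping calls in try/catch and, when it matters, checking whether the mock fallback was used; a minimal sketch (inside an async function) based on the documented result fields:

try {
  const result = await transcriber.transcribe('interview.m4a');

  if (result.file_info.method === 'mock') {
    // Whisper was not available, so this transcript is only a placeholder
    console.warn('Mock transcription used - install openai-whisper for real output');
  }
} catch (error) {
  // Invalid files, FFmpeg conversion failures, and file-system errors surface here
  console.error('Transcription failed:', error.message);
}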

Troubleshooting

Common Issues

  1. "FFmpeg not found"

    • Install FFmpeg: brew install ffmpeg (macOS) or sudo apt install ffmpeg (Ubuntu)
  2. "Whisper not available"

    • Install Whisper: pip install openai-whisper
    • Or specify custom path: --whisper-path /path/to/whisper
  3. "Permission denied"

    • Check file permissions
    • Ensure write access to output directory
  4. "Empty transcript"

    • Audio file might be too quiet or corrupted
    • Try a different Whisper model
    • Check audio file format compatibility

Debug Mode

Enable verbose logging by setting the environment variable:

DEBUG=audio-transcriber transcribe audio.mp3
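
Since DEBUG is an environment variable, the same setting should presumably also work when running a script that uses the Node.js API (script name below is only illustrative):

DEBUG=audio-transcriber node my-transcribe-script.js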

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit your changes: git commit -am 'Add feature'
  4. Push to the branch: git push origin feature-name
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

  • OpenAI for the Whisper model
  • FFmpeg for audio processing capabilities
  • The open-source community for inspiration and tools

About

🎵 Desktop audio transcriber: Electron + Whisper + Modern UI. Supports MP3/WAV/M4A, TXT/SRT/JSON output, drag-and-drop interface.
