# 🎙️ EchoScribe: Real-Time, Speaker-Aware Meeting Transcripts – No Cloud Needed, On Your Local Device, and FREE! 🚀

Meet 🎙️ EchoScribe – your free, AI-powered meeting companion! It captures conversations 🎤, transcribes them to English text in real time 📝, separates each speaker 🗣️, and saves your transcripts as TXT or SRT. All 100% free, open-source, and running entirely on your own device. 💻✨
By feeding your speaker-labeled meeting transcripts to ChatGPT, you unlock powerful insights and productivity boosts:

- Summarize meetings for everyone – generate clear, concise summaries for laymen or management.
- Automatically track action items – identify tasks discussed and assign responsibilities without manual effort.
- Spot issues and solutions – quickly highlight problems raised and the solutions proposed.
- Get AI-driven guidance – receive actionable answers for unresolved questions or challenges discussed in the meeting.
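As a minimal sketch of how that hand-off could be automated (the helper name and prompt wording below are illustrative, not part of EchoScribe):

```python
def build_meeting_prompt(transcript: str) -> str:
    """Wrap a speaker-labeled transcript in a prompt for an LLM.

    Hypothetical helper: EchoScribe itself only produces the transcript;
    this is just one way to ask for the insights listed above.
    """
    return (
        "Below is a speaker-labeled meeting transcript.\n"
        "1. Summarize the meeting for a non-technical audience.\n"
        "2. List action items and who they were assigned to.\n"
        "3. Highlight problems raised and the solutions proposed.\n"
        "4. Suggest next steps for any unresolved questions.\n\n"
        f"{transcript}"
    )

prompt = build_meeting_prompt(
    "Speaker 1: Let's ship on Friday.\n"
    "Speaker 2: I'll prepare the release notes."
)
```

You can paste the resulting prompt into ChatGPT directly, or send it through any LLM API you already use.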
- 🎧 Real-time audio capture from your microphone
- 📝 Speech-to-text transcription using faster-whisper
- 🗣️ Speaker diarization with SpeechBrain
- 💾 Automatic saving of transcripts in:
  - TXT (readable transcripts)
  - SRT (subtitle format with timestamps)
- 📂 Session history with downloadable past transcripts
- 🌐 Beautiful Gradio Web UI + REST API (via FastAPI)
This program was tested with Python 3.10.16 on an Apple M1 Max Mac running macOS Ventura 13.7.1.
```shell
python3 --version
# Python 3.10.16

pip3 --version
# pip 23.0.1 from /Users/{your_username}/.pyenv/versions/3.10.16/lib/python3.10/site-packages/pip (python 3.10)

uname -a
# Darwin {your_machine_name} 22.6.0 Darwin Kernel Version 22.6.0: Thu Sep 5 20:47:01 PDT 2024; root:xnu-8796.141.3.708.1~1/RELEASE_ARM64_T6000 arm64

sw_vers
# ProductName:    macOS
# ProductVersion: 13.7.1
# BuildVersion:   22H221

uname -m
# arm64

sysctl -n machdep.cpu.brand_string
# Apple M1 Max
```

```shell
# Clone this repository
git clone https://github.com/tech-magic/echo-scribe.git
cd echo-scribe

# Create your own Python virtual environment
python3 -m venv echo-scribe-venv
source echo-scribe-venv/bin/activate

# Install all requirements into the virtual environment
pip3 install -r requirements.txt

# Run the app from the virtual environment
python3 app.py
```

Then open your browser at 🌐 http://localhost:7860
The first run takes a while because EchoScribe needs to download the optimized Whisper (faster-whisper) and SpeechBrain models from their repositories to your computer.
- Start Recording ▶️
- Stop Recording ⏹️
- Transcript Panel 📜 – real-time streaming transcript
- Past Sessions 📂 – download TXT / SRT files
- **TimestampFormatter** 🕒 – formats timestamps as text.
  Example: renders second offsets as readable (or SRT-style) timestamps.
- **FileUtils** 📁 – handles file path operations.
  Example: gets relative file paths if they exist in session directories.
- **AudioProducer** 🎤 – captures audio from the microphone.
  Example: records sound and sends it to a queue for processing.
- **AudioConsumer** 🎧 – reads audio chunks from the queue.
  Example: processes audio chunks for transcription and speaker detection.
- **AudioQueue** 📦 – acts as a buffer between producer and consumer.
  Example: stores audio chunks temporarily for consumption.
- **SpeechToText** 📝 – transcribes audio to text.
  Example: uses faster-whisper to convert speech into readable text.
- **SpeakerDiarizer** 🗣️ – differentiates between speakers in audio.
  Example: assigns speaker IDs and tracks who is speaking when.
- **TranscriptWriter** 💾 – writes transcripts to TXT and SRT files.
  Example: generates formatted transcript files with timestamps and speaker labels.
- **SessionManager** 🗂️ – manages session directories and stored files.
  Example: keeps track of multiple recording sessions and file organization.
- **TranscriptionPipeline** 🔁 – orchestrates the whole transcription workflow.
  Example: coordinates audio capture, processing, transcription, diarization, and writing.
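The speaker-matching idea behind the diarizer can be sketched as cosine-similarity lookup against previously seen speaker embeddings. This is a minimal illustration; the real SpeakerDiarizer uses SpeechBrain embeddings and its own similarity threshold, and the function name here is hypothetical:

```python
import numpy as np

def match_speaker(embedding: np.ndarray,
                  known: dict[str, np.ndarray],
                  threshold: float = 0.75) -> str:
    """Return an existing speaker label if the embedding is similar
    enough to a known one; otherwise register a new speaker."""
    best_id, best_sim = None, -1.0
    for spk_id, ref in known.items():
        # Cosine similarity between the new embedding and a stored one
        sim = float(np.dot(embedding, ref) /
                    (np.linalg.norm(embedding) * np.linalg.norm(ref)))
        if sim > best_sim:
            best_id, best_sim = spk_id, sim
    if best_id is not None and best_sim >= threshold:
        return best_id
    # No close match: create and remember a new speaker label
    new_id = f"Speaker {len(known) + 1}"
    known[new_id] = embedding
    return new_id
```

With this scheme, consecutive chunks from the same voice map to the same label, and a sufficiently different voice triggers a new `Speaker N` label.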
```mermaid
classDiagram
    %% =======================
    %% Utilities
    %% =======================
    class TimestampFormatter {
        +format(seconds: float, srt: bool) string
    }
    class FileUtils {
        +get_relative_if_exists(session_path: str, filename: str) string
    }

    %% =======================
    %% Audio Components
    %% =======================
    class AudioProducer {
        +queue
        +sample_rate
        +callback(indata, frames, time, status)
        +record_stream()
    }
    note for AudioProducer "Captures audio from the local microphone<br/>(using the Python sounddevice library)"
    class AudioConsumer {
        +queue
        +sample_rate
        +chunk_duration
        +overlap_duration
        +audio_buffer
        +total_audio_time
        +get_next_chunk() tuple
    }
    class AudioQueue {
    }

    %% =======================
    %% AI Components
    %% =======================
    class SpeechToText {
        +model
        +transcribe(audio_np: np.ndarray)
    }
    note for SpeechToText "Transcribes audio to text using faster-whisper<br/>(an optimized implementation of OpenAI's Whisper model)"
    class SpeakerDiarizer {
        +spkrec
        +speakers
        +next_speaker_id
        +similarity_threshold
        +_get_embedding(chunk: np.ndarray)
        +get_label(chunk: np.ndarray, start_time: float) string
    }
    note for SpeakerDiarizer "Differentiates between speakers<br/>(using SpeechBrain)"
    class TranscriptWriter {
        +txt_file
        +srt_file
        +srt_index
        +write(start, end, speaker, text) string
        +close()
    }
    class SessionManager {
        +base_dir
        +list_sessions() list
    }

    %% =======================
    %% Coordinator
    %% =======================
    class TranscriptionPipeline {
        +producer
        +consumer
        +stt
        +diarizer
        +writer
        +executor
        +stop_event
        +transcript_lines
        +process_chunk(chunk, chunk_start_time)
        +run()
    }

    %% =======================
    %% Relationships
    %% =======================
    TranscriptionPipeline --> AudioProducer : uses
    TranscriptionPipeline --> AudioConsumer : uses
    TranscriptionPipeline --> SpeechToText : uses
    TranscriptionPipeline --> SpeakerDiarizer : uses
    TranscriptionPipeline --> TranscriptWriter : writes
    AudioProducer --> AudioQueue : produces_audio_for
    AudioConsumer --> AudioQueue : consumes_audio_from
    TranscriptWriter --> SessionManager : stores_sessions_in
    TranscriptWriter --> TimestampFormatter : uses
    SessionManager --> FileUtils : uses
```
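For instance, `TimestampFormatter.format(seconds, srt)` and the SRT entries written by TranscriptWriter could look roughly like this. This is a sketch consistent with the interfaces in the diagram, not the exact implementation:

```python
def format_timestamp(seconds: float, srt: bool = False) -> str:
    # Split a second offset into h:mm:ss plus milliseconds;
    # SRT uses a comma before the milliseconds, plain text a dot.
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    sep = "," if srt else "."
    return f"{h:02}:{m:02}:{s:02}{sep}{ms:03}"

def srt_cue(index: int, start: float, end: float,
            speaker: str, text: str) -> str:
    # One SRT entry: numeric index, "start --> end" time range,
    # then the speaker-labeled line.
    return (f"{index}\n"
            f"{format_timestamp(start, srt=True)} --> "
            f"{format_timestamp(end, srt=True)}\n"
            f"{speaker}: {text}\n")

print(srt_cue(1, 0.0, 2.5, "Speaker 1", "Hello everyone."))
```

Appending consecutive cues (separated by blank lines, with an incrementing index) yields a valid `.srt` file.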
All recordings (one directory per captured session) are saved under the `data/` directory:

```
data/
├── 01-09-2025-20-15-45/
│   ├── 01-09-2025-20-15-45_transcript.txt
│   └── 01-09-2025-20-15-45_subtitles.srt
└── 01-09-2025-21-00-12/
    ├── 01-09-2025-21-00-12_transcript.txt
    └── 01-09-2025-21-00-12_subtitles.srt
```
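Over this layout, `SessionManager.list_sessions()` can be sketched as a scan for timestamped subdirectories (a minimal illustration, assuming the `data/` structure shown above):

```python
from pathlib import Path

def list_sessions(base_dir: str = "data") -> list[str]:
    # Each session is a timestamped subdirectory of base_dir;
    # return the directory names sorted lexicographically.
    base = Path(base_dir)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())
```

The UI's "Past Sessions" panel only needs these names to build download links for each session's TXT and SRT files.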
EchoScribe also provides REST endpoints via FastAPI:

- Download transcript/subtitle files:

```
GET /gradio/data/{session_id}/{filename}
```

Example:

```shell
curl http://localhost:7860/gradio/data/01-09-2025-20-15-45/01-09-2025-20-15-45_transcript.srt \
  -o 01-09-2025-20-15-45_transcript.srt
```

- Gradio – Web UI
- FastAPI – REST API
- faster-whisper – ASR engine
- SpeechBrain – Speaker recognition
- PyTorch – Deep learning backend
- scikit-learn – Similarity metrics
- Multi-language transcription 🌍
- Pre-trained speaker labeling (e.g., "Alice", "Bob") 🏷️

MIT License © 2025

Happy Transcribing! 🎉
