Your free AI-powered meeting companion! 🚀 It captures conversations 🎤, transcribes them instantly in English 📜, separates each speaker 🧑‍🤝‍🧑, and saves your transcripts in TXT or SRT.

tech-magic/echo-scribe

🎙️ EchoScribe: Real-Time, Speaker-Aware Meeting Transcripts (No Cloud Needed, Runs on Your Local Device, and FREE!) 🕒🧑‍🤝‍🧑

Meet 🎙️ EchoScribe, your free AI-powered meeting companion! 🚀 It captures conversations 🎤, transcribes them in English as you talk 📜, separates each speaker 🧑‍🤝‍🧑, and saves your transcripts as TXT or SRT files. It is 100% free and open-source, and it runs entirely on your own device. 💻✨

By feeding your speaker-labeled meeting transcripts to ChatGPT, you unlock powerful insights and productivity boosts:

  • Summarize meetings for everyone: generate clear, concise summaries for non-specialists or management.
  • Automatically track action items: identify the tasks discussed and who is responsible for each, without manual effort.
  • Spot issues and solutions: quickly highlight the problems raised and the solutions proposed.
  • Get AI-driven guidance: receive actionable answers to unresolved questions or challenges discussed in the meeting.

🚀 Features

  • 🎧 Real-time audio capture from your microphone
  • 📝 Speech-to-text transcription using faster-whisper
  • 🧑‍🤝‍🧑 Speaker diarization with SpeechBrain
  • 💾 Automatic saving of transcripts in:
    • TXT (readable transcripts)
    • SRT (subtitle format with timestamps)
  • 📂 Session history with downloadable past transcripts
  • 🌐 Beautiful Gradio Web UI + REST API (via FastAPI)

📸 Demo


📦 Installation Guide

🖥️ Test Environment

This program was tested with Python 3.10.16 on an Apple M1 Max Mac running macOS Ventura 13.7.1.

python3 --version
# Python 3.10.16

pip3 --version
# pip 23.0.1 from /Users/{your_username}/.pyenv/versions/3.10.16/lib/python3.10/site-packages/pip (python 3.10)

uname -a
# Darwin {your_machine_name} 22.6.0 Darwin Kernel Version 22.6.0: Thu Sep  5 20:47:01 PDT 2024; root:xnu-8796.141.3.708.1~1/RELEASE_ARM64_T6000 arm64

sw_vers
# ProductName:            macOS
# ProductVersion:         13.7.1
# BuildVersion:           22H221

uname -m
# arm64

sysctl -n machdep.cpu.brand_string
# Apple M1 Max

⚙️ Installation Steps

# Clone this repository
git clone https://github.com/tech-magic/echo-scribe.git
cd echo-scribe

# Create a Python virtual environment
python3 -m venv echo-scribe-venv
source echo-scribe-venv/bin/activate

# Install all requirements into the virtual environment
pip3 install -r requirements.txt

# Run the app from inside the virtual environment
python3 app.py

Then open your browser at 👉 http://localhost:7860

The first run takes a while because the app needs to download the optimized Whisper (faster-whisper) and SpeechBrain models from their repositories to your computer.


💻 Web UI Preview

  • Start Recording ▶️
  • Stop Recording ⏹️
  • Transcript Panel 📜 – real-time streaming transcript
  • Past Sessions 📂 – download TXT / SRT files

🏷️ Application Design and Overview

Utilities

  • TimestampFormatter 🕒
    Formats a time offset in seconds as a text timestamp, either plain or in SRT style.

  • FileUtils 📁
    Handles file path operations.
    Example: Returns a file's relative path if the file exists in a session directory.
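To make the two utilities concrete, here is a minimal, standard-library sketch of what they might look like. It is not the project's code: the plain-text timestamp style (a dot before the milliseconds), returning `None` for a missing file, and returning a `<session_id>/<filename>` path (which would line up with the `/gradio/data/{session_id}/{filename}` download route) are all assumptions. Only the SRT style `HH:MM:SS,mmm` follows the SubRip convention.

```python
import os
from typing import Optional


class TimestampFormatter:
    """Hypothetical sketch of the timestamp utility."""

    @staticmethod
    def format(seconds: float, srt: bool = False) -> str:
        # Work in whole milliseconds to avoid floating-point rounding surprises.
        total_ms = int(round(seconds * 1000))
        hours, rem = divmod(total_ms, 3_600_000)
        minutes, rem = divmod(rem, 60_000)
        secs, millis = divmod(rem, 1000)
        # SRT requires a comma before the milliseconds; the dot for plain text is an assumption.
        sep = "," if srt else "."
        return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


class FileUtils:
    """Hypothetical sketch of the file-path utility."""

    @staticmethod
    def get_relative_if_exists(session_path: str, filename: str) -> Optional[str]:
        full_path = os.path.join(session_path, filename)
        if not os.path.isfile(full_path):
            return None  # assumed: missing files yield None
        # Assumed: the path is relative to the data/ directory, i.e. "<session_id>/<filename>".
        session_id = os.path.basename(os.path.normpath(session_path))
        return os.path.join(session_id, filename)


print(TimestampFormatter.format(3723.5, srt=True))  # 01:02:03,500
print(TimestampFormatter.format(3723.5))            # 01:02:03.500
```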

Audio Components

  • AudioProducer 🎤
    Captures audio from the microphone.
    Example: Records sound and sends it to a queue for processing.

  • AudioConsumer 🎧
    Reads audio from the queue and buffers it into fixed-length, overlapping chunks.
    Example: Hands each chunk, with its start time, on for transcription and speaker detection.

  • AudioQueue 🔄
    Acts as a buffer between producer and consumer.
    Example: Stores audio chunks temporarily for consumption.
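The producer/queue/consumer split can be sketched with Python's `queue.Queue` and a thread. This is a hypothetical simplification, not EchoScribe's code: audio blocks are plain lists of floats instead of the arrays a `sounddevice` callback delivers, `fake_producer` stands in for the microphone, and the chunking rule (emit `chunk_duration` seconds, then step forward by `chunk_duration - overlap_duration` so consecutive chunks overlap) is an assumption based on the attribute names in the class diagram.

```python
import queue
import threading
from typing import List, Optional, Tuple


def fake_producer(audio_queue: queue.Queue, n_blocks: int, block_size: int) -> None:
    """Stand-in for AudioProducer: pushes blocks of samples, like a sounddevice callback would."""
    for i in range(n_blocks):
        audio_queue.put([float(i)] * block_size)


class AudioConsumer:
    """Sketch: turns a stream of audio blocks into fixed-length, overlapping chunks."""

    def __init__(self, audio_queue: queue.Queue, sample_rate: int = 16000,
                 chunk_duration: float = 5.0, overlap_duration: float = 1.0):
        self.queue = audio_queue
        self.sample_rate = sample_rate
        self.chunk_size = int(chunk_duration * sample_rate)
        self.overlap_size = int(overlap_duration * sample_rate)
        self.audio_buffer: List[float] = []  # samples not yet fully consumed
        self.total_audio_time = 0.0          # time (s) of the first sample in audio_buffer

    def get_next_chunk(self, timeout: float = 1.0) -> Optional[Tuple[List[float], float]]:
        # Pull blocks until a full chunk is buffered; give up if the queue stays empty.
        while len(self.audio_buffer) < self.chunk_size:
            try:
                self.audio_buffer.extend(self.queue.get(timeout=timeout))
            except queue.Empty:
                return None
        chunk = self.audio_buffer[: self.chunk_size]
        chunk_start_time = self.total_audio_time
        # Advance by (chunk - overlap) so the next chunk re-reads the tail of this one.
        step = self.chunk_size - self.overlap_size
        self.audio_buffer = self.audio_buffer[step:]
        self.total_audio_time += step / self.sample_rate
        return chunk, chunk_start_time


def collect_chunk_start_times() -> List[float]:
    audio_queue: queue.Queue = queue.Queue()  # plays the role of AudioQueue
    producer = threading.Thread(target=fake_producer, args=(audio_queue, 10, 10))
    producer.start()
    producer.join()  # keeps the demo deterministic; the real app runs both sides concurrently
    consumer = AudioConsumer(audio_queue, sample_rate=10, chunk_duration=2.0, overlap_duration=0.5)
    starts = []
    while (result := consumer.get_next_chunk(timeout=0.1)) is not None:
        _chunk, start = result
        starts.append(start)
    return starts


print(collect_chunk_start_times())  # [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]
```

With a 2 s chunk and a 0.5 s overlap, chunks start 1.5 s apart. The overlap gives a word cut at a chunk boundary a second chance to be transcribed whole, at the cost that some text may appear in two consecutive chunks.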

AI Components

  • SpeechToText 📝
    Transcribes audio to text.
    Example: Uses faster-whisper to convert speech into readable text.

  • SpeakerDiarizer 🎭
    Differentiates between speakers in audio.
    Example: Assigns speaker IDs and tracks who is speaking when.
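A common way to label speakers on the fly is to compare each chunk's voice embedding with the embeddings of the speakers seen so far, and to open a new label when none is similar enough. The sketch below assumes that approach, which the `speakers`, `next_speaker_id`, and `similarity_threshold` attributes suggest, but it is not EchoScribe's code: `embed_fn` is a hypothetical stand-in for the SpeechBrain speaker-recognition model (the `spkrec` attribute), the 0.75 threshold is arbitrary, and embeddings are plain lists.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerDiarizer:
    """Sketch: online speaker labeling by nearest known embedding."""

    def __init__(self, embed_fn: Callable[[Sequence[float]], List[float]],
                 similarity_threshold: float = 0.75):
        self.embed_fn = embed_fn  # hypothetical stand-in for a SpeechBrain speaker-embedding model
        self.speakers: Dict[str, List[float]] = {}  # label -> reference embedding
        self.next_speaker_id = 1
        self.similarity_threshold = similarity_threshold

    def get_label(self, chunk: Sequence[float], start_time: float) -> str:
        # start_time mirrors the diagram's signature; this sketch does not use it.
        embedding = self.embed_fn(chunk)
        best_label, best_score = None, -1.0
        for label, reference in self.speakers.items():
            score = cosine_similarity(embedding, reference)
            if score > best_score:
                best_label, best_score = label, score
        if best_label is not None and best_score >= self.similarity_threshold:
            return best_label
        # Nobody is similar enough: register a new speaker.
        label = f"Speaker {self.next_speaker_id}"
        self.speakers[label] = embedding
        self.next_speaker_id += 1
        return label


# Toy "chunks" that already are 2-D embeddings, so embed_fn just copies them.
diarizer = SpeakerDiarizer(embed_fn=list)
for t, chunk in enumerate([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]):
    print(diarizer.get_label(chunk, float(t)))
# Speaker 1, Speaker 1, Speaker 2, Speaker 2
```

Because each speaker is represented by the first chunk in which they were heard, the threshold matters: set it too high and one person splits into several labels, too low and different people merge into one.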

Transcription Components

  • TranscriptWriter 📜
    Writes transcripts to text and SRT files.
    Example: Generates formatted transcript files with timestamps and speaker labels.

  • SessionManager 🗂️
    Manages session directories and stored files.
    Example: Keeps track of multiple recording sessions and file organization.
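A minimal sketch of a writer that produces both formats, assuming it is handed file-like objects (so `io.StringIO` works for testing). The SRT cue layout (counter, `start --> end` line, text, blank line) follows the SubRip format; the TXT line layout `[start - end] Speaker: text` is a guess, not taken from EchoScribe.

```python
import io


def srt_time(seconds: float) -> str:
    """HH:MM:SS,mmm as used by SubRip (.srt)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


class TranscriptWriter:
    """Sketch: appends each utterance to a TXT and an SRT file-like object."""

    def __init__(self, txt_file, srt_file):
        self.txt_file = txt_file
        self.srt_file = srt_file
        self.srt_index = 1  # SRT cues are numbered from 1

    def write(self, start: float, end: float, speaker: str, text: str) -> str:
        line = f"[{srt_time(start)} - {srt_time(end)}] {speaker}: {text}"  # assumed TXT layout
        self.txt_file.write(line + "\n")
        # One SRT cue: counter, time range, text, blank separator line.
        self.srt_file.write(
            f"{self.srt_index}\n{srt_time(start)} --> {srt_time(end)}\n{speaker}: {text}\n\n"
        )
        self.srt_index += 1
        # Flush so the files stay readable while recording is still in progress.
        self.txt_file.flush()
        self.srt_file.flush()
        return line

    def close(self) -> None:
        self.txt_file.close()
        self.srt_file.close()


def render_example():
    txt, srt = io.StringIO(), io.StringIO()
    writer = TranscriptWriter(txt, srt)
    writer.write(0.0, 2.5, "Speaker 1", "Hello everyone.")
    writer.write(2.5, 4.0, "Speaker 2", "Hi!")
    return txt.getvalue(), srt.getvalue()  # read before close(), which discards StringIO contents


print(render_example()[1])
```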

Coordinator

  • TranscriptionPipeline 🚀
    Orchestrates the whole transcription workflow.
    Example: Coordinates audio capture, processing, transcription, diarization, and writing.
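One plausible shape for the coordinator's loop, using the `executor`, `stop_event`, and `transcript_lines` attributes from the class diagram. This is a hedged sketch, not the project's implementation: it assumes `get_next_chunk()` returns `None` once capture has ended, that `transcribe()` returns a plain string, and that each line spans `chunk_duration` seconds. The stub classes inside `run_with_stubs` are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


class TranscriptionPipeline:
    """Sketch: pulls chunks, then transcribes, labels, and writes them in order."""

    def __init__(self, consumer, stt, diarizer, writer, chunk_duration: float):
        self.consumer = consumer
        self.stt = stt
        self.diarizer = diarizer
        self.writer = writer
        self.chunk_duration = chunk_duration
        # One worker processes chunks strictly in submission order while run() keeps reading audio.
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.stop_event = threading.Event()
        self.transcript_lines: List[str] = []

    def process_chunk(self, chunk, chunk_start_time: float) -> None:
        text = self.stt.transcribe(chunk)
        if not text.strip():
            return  # nothing said in this chunk
        speaker = self.diarizer.get_label(chunk, chunk_start_time)
        end_time = chunk_start_time + self.chunk_duration
        self.transcript_lines.append(self.writer.write(chunk_start_time, end_time, speaker, text))

    def run(self) -> None:
        while not self.stop_event.is_set():
            result = self.consumer.get_next_chunk()
            if result is None:
                break  # assumed: None means capture has ended
            chunk, chunk_start_time = result
            # Exceptions raised in process_chunk end up in the returned Future, not here.
            self.executor.submit(self.process_chunk, chunk, chunk_start_time)
        self.executor.shutdown(wait=True)  # let queued chunks finish
        self.writer.close()


def run_with_stubs() -> List[str]:
    class ListConsumer:  # yields prepared (chunk, start_time) pairs, then None
        def __init__(self, items):
            self.items = list(items)

        def get_next_chunk(self):
            return self.items.pop(0) if self.items else None

    class EchoSTT:  # pretends each chunk already is its transcript
        def transcribe(self, chunk):
            return chunk

    class OneSpeaker:
        def get_label(self, chunk, start_time):
            return "Speaker 1"

    class LineWriter:
        def write(self, start, end, speaker, text):
            return f"{start:.1f}-{end:.1f} {speaker}: {text}"

        def close(self):
            pass

    pipeline = TranscriptionPipeline(
        ListConsumer([("hello", 0.0), ("   ", 4.0), ("bye", 8.0)]),
        EchoSTT(), OneSpeaker(), LineWriter(), chunk_duration=5.0,
    )
    pipeline.run()
    return pipeline.transcript_lines


print(run_with_stubs())  # ['0.0-5.0 Speaker 1: hello', '8.0-13.0 Speaker 1: bye']
```

A single-worker `ThreadPoolExecutor` keeps the capture loop responsive while still writing lines in the order the chunks arrived; with more workers, a slow chunk could finish after a later one.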

Class Diagram

classDiagram
    %% =======================
    %% Utilities
    %% =======================
    class TimestampFormatter {
        +format(seconds: float, srt: bool) string
    }

    class FileUtils {
        +get_relative_if_exists(session_path: str, filename: str) string
    }

    %% =======================
    %% Audio Components
    %% =======================
    class AudioProducer {
        +queue
        +sample_rate
        +callback(indata, frames, time, status)
        +record_stream()
    }

    note for AudioProducer "Captures audio from local microphone<br/>(using python sounddevice library)"

    class AudioConsumer {
        +queue
        +sample_rate
        +chunk_duration
        +overlap_duration
        +audio_buffer
        +total_audio_time
        +get_next_chunk() tuple
    }

    class AudioQueue {
    }

    %% =======================
    %% AI Components
    %% =======================
    class SpeechToText {
        +model
        +transcribe(audio_np: np.ndarray)
    }

    note for SpeechToText "Transcribes audio to text using faster-whisper<br/>(an optimized reimplementation of OpenAI's Whisper model)"

    class SpeakerDiarizer {
        +spkrec
        +speakers
        +next_speaker_id
        +similarity_threshold
        +_get_embedding(chunk: np.ndarray)
        +get_label(chunk: np.ndarray, start_time: float) string
    }

    note for SpeakerDiarizer "Distinguishes between speakers<br/>(using SpeechBrain)"

    class TranscriptWriter {
        +txt_file
        +srt_file
        +srt_index
        +write(start, end, speaker, text) string
        +close()
    }

    class SessionManager {
        +base_dir
        +list_sessions() list
    }

    %% =======================
    %% Coordinator
    %% =======================
    class TranscriptionPipeline {
        +producer
        +consumer
        +stt
        +diarizer
        +writer
        +executor
        +stop_event
        +transcript_lines
        +process_chunk(chunk, chunk_start_time)
        +run()
    }

    %% =======================
    %% Relationships
    %% =======================
    TranscriptionPipeline --> AudioProducer : uses
    TranscriptionPipeline --> AudioConsumer : uses
    TranscriptionPipeline --> SpeechToText : uses
    TranscriptionPipeline --> SpeakerDiarizer : uses
    TranscriptionPipeline --> TranscriptWriter : writes

    AudioProducer --> AudioQueue : produces_audio_for
    AudioConsumer --> AudioQueue : consumes_audio_from
    TranscriptWriter --> SessionManager : stores_sessions_in
    TranscriptWriter --> TimestampFormatter : uses
    SessionManager --> FileUtils : uses

📂 Session Management

The transcripts from each recording session are saved in their own timestamped folder under the data/ directory:

data/
 ├── 01-09-2025-20-15-45/
 │   ├── 01-09-2025-20-15-45_transcript.txt
 │   └── 01-09-2025-20-15-45_subtitles.srt
 ├── 01-09-2025-21-00-12/
 │   ├── 01-09-2025-21-00-12_transcript.txt
 │   └── 01-09-2025-21-00-12_subtitles.srt
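Session folders and file names like these could be produced and listed as follows. The day-month-year order in `SESSION_FORMAT` is an assumption (the example `01-09-2025` could also be month-first), as are the hypothetical `session_paths` helper and the newest-first ordering of `list_sessions`.

```python
import os
from datetime import datetime
from typing import List, Tuple

# Assumed day-month-year order; "01-09-2025" could also be month-first.
SESSION_FORMAT = "%d-%m-%Y-%H-%M-%S"


def session_paths(base_dir: str, started_at: datetime) -> Tuple[str, str, str]:
    """Hypothetical helper: session ID plus its TXT and SRT file paths."""
    session_id = started_at.strftime(SESSION_FORMAT)
    session_dir = os.path.join(base_dir, session_id)
    return (
        session_id,
        os.path.join(session_dir, f"{session_id}_transcript.txt"),
        os.path.join(session_dir, f"{session_id}_subtitles.srt"),
    )


class SessionManager:
    def __init__(self, base_dir: str = "data"):
        self.base_dir = base_dir

    def list_sessions(self) -> List[str]:
        """Session IDs, newest first; folders whose names don't parse are skipped."""
        if not os.path.isdir(self.base_dir):
            return []
        sessions = []
        for name in os.listdir(self.base_dir):
            if not os.path.isdir(os.path.join(self.base_dir, name)):
                continue
            try:
                started = datetime.strptime(name, SESSION_FORMAT)
            except ValueError:
                continue
            sessions.append((started, name))
        # Sort on the parsed time: plain string order would put the day of the month first.
        return [name for _, name in sorted(sessions, reverse=True)]


print(session_paths("data", datetime(2025, 9, 1, 20, 15, 45))[0])  # 01-09-2025-20-15-45
```

Parsing the names before sorting matters with a day-first format: sorted as plain strings, `02-01-2025-...` would come after `01-09-2025-...` even though it is the earlier date.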

🔌 API Endpoints

EchoScribe also provides REST endpoints via FastAPI:

  • Download transcript/subtitles files:
    GET /gradio/data/{session_id}/{filename}
    

Example:

curl http://localhost:7860/gradio/data/01-09-2025-20-15-45/01-09-2025-20-15-45_transcript.txt -o 01-09-2025-20-15-45_transcript.txt
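The same download can be scripted with Python's standard library. `transcript_url` and `download` are hypothetical helper names; the only detail taken from EchoScribe is the `/gradio/data/{session_id}/{filename}` path, and the app has to be running on `localhost:7860` for `download` to succeed.

```python
import urllib.parse
import urllib.request
from typing import Optional


def transcript_url(session_id: str, filename: str, base_url: str = "http://localhost:7860") -> str:
    # Mirrors GET /gradio/data/{session_id}/{filename}; quote() escapes unsafe characters.
    path = "/gradio/data/{}/{}".format(urllib.parse.quote(session_id), urllib.parse.quote(filename))
    return base_url.rstrip("/") + path


def download(session_id: str, filename: str, dest: Optional[str] = None,
             base_url: str = "http://localhost:7860") -> str:
    """Fetch one transcript/subtitle file and save it locally; returns the local path."""
    target = dest or filename
    # urlopen raises urllib.error.HTTPError (e.g. 404) if the file does not exist.
    with urllib.request.urlopen(transcript_url(session_id, filename, base_url), timeout=10) as response:
        data = response.read()
    with open(target, "wb") as out:
        out.write(data)
    return target
```

For example, `download("01-09-2025-20-15-45", "01-09-2025-20-15-45_subtitles.srt")` would save the subtitles of that session to the current directory.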

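⚙️ Tech Stack

  • Python (tested with 3.10.16)
  • faster-whisper for speech-to-text
  • SpeechBrain for speaker diarization
  • sounddevice for microphone capture
  • Gradio for the web UI
  • FastAPI for the REST API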

✨ Further Improvements

  • Multi-language transcription 🌍
  • Pre-trained speaker labeling (e.g., "Alice", "Bob") 🏷️

📜 License

MIT License © 2025


Happy Transcribing! 🚀
