
zarvent/v2m_lab


🗣️ voice2machine

voice dictation for any text field in your OS


what is this

A tool that converts your voice to text using your local GPU.

The premise is simple: speaking is faster than typing. This project allows you to dictate in any application without depending on cloud services.


philosophy

  • local-first: your audio never leaves your machine
  • modular: started as a script, now it's an app with separated responsibilities
  • gpu-powered: fast transcription with Whisper running locally on your GPU

how it works

The system runs as a background daemon that exposes a FastAPI REST API on localhost:8765.

| component | role |
| --- | --- |
| daemon | Handles audio recording, Whisper transcription, and LLM processing via REST endpoints. |
| shortcuts | Global keyboard shortcuts that send HTTP requests to the daemon. |
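
As a rough illustration of what a shortcut does, here is a minimal Python sketch of a client that sends a POST request to the daemon. Only the host and port (localhost:8765) come from this README. The endpoint name `toggle`, the JSON body, and the helper names `build_request` and `call_daemon` are hypothetical placeholders; check /docs for the real routes.

```python
import json
import urllib.request
from typing import Optional

# Host and port as stated in this README.
DAEMON_URL = "http://localhost:8765"


def build_request(endpoint: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON POST request for the daemon.

    The endpoint names passed in are assumptions, not the project's documented routes.
    """
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{DAEMON_URL}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_daemon(endpoint: str, payload: Optional[dict] = None, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response (assumes the daemon replies with JSON)."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "toggle" is a hypothetical endpoint name used only for illustration.
    print(call_daemon("toggle"))
```

A global shortcut could be bound to a script like this, so the key press itself does nothing more than fire one local HTTP request.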

documentation

All technical documentation lives in /docs (consolidated in Spanish).


visual flows

voice → text

flowchart LR
A[🎤 record] --> B{whisper}
B --> C[📋 clipboard]

text → improved text

flowchart LR
A[📋 copy] --> B{LLM}
B --> C[📋 replace]

if the diagrams don't render, your viewer needs mermaid support (for example, a mermaid extension)
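
The two flows above can also be read as small pipelines. The Python sketch below is not the project's code: each stage (`record`, `transcribe`, `read_clipboard`, `refine`, `write_clipboard`) is a hypothetical callable passed in by the caller, standing in for the microphone, Whisper, the LLM, and the clipboard.

```python
from typing import Callable


def voice_to_text(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    write_clipboard: Callable[[str], None],
) -> str:
    """record -> whisper -> clipboard, with every stage injected by the caller."""
    audio = record()
    text = transcribe(audio).strip()
    write_clipboard(text)
    return text


def improve_text(
    read_clipboard: Callable[[], str],
    refine: Callable[[str], str],
    write_clipboard: Callable[[str], None],
) -> str:
    """copy -> LLM -> replace, with every stage injected by the caller."""
    original = read_clipboard()
    improved = refine(original)
    write_clipboard(improved)
    return improved


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without a microphone, GPU, or LLM.
    clipboard = {"text": ""}
    voice_to_text(
        record=lambda: b"fake-audio",
        transcribe=lambda audio: " hello world ",
        write_clipboard=lambda t: clipboard.update(text=t),
    )
    improve_text(
        read_clipboard=lambda: clipboard["text"],
        refine=lambda t: t.capitalize() + ".",
        write_clipboard=lambda t: clipboard.update(text=t),
    )
    print(clipboard["text"])
```

Passing the stages in as functions keeps the flow itself independent of any particular audio, model, or clipboard library.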


license

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for more details.
