🗣️ voice2machine

voice dictation for any text field in your OS

what is this

A tool that converts your voice to text using your local GPU.

The premise is simple: speaking is faster than typing. This project allows you to dictate in any application without depending on cloud services.

philosophy

local-first: your audio never leaves your machine
modular: started as a script, now it's an app with separated responsibilities
gpu-powered: fast transcription by running Whisper locally on your GPU

how it works

The system runs as a background daemon that exposes a FastAPI REST API on localhost:8765.

component	role
`daemon`	Handles audio recording, Whisper transcription, and LLM processing via REST endpoints.
`shortcuts`	Global keyboard shortcuts that send HTTP requests to the daemon.
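As a rough illustration of how a shortcut could talk to the daemon, here is a minimal sketch that uses only the Python standard library. The host and port come from the description above; the `/toggle` path and the JSON payload are hypothetical placeholders, not the project's documented endpoints, so check /docs for the real routes.

```python
import json
import urllib.request
from typing import Optional

# Host and port as stated in this README.
DAEMON_URL = "http://localhost:8765"


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON POST request for the daemon without sending it."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{DAEMON_URL}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_daemon(path: str, payload: Optional[dict] = None, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply (assumes the daemon answers with JSON)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical usage from a global-shortcut handler (requires a running daemon):
# call_daemon("/toggle")
```

Splitting request construction from sending keeps the shortcut side trivial to test without a running daemon.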

documentation

All technical documentation lives in /docs (consolidated in Spanish).

visual flows

voice → text

flowchart LR
A[🎤 record] --> B{whisper}
B --> C[📋 clipboard]

text → improved text

flowchart LR
A[📋 copy] --> B{LLM}
B --> C[📋 replace]

if you don't see the diagrams, you need a mermaid extension
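The two flows above can also be read as small pipelines. The sketch below is an illustration only, not the project's code: every function name is hypothetical, and the recorder, transcriber, LLM and clipboard are passed in as plain callables so the example stays independent of any real audio or model library.

```python
from typing import Callable


def voice_to_text(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    write_clipboard: Callable[[str], None],
) -> str:
    """voice -> text: record audio, transcribe it, put the text on the clipboard."""
    audio = record()
    text = transcribe(audio)
    write_clipboard(text)
    return text


def improve_text(
    read_clipboard: Callable[[], str],
    rewrite: Callable[[str], str],
    write_clipboard: Callable[[str], None],
) -> str:
    """text -> improved text: read the copied text, rewrite it, replace the clipboard."""
    original = read_clipboard()
    improved = rewrite(original)
    write_clipboard(improved)
    return improved
```

In a real setup, `transcribe` would wrap the local Whisper model and `rewrite` would wrap the LLM call; here they are stand-ins.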

license

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for more details.
