A simple, keyboard-driven TUI for running a two-party conversation workflow: audio is recorded on both sides, transcribed to text with ASR, and an LLM generates assistant replies.
- Start input recording (microphone)
- Stop and transcribe (role: `user`)
- Start output recording (the other side)
- Stop and transcribe (role: `friend`)
- Send the conversation to an LLM to generate an assistant response
- Loop
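For orientation, one turn of that loop can be sketched in plain Python. The function names below are stand-ins, not the actual chatpilot API; the real app drives the same sequence from the curses TUI via the keyboard shortcuts listed further down.

```python
# Hypothetical sketch of one conversation turn; function bodies are stubs.
def record_clip(source: str) -> str:
    """Stub: record audio from `source` and return a WAV path."""
    return f"{source}.wav"

def transcribe(wav_path: str) -> str:
    """Stub: run ASR on the clip."""
    return f"(transcript of {wav_path})"

def generate_reply(messages: list[dict]) -> str:
    """Stub: call the LLM with the conversation so far."""
    return "(assistant reply)"

messages: list[dict] = []
for _ in range(1):  # the real app loops until 'q' is pressed
    messages.append({"role": "user", "content": transcribe(record_clip("microphone"))})
    messages.append({"role": "friend", "content": transcribe(record_clip("other_side"))})
    messages.append({"role": "assistant", "content": generate_reply(messages)})
print(messages)
```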
- Clear module architecture (audio, ASR, LLM, conversation, app)
- Pluggable ASR/LLM providers (Gemini/OpenAI for LLM; OpenAI for ASR; room for local ASR)
- Simple conversation persistence to JSON
- Curses-based minimal TUI and keyboard controls
- Python 3.10+
- Packages
- `openai` SDK, used for both OpenAI and Gemini (via an OpenAI-compatible `base_url`); no extra SDK required (see the client sketch after this list)
- sounddevice, soundfile (for audio I/O)
- pyyaml
- numpy
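Because a single `openai` client works for both providers, switching between them is mostly a matter of `api_key` and `base_url`. A minimal sketch, assuming the environment variables described in Setup below (the model name is illustrative):

```python
import os
from openai import OpenAI

# Gemini via Google's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["GOOGLE_API_KEY"],
    base_url=os.environ.get(
        "GEMINI_OPENAI_BASE_URL",
        "https://generativelanguage.googleapis.com/v1beta/openai/",
    ),
)
# Plain OpenAI instead: client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

reply = client.chat.completions.create(
    model="gemini-2.0-flash",  # illustrative model name; use whatever your config specifies
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply.choices[0].message.content)
```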
- Create a virtual environment and install dependencies.

```bash
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install pyyaml openai sounddevice soundfile numpy
```

- Set your API keys.

```bash
# For Gemini (LLM & ASR) via the OpenAI-compatible endpoint
export GOOGLE_API_KEY=your-gemini-key
# Optional: override the base URL (defaults to Google's v1beta OpenAI bridge)
# export GEMINI_OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"

# If switching the ASR/LLM provider to OpenAI
# export OPENAI_API_KEY=sk-...
```

- Optional: generate a default config file `chatpilot.yaml` at the project root.
```yaml
# chatpilot.yaml
audio:
  sample_rate: 16000
  channels: 1
  dtype: float32
  input_device: null
  output_device: null
asr:
  provider: openai
  model: gpt-4o-mini-transcribe
  language: null
llm:
  provider: openai
  model: gpt-4o-mini
  temperature: 0.3
  system_prompt: "You are a helpful friend in a two-party conversation. Respond concisely."
history_path: ./conversation_history.json
```
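As a sketch of how such a file could map onto dataclasses: the class and field names below are illustrative (the real loader lives in `chatpilot/config.py` and may differ), and only the `asr` section and `history_path` are shown.

```python
from dataclasses import dataclass, field
from pathlib import Path

import yaml


@dataclass
class ASRConfig:
    provider: str = "openai"
    model: str = "gpt-4o-mini-transcribe"
    language: str | None = None


@dataclass
class AppConfig:
    asr: ASRConfig = field(default_factory=ASRConfig)
    history_path: str = "./conversation_history.json"


def load_config(path: str = "chatpilot.yaml") -> AppConfig:
    # A missing file or missing sections fall back to the defaults above.
    data = yaml.safe_load(Path(path).read_text()) if Path(path).exists() else {}
    data = data or {}
    return AppConfig(
        asr=ASRConfig(**data.get("asr", {})),
        history_path=data.get("history_path", "./conversation_history.json"),
    )
```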
Run the app:

```bash
python -m chatpilot.app
```

Keyboard shortcuts inside the TUI:
- `s`: start input recording (microphone)
- `e`: end input recording, transcribe as `user`, auto-start output recording
- `o`: start output recording (manual)
- `p`: end output recording, transcribe as `friend`, call the LLM, show status
- `q`: quit
The conversation is saved to `conversation_history.json` by default.
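If you want to inspect the saved history outside the app, something like the following works, assuming the file holds a JSON list of role/content records (verify the actual schema in `chatpilot/conversation.py`):

```python
import json
from pathlib import Path

# Assumption: a JSON list of {"role": ..., "content": ...} records;
# check chatpilot/conversation.py before relying on this shape.
for msg in json.loads(Path("conversation_history.json").read_text()):
    print(f"{msg.get('role', '?')}: {msg.get('content', '')}")
```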
- `chatpilot/config.py`: Load YAML config into dataclasses
- `chatpilot/audio.py`: Recorder for WAV capture (start/stop); see the sketch below this list
- `chatpilot/asr/*`: ASR interfaces and OpenAI implementation
- `chatpilot/llm/*`: LLM interfaces and Gemini/OpenAI implementations
- `chatpilot/conversation.py`: Message storage and OpenAI payload conversion
- `chatpilot/app.py`: Curses TUI orchestrating the full loop
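To illustrate the start/stop recording idea from `chatpilot/audio.py`, here is a minimal sketch (not the project's actual `Recorder`) that accumulates blocks from a `sounddevice` input stream and writes them out with `soundfile`:

```python
import numpy as np
import sounddevice as sd
import soundfile as sf


class SketchRecorder:
    """Illustrative start/stop WAV recorder; the real Recorder may differ."""

    def __init__(self, sample_rate: int = 16000, channels: int = 1):
        self.sample_rate = sample_rate
        self.channels = channels
        self._blocks = []
        self._stream = None

    def start(self):
        # Each callback invocation appends one block of float32 samples.
        self._blocks = []
        self._stream = sd.InputStream(
            samplerate=self.sample_rate,
            channels=self.channels,
            dtype="float32",
            callback=lambda indata, frames, time, status: self._blocks.append(indata.copy()),
        )
        self._stream.start()

    def stop(self, path: str = "capture.wav") -> str:
        self._stream.stop()
        self._stream.close()
        audio = np.concatenate(self._blocks) if self._blocks else np.zeros((0, self.channels))
        sf.write(path, audio, self.sample_rate)
        return path
```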
- Recording relies on your system audio devices. Use `arecord -l` (Linux) to verify devices. You can list devices programmatically via `Recorder.list_input_devices()`.
- If you need a local/offline ASR, add a new class implementing `ASRBase` and configure `asr.provider` accordingly. The same applies to the LLM (see the sketch below this list).
- The TUI shows high-level status; for detailed logs, add print/logging statements as needed.
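As an example of such a plug-in, a local Whisper backend might look roughly like this. The `ASRBase` interface shown is an assumption (a single `transcribe(path) -> str` method); check the real base class under `chatpilot/asr/` and import it instead of the stand-in.

```python
# Hypothetical offline ASR backend using the openai-whisper package
# (pip install openai-whisper).
from abc import ABC, abstractmethod

import whisper


class ASRBase(ABC):  # stand-in for the project's base class; import the real one instead
    @abstractmethod
    def transcribe(self, audio_path: str) -> str: ...


class WhisperLocalASR(ASRBase):
    def __init__(self, model_name: str = "base", language: str | None = None):
        self.model = whisper.load_model(model_name)
        self.language = language

    def transcribe(self, audio_path: str) -> str:
        result = self.model.transcribe(audio_path, language=self.language)
        return result["text"].strip()
```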
MIT