Skip to content

Yikic/InterviewPilot

Repository files navigation

ChatPilot

QuickStart

A simple, keyboard-driven TUI that helps run a two-party conversation workflow using audio recordings on both sides, ASR to text, and an LLM to generate assistant replies.

  • Start input recording (microphone)
  • Stop and transcribe (role: user)
  • Start output recording (the other side)
  • Stop and transcribe (role: friend)
  • Send conversation to an LLM to generate assistant response
  • Loop

Features

  • Clear module architecture (audio, ASR, LLM, conversation, app)
  • Pluggable ASR/LLM providers (Gemini/OpenAI for LLM; OpenAI for ASR; room for local ASR)
  • Simple conversation persistence to JSON
  • Curses-based minimal TUI and keyboard controls

Requirements

  • Python 3.10+
  • Packages
    • openai SDK used for both OpenAI and Gemini (via OpenAI-compatible base_url). No extra SDK required.
    • sounddevice, soundfile (for audio I/O)
    • pyyaml
    • numpy

Installation

  1. Create a virtual environment and install dependencies.
python -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install pyyaml openai sounddevice soundfile numpy
  1. Set your API keys.
# For Gemini (LLM & ASR) via OpenAI-compatible endpoint
export GOOGLE_API_KEY=your-gemini-key
# Optional: override base url (defaults to google v1beta openai bridge)
# export GEMINI_OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"

# If switching ASR/LLM provider to OpenAI
# export OPENAI_API_KEY=sk-...
  1. Optional: generate a default config file chatpilot.yaml at project root.
# chatpilot.yaml
audio:
  sample_rate: 16000
  channels: 1
  dtype: float32
  input_device: null
  output_device: null
asr:
  provider: openai
  model: gpt-4o-mini-transcribe
  language: null
llm:
  provider: openai
  model: gpt-4o-mini
  temperature: 0.3
  system_prompt: "You are a helpful friend in a two-party conversation. Respond concisely."
history_path: ./conversation_history.json

Usage

Run the app:

python -m chatpilot.app

Keyboard shortcuts inside the TUI:

  • s: start input recording (microphone)
  • e: end input recording, transcribe as user, auto-start output recording
  • o: start output recording (manual)
  • p: end output recording, transcribe as friend, call LLM, show status
  • q: quit

Conversation is saved to conversation_history.json by default.

Architecture

  • chatpilot/config.py: Load YAML config into dataclasses
  • chatpilot/audio.py: Recorder for WAV capture (start/stop)
  • chatpilot/asr/*: ASR interfaces and OpenAI implementation
  • chatpilot/llm/*: LLM interfaces and Gemini/OpenAI implementations
  • chatpilot/conversation.py: Message storage and OpenAI payload conversion
  • chatpilot/app.py: Curses TUI orchestrating the full loop

Notes

  • Recording relies on your system audio devices. Use arecord -l (Linux) to verify devices. You can list devices programmatically via Recorder.list_input_devices().
  • If you need a local/offline ASR, you can add a new class implementing ASRBase and configure asr.provider accordingly. Same for LLM.
  • The TUI shows high-level status; for detailed logs, you can add print/logging statements as needed.

License

MIT

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages