lukeocodes/voice-manifest

Voice Manifest

Make your website voice-enabled, just like manifest.json makes it installable

License: CC BY-NC 2.0

Overview

The Voice Manifest (voice-manifest.json) enables voice capabilities for websites in the same way that the Web App Manifest (manifest.json) enables Progressive Web App features.

<link rel="voice-manifest" href="/voice-manifest.json" />

It's a declarative specification: you tell voice clients what your site can do, not how to configure voice providers.
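
To illustrate how a voice client might discover the manifest, here is a minimal Python sketch that scans a page's HTML for the link tag and resolves its href against the page URL. The class and function names are hypothetical, and the spec does not say how clients must parse pages, so treat this as one possible approach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class VoiceManifestLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="voice-manifest"> tag."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "voice-manifest" in rel_tokens and attr_map.get("href"):
            self.href = attr_map["href"]


def find_voice_manifest_url(page_url, html):
    """Return the absolute manifest URL declared by the page, or None."""
    finder = VoiceManifestLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None
```

Calling `find_voice_manifest_url("https://example.com/menu/", html)` on a page containing the tag above would resolve `/voice-manifest.json` against the site origin.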

Quick Start

Absolute Minimum

{
  "name": "My Website"
}

That's literally it! Just one field makes your site discoverable as voice-enabled.
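
A client or validator could check this minimum with a few lines of Python. The sketch below assumes `name` must be a non-empty string, which the example implies but this README does not state outright. `load_manifest` is a hypothetical helper.

```python
import json


def load_manifest(text):
    """Parse manifest JSON and check the single field the minimal example uses."""
    manifest = json.loads(text)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    name = manifest.get("name")
    # Assumption: an empty or non-string name is treated as invalid.
    if not isinstance(name, str) or not name.strip():
        raise ValueError('manifest needs a non-empty string "name"')
    return manifest
```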

Minimal Functional Example

{
  "name": "Pasta Paradise",
  "display": {
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday at 7 PM"
    ]
  },
  "system_prompt": "You are a helpful restaurant assistant.",
  "functions": [
    {
      "name": "make_reservation",
      "description": "Create a dining reservation",
      "parameters": {
        "type": "object",
        "properties": {
          "date": { "type": "string", "format": "date" },
          "time": { "type": "string", "format": "time" },
          "party_size": { "type": "integer" },
          "name": { "type": "string" },
          "phone": { "type": "string" }
        },
        "required": ["date", "time", "party_size", "name", "phone"]
      }
    }
  ]
}

That's it! Any compatible voice client can now interact with your site.
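
Before a client forwards a call such as `make_reservation`, it may want to check the arguments against the declared `parameters`. The following is a rough sketch, not a full JSON Schema validator: it only looks at `required` and the basic `type` of each property, and it ignores `format`. The helper name and the error strings are made up for illustration.

```python
# Maps the JSON Schema type names used in the example to Python types.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "object": dict}


def check_arguments(function_def, arguments):
    """Return a list of problems; an empty list means the call looks acceptable."""
    schema = function_def.get("parameters", {})
    properties = schema.get("properties", {})
    problems = []
    for field in schema.get("required", []):
        if field not in arguments:
            problems.append(f"missing required field: {field}")
    for field, value in arguments.items():
        expected = JSON_TYPES.get(properties.get(field, {}).get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"wrong type for field: {field}")
    return problems
```

A production client would more likely use a complete JSON Schema library, since real manifests may use keywords this sketch does not handle.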

Key Concepts

1. Declaration, Not Configuration

Voice Manifest is like manifest.json: it declares your site's capabilities, not how to implement them.

  • ❌ NOT a configuration file for your voice pipeline
  • ✅ A public declaration of what your site can do with voice
  • ✅ Voice clients provide fallback providers if you don't specify any

2. Progressive Enhancement

| Level | Features |
| --- | --- |
| Basic | Name + display hints |
| + Functions | Define voice actions |
| + System Prompt | Customize behavior |
| + MCP | Connect backend services |
| + Agent Config | Specify preferred providers (optional) |
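
As a sketch of how tooling might report which of these levels a manifest uses, the function below checks for the top-level keys that appear in this README's examples (`functions`, `system_prompt`, `mcp`, `agent`). The function itself and the returned labels are illustrative only.

```python
def manifest_levels(manifest):
    """List the enhancement levels a manifest opts into, in table order."""
    levels = ["Basic"]  # every manifest with a name reaches the basic level
    optional_levels = [
        ("functions", "Functions"),
        ("system_prompt", "System Prompt"),
        ("mcp", "MCP"),
        ("agent", "Agent Config"),
    ]
    for key, label in optional_levels:
        # Empty values (for example an empty functions list) do not count.
        if manifest.get(key):
            levels.append(label)
    return levels
```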

3. Provider Flexibility

No providers? Voice clients use their own (browser extensions, OS features)

Specific voice agent? All-in-one solution (Retell, Vapi, etc.)

Composite? Mix and match STT/LLM/TTS providers

Core Features

Display Configuration

Control how voice UI appears:

{
  "display": {
    "icon": "/icons/voice-icon.png",
    "background_color": "#8B0000",
    "theme_color": "#8B0000",
    "activation_phrase": "Talk to Pasta Paradise",
    "call_to_action": "Ask about our menu",
    "suggested_prompts": ["What's on the menu?", "Make a reservation"]
  }
}
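
A client could turn the `display` block into the data for its activation UI roughly as follows. This sketch makes several assumptions that the README does not confirm: the icon path is resolved relative to the manifest URL (as in the Web App Manifest), colors are six-digit hex values, and missing fields fall back to client-chosen defaults. `activation_ui` and the default label are hypothetical.

```python
import re
from urllib.parse import urljoin

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def activation_ui(manifest, manifest_url, default_color="#000000"):
    """Build the values a client might need to render its voice button."""
    display = manifest.get("display", {})

    def color(key):
        value = display.get(key, "")
        # Fall back to the client's default when the color is missing or malformed.
        if isinstance(value, str) and HEX_COLOR.fullmatch(value):
            return value
        return default_color

    icon = display.get("icon")
    return {
        "label": display.get("call_to_action") or f"Talk to {manifest['name']}",
        "icon_url": urljoin(manifest_url, icon) if icon else None,
        "theme_color": color("theme_color"),
        "background_color": color("background_color"),
        "suggested_prompts": list(display.get("suggested_prompts", [])),
    }
```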

Function Calling

Functions follow OpenAI's function calling format: a name, a description, and JSON Schema parameters:

{
  "functions": [
    {
      "name": "search_products",
      "description": "Search for products",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "max_price": { "type": "number" }
        },
        "required": ["query"]
      }
    }
  ]
}
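
Because the entries already carry a name, a description, and JSON Schema parameters, a client can pass them to an LLM with very little reshaping. The sketch below wraps each entry in the `{"type": "function", "function": {...}}` envelope used for the `tools` list in OpenAI's Chat Completions API. Whether a given client uses that API is an assumption, and `to_openai_tools` is a hypothetical helper.

```python
def to_openai_tools(manifest):
    """Wrap manifest functions in the tools envelope for a chat completion request."""
    tools = []
    for fn in manifest.get("functions", []):
        tools.append(
            {
                "type": "function",
                "function": {
                    "name": fn["name"],
                    "description": fn.get("description", ""),
                    # Assumption: a function without parameters takes an empty object.
                    "parameters": fn.get("parameters", {"type": "object", "properties": {}}),
                },
            }
        )
    return tools
```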

System Prompt

Define assistant behavior:

{
  "system_prompt": "You are a helpful shopping assistant. Be enthusiastic but never pushy."
}

Or reference an external file:

{
  "system_prompt": {
    "$ref": "./prompts/system-prompt.txt"
  }
}
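
A client has to accept both forms. The sketch below returns an inline prompt unchanged and, for the `$ref` form, resolves the path relative to the manifest URL before fetching it. Resolving against the manifest URL is an assumption, and the `fetch_text` callable is a hypothetical hook that keeps the example independent of any particular HTTP library.

```python
from urllib.parse import urljoin


def resolve_system_prompt(manifest, manifest_url, fetch_text):
    """Return the prompt text, fetching it when the manifest uses a $ref."""
    prompt = manifest.get("system_prompt")
    if prompt is None or isinstance(prompt, str):
        return prompt  # absent or inline prompt: nothing to resolve
    if isinstance(prompt, dict) and isinstance(prompt.get("$ref"), str):
        # Assumption: relative references resolve against the manifest's own URL.
        return fetch_text(urljoin(manifest_url, prompt["$ref"]))
    raise ValueError("system_prompt must be a string or an object with $ref")
```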

MCP Integration (Optional)

Connect to backend services using the MCP standard:

{
  "mcp": {
    "servers": {
      "myserver": {
        "url": "https://api.mywebsite.com/mcp",
        "headers": {
          "Authorization": "Bearer ${API_KEY}"
        }
      }
    }
  }
}

Voice clients connect to these URLs to discover tools dynamically via the MCP protocol.
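
The README does not define how the `${API_KEY}` placeholder is filled in. One possible reading is that the client substitutes values it holds itself, since the manifest is public and should not contain real secrets. The sketch below follows that assumption and uses Python's `string.Template`, which understands the `${NAME}` syntax. `mcp_connections` and the `secrets` mapping are hypothetical.

```python
from string import Template


def mcp_connections(manifest, secrets):
    """Return each MCP server's URL and headers with placeholders filled in."""
    connections = {}
    servers = manifest.get("mcp", {}).get("servers", {})
    for server_name, server in servers.items():
        headers = {
            # substitute() raises KeyError when a placeholder has no value,
            # so a missing secret fails loudly instead of being sent verbatim.
            key: Template(value).substitute(secrets)
            for key, value in server.get("headers", {}).items()
        }
        connections[server_name] = {"url": server["url"], "headers": headers}
    return connections
```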

Agent Configuration (Optional)

Specify preferred voice providers:

{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_abc123"
    }
  }
}

Or composite:

{
  "agent": {
    "provider": {
      "stt": { "name": "deepgram" },
      "llm": { "name": "openai", "model": "gpt-4" },
      "tts": { "name": "elevenlabs" }
    }
  }
}

Examples

See the /examples directory for complete implementations.

How It Works

  1. User visits your website
  2. Voice client discovers <link rel="voice-manifest">
  3. Voice client reads manifest
  4. Voice client shows activation UI with your branding
  5. User activates voice
  6. Voice client uses your functions, prompts, and providers (or its own fallbacks)
  7. Actions are executed through your functions or MCP servers

Use Cases

🍝 Restaurants

"Book a table for four tomorrow at 7 PM"

🛒 E-Commerce

"Show me wireless headphones under $100"

✈️ Travel

"Book a window seat on the morning flight"

🏥 Healthcare

"Schedule a checkup for next Tuesday"

🏦 Banking

"Transfer $50 to my savings account"

Documentation

Implementation

1. Create Manifest

{
  "name": "Your Site",
  "system_prompt": "You are a helpful assistant.",
  "functions": [...]
}

2. Link from HTML

<link rel="voice-manifest" href="/voice-manifest.json" />

3. Implement Handlers

Set up endpoints or MCP servers to handle function calls.
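
What a handler looks like depends on your backend. As a minimal sketch, the Python below registers a handler for the `search_products` function from the earlier example and dispatches incoming calls by name. The call shape (`name` plus `arguments`), the registry, and the in-memory catalogue are all assumptions made for illustration.

```python
HANDLERS = {}


def handler(name):
    """Decorator that registers a function under its manifest name."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@handler("search_products")
def search_products(query, max_price=None):
    # Placeholder catalogue; a real handler would query your own backend.
    catalogue = [
        {"title": "Wireless headphones", "price": 79.0},
        {"title": "Studio headphones", "price": 249.0},
    ]
    return [
        item for item in catalogue
        if query.lower() in item["title"].lower()
        and (max_price is None or item["price"] <= max_price)
    ]


def dispatch(call):
    """Run the handler named in the call, or report that it is unknown."""
    func = HANDLERS.get(call["name"])
    if func is None:
        return {"error": f"unknown function: {call['name']}"}
    return {"result": func(**call.get("arguments", {}))}
```

An HTTP endpoint or MCP server would wrap `dispatch` and should validate the arguments before calling it.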

4. Test

Use voice clients that support Voice Manifest.

Comparison to manifest.json

| manifest.json | voice-manifest.json |
| --- | --- |
| Makes site installable (PWA) | Makes site voice-enabled |
| <link rel="manifest"> | <link rel="voice-manifest"> |
| Declares PWA capabilities | Declares voice capabilities |
| Icons, colors, display mode | Prompts, functions, providers |

Provider Fallback Strategy

Voice clients fall back as follows:

  • ✅ No providers specified → Client uses its own
  • ✅ Some providers specified → Client uses what you specify and fills the gaps
  • ✅ Voice agent specified → All-in-one solution
  • ✅ Composite specified → Individual components

This makes the manifest flexible and accessible: sites can work without requiring specific voice providers.
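
The four cases above can be sketched as a single resolution step. The code assumes the `agent.provider` shapes shown under Agent Configuration: a provider with a `name` and no `stt`, `llm`, or `tts` keys is treated as an all-in-one agent, and anything else is treated as a composite whose missing stages come from the client. `resolve_providers` and the `client_defaults` mapping are hypothetical.

```python
STAGES = ("stt", "llm", "tts")


def resolve_providers(manifest, client_defaults):
    """Pick providers from the manifest, filling any gaps with client defaults."""
    provider = manifest.get("agent", {}).get("provider") or {}
    is_composite = any(stage in provider for stage in STAGES)
    if "name" in provider and not is_composite:
        # All-in-one voice agent: the client hands the whole session to it.
        return {"mode": "agent", "agent": provider}
    # No providers, or only some stages: use the site's choice where present.
    stages = {stage: provider.get(stage) or client_defaults[stage] for stage in STAGES}
    return {"mode": "composite", "stages": stages}
```

With no `agent` block at all, every stage falls back to the client's own provider, which matches the first case in the list.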

Status

Early proposal stage (October 2025)

We're seeking feedback from:

  • Voice platform providers (Retell, Vapi, ElevenLabs, etc.)
  • Browser vendors (Chrome, Safari, Firefox)
  • Web developers
  • Standards organizations (W3C)

Contributing

We welcome feedback and contributions!

Areas needing input:

  • Real-world implementation experiences
  • Browser integration approaches
  • Security and privacy considerations
  • Multi-modal experiences (voice + visual)
  • Standards body feedback

How to contribute:

  • Open issues for bugs or suggestions
  • Submit PRs with improvements
  • Share implementation examples
  • Provide feedback on the specification

Roadmap

  • Initial specification and schema
  • Example implementations
  • Reference voice client implementation
  • Browser extension prototype
  • Developer tooling (validators, generators)
  • Standards body submission

License

This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 license.


Making the web voice-first, one manifest at a time 🎙️