Make your website voice-enabled, just like manifest.json makes it installable
The Voice Manifest (voice-manifest.json) enables voice capabilities for websites in the same way that the Web App Manifest (manifest.json) enables Progressive Web App features.
```html
<link rel="voice-manifest" href="/voice-manifest.json" />
```

It's a declarative specification: you tell voice clients what your site can do, not how to configure voice providers.
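A client first has to find that link. The sketch below assumes a client that looks it up with a CSS selector and resolves a relative `href` against the page URL, the same way browsers resolve `<link rel="manifest">`. `findVoiceManifestUrl` and `LinkLookup` are hypothetical names. The structural `LinkLookup` type accepts a real DOM `document` or a test stub.

```typescript
// Hypothetical helper, not part of any published Voice Manifest client API.
// Structural type satisfied by a DOM Document (and easy to stub in tests).
interface LinkLookup {
  querySelector(selector: string): { getAttribute(name: string): string | null } | null;
}

// Returns the absolute manifest URL, or null if the page declares none.
function findVoiceManifestUrl(doc: LinkLookup, pageUrl: string): URL | null {
  // `~=` matches "voice-manifest" as one token of a space-separated rel list.
  const link = doc.querySelector('link[rel~="voice-manifest"]');
  const href = link?.getAttribute("href");
  if (!href) return null;
  // Relative hrefs such as "/voice-manifest.json" resolve against the page URL.
  return new URL(href, pageUrl);
}
```

In a browser extension's content script this might be called as `findVoiceManifestUrl(document, location.href)`.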
```json
{
  "name": "My Website"
}
```

That's literally it! Just one field makes your site discoverable as voice-enabled.
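`name` is the only required field, so a client-side check can stay very small. The sketch below assumes a client that rejects a manifest whose `name` is missing or empty and keeps every other member for later stages. `parseVoiceManifest` is a hypothetical name. The JSON Schema in this repository remains the authoritative validator.

```typescript
// Only `name` is required; every other member is optional.
interface VoiceManifest {
  name: string;
  [key: string]: unknown; // display, functions, system_prompt, mcp, agent, ...
}

// Hypothetical helper: parse manifest text and enforce the one required field.
function parseVoiceManifest(text: string): VoiceManifest {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("Voice manifest must be a JSON object");
  }
  const name = (data as Record<string, unknown>).name;
  if (typeof name !== "string" || name.trim() === "") {
    throw new Error('Voice manifest requires a non-empty "name"');
  }
  return data as VoiceManifest;
}
```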
```json
{
  "name": "Pasta Paradise",
  "display": {
    "call_to_action": "Ask about our menu or make a reservation",
    "suggested_prompts": [
      "What pasta dishes do you have?",
      "Make a reservation for Friday at 7 PM"
    ]
  },
  "system_prompt": "You are a helpful restaurant assistant.",
  "functions": [
    {
      "name": "make_reservation",
      "description": "Create a dining reservation",
      "parameters": {
        "type": "object",
        "properties": {
          "date": { "type": "string", "format": "date" },
          "time": { "type": "string", "format": "time" },
          "party_size": { "type": "integer" },
          "name": { "type": "string" },
          "phone": { "type": "string" }
        },
        "required": ["date", "time", "party_size", "name", "phone"]
      }
    }
  ]
}
```

That's it! Any compatible voice client can now interact with your site.
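The `functions` entries use the OpenAI function-calling shape. A client built on OpenAI's Chat Completions API could therefore pass them through almost unchanged, because the `tools` request parameter expects each function wrapped as `{ "type": "function", "function": { ... } }`. Here is a sketch of that mapping. `toOpenAITools` is a hypothetical helper, and other LLM providers expect different wrappers.

```typescript
// Shape of one entry in the manifest's "functions" array.
interface ManifestFunction {
  name: string;
  description?: string;
  parameters?: Record<string, unknown>; // JSON Schema object
}

// Shape of one entry in the Chat Completions "tools" request parameter.
interface OpenAITool {
  type: "function";
  function: ManifestFunction;
}

// Hypothetical helper: wrap each manifest function as an OpenAI tool.
function toOpenAITools(functions: ManifestFunction[]): OpenAITool[] {
  return functions.map((fn): OpenAITool => ({
    type: "function",
    function: { ...fn },
  }));
}
```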
Voice Manifest is like manifest.json: it declares your site's capabilities, not how to implement them.
- ❌ NOT a configuration file for your voice pipeline
- ✅ A public declaration of what your site can do with voice
- ✅ Voice clients provide fallback providers if you don't specify any
| Level | Features |
|---|---|
| Basic | Name + display hints |
| + Functions | Define voice actions |
| + System Prompt | Customize behavior |
| + MCP | Connect backend services |
| + Agent Config | Specify preferred providers (optional) |
- No providers? Voice clients use their own (browser extensions, OS features).
- Specific voice agent? All-in-one solution (Retell, Vapi, etc.).
- Composite? Mix and match STT/LLM/TTS providers.
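A client has to tell these three cases apart before it connects to anything. The sketch below assumes the `agent.provider` shapes used in this README's examples: either a named all-in-one agent (`name` plus provider-specific fields such as `endpoint` and `agent_id`) or a composite with `stt`/`llm`/`tts` keys. All type and function names are hypothetical.

```typescript
interface ComponentProvider { name: string; model?: string }

// All-in-one agent, e.g. { name: "retell", endpoint: "...", agent_id: "..." }.
interface AgentProvider { name: string; [key: string]: unknown }

// Composite pipeline; any component may be omitted.
interface CompositeProvider {
  stt?: ComponentProvider;
  llm?: ComponentProvider;
  tts?: ComponentProvider;
}

type ProviderMode = "client-default" | "agent" | "composite";

// Hypothetical classifier for the three provider options.
function providerMode(manifest: {
  agent?: { provider?: AgentProvider | CompositeProvider };
}): ProviderMode {
  const provider = manifest.agent?.provider;
  if (provider === undefined) return "client-default"; // client uses its own stack
  // Any component key marks a composite; otherwise treat it as a single agent.
  if ("stt" in provider || "llm" in provider || "tts" in provider) return "composite";
  return "agent";
}
```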
Control how the voice UI appears:

```json
{
  "display": {
    "icon": "/icons/voice-icon.png",
    "background_color": "#8B0000",
    "theme_color": "#8B0000",
    "activation_phrase": "Talk to Pasta Paradise",
    "call_to_action": "Ask about our menu",
    "suggested_prompts": ["What's on the menu?", "Make a reservation"]
  }
}
```

Functions use OpenAI's function-calling format, with `parameters` given as a JSON Schema object:
```json
{
  "functions": [
    {
      "name": "search_products",
      "description": "Search for products",
      "parameters": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "max_price": { "type": "number" }
        },
        "required": ["query"]
      }
    }
  ]
}
```

Define the assistant's behavior with a system prompt:
```json
{
  "system_prompt": "You are a helpful shopping assistant. Be enthusiastic but never pushy."
}
```

Or reference an external file:
```json
{
  "system_prompt": {
    "$ref": "./prompts/system-prompt.txt"
  }
}
```

Connect to backend services using the Model Context Protocol (MCP):
```json
{
  "mcp": {
    "servers": {
      "myserver": {
        "url": "https://api.mywebsite.com/mcp",
        "headers": {
          "Authorization": "Bearer ${API_KEY}"
        }
      }
    }
  }
}
```

Voice clients connect to these URLs to discover tools dynamically via MCP.
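The manifest is served publicly, so the `${API_KEY}` form reads as a placeholder rather than a literal credential. This README does not say where the value comes from. The sketch below therefore assumes the voice client keeps its own secrets (for example, a key the user entered in the client's settings) and refuses to connect when a placeholder cannot be resolved. `expandHeaders` is a hypothetical name.

```typescript
// Hypothetical helper: substitute ${NAME} placeholders in MCP server headers
// using secrets held by the voice client (never stored in the public manifest).
function expandHeaders(
  headers: Record<string, string>,
  secrets: Map<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(headers)) {
    out[key] = value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
      const secret = secrets.get(name);
      if (secret === undefined) {
        // Fail closed: never send the literal placeholder to the server.
        throw new Error("No value configured for placeholder " + name);
      }
      return secret;
    });
  }
  return out;
}
```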
Specify preferred voice providers:
```json
{
  "agent": {
    "provider": {
      "name": "retell",
      "endpoint": "https://api.retellai.com/v1",
      "agent_id": "agent_abc123"
    }
  }
}
```

Or specify a composite of separate STT, LLM, and TTS providers:
```json
{
  "agent": {
    "provider": {
      "stt": { "name": "deepgram" },
      "llm": { "name": "openai", "model": "gpt-4" },
      "tts": { "name": "elevenlabs" }
    }
  }
}
```

See the /examples directory for complete implementations:
- voice-manifest-minimal.json - Just the basics
- voice-manifest-with-mcp.json - With MCP backend
- voice-manifest-with-voice-agent.json - With managed voice agent
- voice-manifest-with-composite-agents.json - With STT/LLM/TTS components
- User visits your website
- Voice client discovers `<link rel="voice-manifest">`
- Voice client reads the manifest
- Voice client shows activation UI with your branding
- User activates voice
- Voice client uses your functions, prompts, and providers (or its own fallbacks)
- Actions are executed via your functions/MCP servers
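The "reads the manifest" step can be sketched as a fetch followed by inlining any `$ref` used for `system_prompt`. The sketch assumes `$ref` paths resolve relative to the manifest's own URL. This README shows `./prompts/system-prompt.txt` but does not define the base. The fetch function is injected to keep the sketch testable. `loadVoiceManifest` and `FetchText` are hypothetical names.

```typescript
// Injected text fetcher; in a browser this could wrap fetch(url).then((r) => r.text()).
type FetchText = (url: string) => Promise<string>;

interface LoadedManifest {
  name: string;
  system_prompt?: string; // always a plain string after loading
  [key: string]: unknown;
}

// Hypothetical loader: fetch the manifest, check "name", inline a $ref'd prompt.
async function loadVoiceManifest(manifestUrl: string, fetchText: FetchText): Promise<LoadedManifest> {
  const manifest = JSON.parse(await fetchText(manifestUrl));
  if (typeof manifest?.name !== "string" || manifest.name === "") {
    throw new Error('Voice manifest requires a non-empty "name"');
  }
  const prompt = manifest.system_prompt;
  if (prompt && typeof prompt === "object" && typeof prompt.$ref === "string") {
    // Assumption: $ref is resolved against the manifest URL, not the page URL.
    const promptUrl = new URL(prompt.$ref, manifestUrl).href;
    manifest.system_prompt = await fetchText(promptUrl);
  }
  return manifest as LoadedManifest;
}
```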
"Book a table for four tomorrow at 7 PM"
"Show me wireless headphones under $100"
"Book a window seat on the morning flight"
"Schedule a checkup for next Tuesday"
"Transfer $50 to my savings account"
- Explainer - Comprehensive guide
- Schema - JSON Schema for validation
- Quick Reference - Developer reference
- Getting Started - Step-by-step guide
- Architecture - Design decisions
Create `voice-manifest.json`:

```json
{
  "name": "Your Site",
  "system_prompt": "You are a helpful assistant.",
  "functions": [...]
}
```

Link it from your HTML:

```html
<link rel="voice-manifest" href="/voice-manifest.json" />
```

Set up endpoints or MCP servers to handle function calls.

Use voice clients that support Voice Manifest.
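On the site side, handling a function call mostly means routing by `name` and checking the arguments against the declared `parameters`. The sketch below checks only the `required` list. A fuller version would validate the whole JSON Schema, for example with a JSON Schema validation library. `dispatchCall`, `Handler`, and the call shape are hypothetical.

```typescript
interface DeclaredFunction {
  name: string;
  parameters?: { required?: string[] };
}

type Handler = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical dispatcher: route a voice-client function call to a site handler.
async function dispatchCall(
  declared: DeclaredFunction[],
  handlers: Map<string, Handler>,
  call: { name: string; arguments: Record<string, unknown> },
): Promise<unknown> {
  const fn = declared.find((f) => f.name === call.name);
  const handler = handlers.get(call.name);
  if (!fn || !handler) throw new Error(`Unknown function: ${call.name}`);

  // Reject calls that omit a parameter listed in "required".
  const missing = (fn.parameters?.required ?? []).filter((key) => !(key in call.arguments));
  if (missing.length > 0) throw new Error(`Missing required arguments: ${missing.join(", ")}`);

  return handler(call.arguments);
}
```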
| manifest.json | voice-manifest.json |
|---|---|
| Makes site installable (PWA) | Makes site voice-enabled |
| `<link rel="manifest">` | `<link rel="voice-manifest">` |
| Declares PWA capabilities | Declares voice capabilities |
| Icons, colors, display mode | Prompts, functions, providers |
Voice clients can provide fallbacks when:
- ✅ No providers specified → Client uses its own
- ✅ Some providers specified → Client uses what you want, fills gaps
- ✅ Voice agent specified → All-in-one solution
- ✅ Composite specified → Individual components
This makes the manifest flexible and accessible: sites can work without requiring specific voice providers.
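The "fills gaps" rule for composite setups can be sketched as a per-component merge: the site's choice wins where one is given, and the client's own provider is used otherwise. The merge policy and the `Pipeline`/`fillGaps` names are assumptions for illustration. This README only states that the client fills gaps.

```typescript
interface ComponentProvider { name: string; model?: string }
type Component = "stt" | "llm" | "tts";
type Pipeline = Record<Component, ComponentProvider>;

// Hypothetical: site preferences win per component; the client fills the rest.
function fillGaps(site: Partial<Pipeline> | undefined, clientDefaults: Pipeline): Pipeline {
  const components: Component[] = ["stt", "llm", "tts"];
  const result: Pipeline = { ...clientDefaults };
  for (const c of components) {
    const chosen = site?.[c];
    if (chosen) result[c] = chosen;
  }
  return result;
}
```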
Early proposal stage (October 2025)
We're seeking feedback from:
- Voice platform providers (Retell, Vapi, ElevenLabs, etc.)
- Browser vendors (Chrome, Safari, Firefox)
- Web developers
- Standards organizations (W3C)
We welcome feedback and contributions!
Areas needing input:
- Real-world implementation experiences
- Browser integration approaches
- Security and privacy considerations
- Multi-modal experiences (voice + visual)
- Standards body feedback
How to contribute:
- Open issues for bugs or suggestions
- Submit PRs with improvements
- Share implementation examples
- Provide feedback on the specification
- Initial specification and schema
- Example implementations
- Reference voice client implementation
- Browser extension prototype
- Developer tooling (validators, generators)
- Standards body submission
This work is licensed under a Creative Commons Attribution-NonCommercial 2.0 license.
Making the web voice-first, one manifest at a time