Build smart AI apps for smart glasses, fast.
GlassKit is an open-source dev suite for building vision-enabled smart glasses apps. It provides SDKs and backends that turn real-time camera and microphone streams into specialized AI responses and actions, tailored to your workflow.
Today: this repository focuses on end-to-end examples you can adapt. Next: reusable SDKs and a production-ready backend.
| IKEA assembly assistant | Sushi speedrun HUD | Privacy filter |
|---|---|---|
| demo.webm | demo.webm | demo.mp4 |
| Code ➡️ · Code (+ RF-DETR) ➡️ | Code ➡️ | Code ➡️ |
| Real-time, vision-enabled voice assistant for Rokid Glasses. Streams mic + camera over WebRTC to the OpenAI Realtime API, plays back speech, and uses tool calls to guide tasks like IKEA assembly steps (see the sketch below the table). The RF-DETR variant adds object detection and passes annotated frames to OpenAI for better visual understanding. | Real-world speedrun HUD for Rokid Glasses. Streams video over WebRTC with a data channel to the backend, which runs a fine-tuned RF-DETR object detector for automatic, hands-free split completion based on a configured route. | Real-time privacy filter that sits between the camera and the app. Anonymizes the faces of people who haven't given consent, detects and remembers verbal consent, and runs locally with recording support. |
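To make the IKEA assistant's tool-calling flow concrete, here is a minimal Python sketch of registering one tool with the OpenAI Realtime API over WebSocket. The `show_next_step` tool name and schema are illustrative assumptions, not taken from this repo's code, and the audio/WebRTC plumbing is omitted:

```python
# Minimal sketch: one tool registered with the OpenAI Realtime API.
# show_next_step is a hypothetical example tool, not from this repo.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    async with websockets.connect(
        URL,
        additional_headers={  # use extra_headers on older websockets versions
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "OpenAI-Beta": "realtime=v1",
        },
    ) as ws:
        # Describe the task and expose one tool the model may call.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": "Guide the wearer through IKEA assembly, one step at a time.",
                "tools": [{
                    "type": "function",
                    "name": "show_next_step",  # hypothetical tool
                    "description": "Advance the wearer's HUD to the next assembly step.",
                    "parameters": {
                        "type": "object",
                        "properties": {"step": {"type": "integer"}},
                        "required": ["step"],
                    },
                }],
            },
        }))
        # When the model decides to call the tool, the finished arguments
        # arrive as a function_call_arguments.done event.
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.function_call_arguments.done":
                print("tool call:", event["name"], json.loads(event["arguments"]))

asyncio.run(main())
```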
Smart glasses apps are hard.
- Generic vision-capable LLMs often fail at real-world task support.
- Each glasses brand has different hardware, form factors, and frameworks.
- Real-time camera + mic streaming is non-trivial to build correctly and ergonomically.
GlassKit is built around:
- Vision model orchestration: choose the right mix of multimodal LLMs and object detectors for the job (see the sketch after this list).
- Visual context management: define what the AI should know and how it is represented.
- Real-time streaming: camera + mic in, responses out, with sane developer ergonomics.
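As a sketch of the orchestration idea, the snippet below pairs a fast object detector with a multimodal LLM using the standard OpenAI vision-input message format. The `detect()` stub is a hypothetical stand-in for a real detector such as RF-DETR, and the prompt is illustrative:

```python
# Sketch: detector output is drawn onto the frame and summarized in text,
# then both are handed to a multimodal LLM. detect() is a hypothetical stub.
import base64

import cv2  # pip install opencv-python
from openai import OpenAI  # pip install openai

client = OpenAI()

def detect(frame):
    """Hypothetical detector stub; swap in a real model such as RF-DETR."""
    return [("screwdriver", (40, 60, 180, 220))]  # (label, xyxy box)

def describe_scene(frame) -> str:
    # Run the fast detector first and draw its output on the frame, so the
    # multimodal LLM receives explicit grounding rather than raw pixels.
    labels = []
    for label, (x1, y1, x2, y2) in detect(frame):
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, max(y1 - 8, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        labels.append(label)
    _, jpeg = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(jpeg.tobytes()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Objects detected: {', '.join(labels)}. "
                         "Which assembly step does this frame show?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```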
You define your AI with visual/textual context and your business logic. Then your app works like this:
- Camera frames and audio stream from the glasses to the backend via the SDK
- The backend processes inputs using vision models and LLMs with your custom context + logic
- Responses stream back to the glasses and the wearer via the SDK
You handle the app logic. GlassKit handles the glasses-to-AI pipeline.
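For a feel of the streaming half, here is a minimal backend sketch that receives camera frames over WebRTC with aiortc (a real Python WebRTC library) and pushes responses back on a data channel. `process_frame()` is a hypothetical placeholder for your vision models and business logic, and signaling transport is not shown:

```python
# Sketch: receive glasses video over WebRTC, return results on a data channel.
# process_frame() is a hypothetical stand-in; signaling is out of scope.
import asyncio

from aiortc import RTCPeerConnection, RTCSessionDescription  # pip install aiortc

def process_frame(image) -> str:
    """Hypothetical stand-in for your vision models and business logic."""
    return f"saw a {image.shape[1]}x{image.shape[0]} frame"

async def handle_offer(offer_sdp: str) -> str:
    pc = RTCPeerConnection()
    channels = []

    @pc.on("datachannel")
    def on_datachannel(channel):
        channels.append(channel)  # glasses app opens this for HUD/text responses

    @pc.on("track")
    def on_track(track):
        if track.kind == "video":
            asyncio.ensure_future(consume_video(track, channels))

    # Standard WebRTC offer/answer exchange (signaling transport not shown).
    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    await pc.setLocalDescription(await pc.createAnswer())
    return pc.localDescription.sdp

async def consume_video(track, channels):
    while True:
        frame = await track.recv()                # next camera frame
        image = frame.to_ndarray(format="bgr24")  # decode to a numpy array
        result = process_frame(image)
        for channel in channels:
            channel.send(result)                  # stream the response back
```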
- Pick an example from `examples/`
- Open its README and follow the setup steps
- Run it, then modify for your workflow
GlassKit is early and under active development, but the examples are usable today.
- Current focus: end-to-end templates you can clone and adapt
- Coming next: reusable SDKs + production-ready backends
- Developer experience: demo video recording tooling; observability and debugging tools
- Platform support today: Rokid Glasses
- Planned support: Meta glasses, Android XR, Mentra, and more
Contributions are welcome!
By submitting a pull request, you agree that your contribution is licensed under the MIT License of this project (see LICENSE), and you confirm that you have the right to submit it under those terms.