Build smart AI apps for smart glasses, fast.
GlassKit is an open-source dev suite for building vision-enabled smart glasses apps. It provides SDKs and backends that turn real-time camera and microphone streams into specialized AI responses and actions, tailored to your workflow.
Today: this repository focuses on end-to-end examples you can adapt. Next: reusable SDKs and a production-ready backend.
| IKEA assembly assistant | Sushi speedrun HUD | Privacy filter |
|---|---|---|
| demo.webm | demo.webm | demo.mp4 |
| Code ➡️ · Code (+ RF-DETR) ➡️ | Code ➡️ | Code ➡️ |
| Real-time, vision-enabled voice assistant for Rokid Glasses. Streams mic + camera over WebRTC to the OpenAI Realtime API, plays back speech, and uses tool calls to guide tasks like IKEA assembly steps (see the sketch below the table). The RF-DETR variant adds object detection and passes annotated frames to OpenAI for better visual understanding. | Real-world speedrun HUD for Rokid Glasses. Streams video over WebRTC with a data channel to the backend, which runs a fine-tuned RF-DETR object detector for automatic, hands-free split completion based on a configured route. | Real-time privacy filter that sits between the camera and the app. Anonymizes the faces of people who haven't given consent, detects and remembers verbal consent, and runs locally with recording support. |
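To make the IKEA assistant's tool-calling flow concrete, here is a minimal Python sketch of registering one tool with the OpenAI Realtime API over WebSocket. The `show_next_step` tool name and schema are illustrative assumptions, not taken from this repo's code, and the audio/WebRTC plumbing is omitted:

```python
# Minimal sketch: one tool registered with the OpenAI Realtime API.
# show_next_step is a hypothetical example tool, not from this repo.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    async with websockets.connect(
        URL,
        additional_headers={  # use extra_headers on older websockets versions
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "OpenAI-Beta": "realtime=v1",
        },
    ) as ws:
        # Describe the task and expose one tool the model may call.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": "Guide the wearer through IKEA assembly, one step at a time.",
                "tools": [{
                    "type": "function",
                    "name": "show_next_step",  # hypothetical tool
                    "description": "Advance the wearer's HUD to the next assembly step.",
                    "parameters": {
                        "type": "object",
                        "properties": {"step": {"type": "integer"}},
                        "required": ["step"],
                    },
                }],
            },
        }))
        # When the model decides to call the tool, the finished arguments
        # arrive as a function_call_arguments.done event.
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.function_call_arguments.done":
                print("tool call:", event["name"], json.loads(event["arguments"]))

asyncio.run(main())
```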
Smart glasses apps are hard.
- Generic vision-capable LLMs often fail at real-world task support.
- Each glasses brand has different hardware, form factors, and frameworks.
- Real-time camera + mic streaming is non-trivial to build correctly and ergonomically.
GlassKit is built around:
- Vision model orchestration: choose the right mix of multimodal LLMs and object detectors for the job (see the sketch after this list).
- Visual context management: define what the AI should know and how it is represented.
- Real-time streaming: camera + mic in, responses out, with sane developer ergonomics.
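As a sketch of the orchestration idea, the snippet below pairs a fast object detector with a multimodal LLM using the standard OpenAI vision-input message format. The `detect()` stub is a hypothetical stand-in for a real detector such as RF-DETR, and the prompt is illustrative:

```python
# Sketch: detector output is drawn onto the frame and summarized in text,
# then both are handed to a multimodal LLM. detect() is a hypothetical stub.
import base64

import cv2  # pip install opencv-python
from openai import OpenAI  # pip install openai

client = OpenAI()

def detect(frame):
    """Hypothetical detector stub; swap in a real model such as RF-DETR."""
    return [("screwdriver", (40, 60, 180, 220))]  # (label, xyxy box)

def describe_scene(frame) -> str:
    # Run the fast detector first and draw its output on the frame, so the
    # multimodal LLM receives explicit grounding rather than raw pixels.
    labels = []
    for label, (x1, y1, x2, y2) in detect(frame):
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, max(y1 - 8, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        labels.append(label)
    _, jpeg = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(jpeg.tobytes()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Objects detected: {', '.join(labels)}. "
                         "Which assembly step does this frame show?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```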
You define your AI with visual/textual context and your business logic. Then your app works like this:
- Camera frames and audio stream from the glasses to the backend via the SDK
- The backend processes inputs using vision models and LLMs with your custom context + logic
- Responses stream back to the glasses and the wearer via the SDK
You handle the app logic. GlassKit handles the glasses-to-AI pipeline.
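For a feel of the streaming half, here is a minimal backend sketch that receives camera frames over WebRTC with aiortc (a real Python WebRTC library) and pushes responses back on a data channel. `process_frame()` is a hypothetical placeholder for your vision models and business logic, and signaling transport is not shown:

```python
# Sketch: receive glasses video over WebRTC, return results on a data channel.
# process_frame() is a hypothetical stand-in; signaling is out of scope.
import asyncio

from aiortc import RTCPeerConnection, RTCSessionDescription  # pip install aiortc

def process_frame(image) -> str:
    """Hypothetical stand-in for your vision models and business logic."""
    return f"saw a {image.shape[1]}x{image.shape[0]} frame"

async def handle_offer(offer_sdp: str) -> str:
    pc = RTCPeerConnection()
    channels = []

    @pc.on("datachannel")
    def on_datachannel(channel):
        channels.append(channel)  # glasses app opens this for HUD/text responses

    @pc.on("track")
    def on_track(track):
        if track.kind == "video":
            asyncio.ensure_future(consume_video(track, channels))

    # Standard WebRTC offer/answer exchange (signaling transport not shown).
    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    await pc.setLocalDescription(await pc.createAnswer())
    return pc.localDescription.sdp

async def consume_video(track, channels):
    while True:
        frame = await track.recv()                # next camera frame
        image = frame.to_ndarray(format="bgr24")  # decode to a numpy array
        result = process_frame(image)
        for channel in channels:
            channel.send(result)                  # stream the response back
```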
- Pick an example from `examples/`
- Open its README and follow the setup steps
- Run it, then modify for your workflow
GlassKit is early and under active development, but the examples are usable today.
- Current focus: end-to-end templates you can clone and adapt
- Coming next: reusable SDKs + production-ready backends
- Developer experience: demo video recording tooling; observability and debugging tools
- Platform support today: Rokid Glasses
- Planned support: Meta glasses, Android XR, Mentra, and more
Contributions are welcome!
By submitting a pull request, you agree that your contribution is licensed under the MIT License of this project (see LICENSE), and you confirm that you have the right to submit it under those terms.