glassflow/clickhouse-etl


GlassFlow for ClickHouse Streaming ETL

GlassFlow is an open-source ETL tool that enables real-time data processing from Kafka to ClickHouse. GlassFlow pipelines can perform the following operations:

  • Deduplicate: Remove duplicate records based on configurable keys and time windows - use when you need to ensure data uniqueness
  • Join: Perform temporal joins between multiple Kafka topics - use when combining related data streams with time-based matching
  • Deduplicate & Join: Combine both deduplication and joining in a single pipeline
  • Ingest only: Direct data transfer from Kafka to ClickHouse without transformations
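To make the first two operations concrete, here is a minimal conceptual sketch in plain Python. It is not GlassFlow's implementation or API: `WindowedDeduplicator`, `temporal_join`, and the `(key, timestamp, payload)` event shape are hypothetical names chosen for illustration, and the code assumes events arrive in non-decreasing timestamp order.

```python
from collections import OrderedDict


class WindowedDeduplicator:
    """Drop events whose key was already seen within `window_seconds`.

    Illustrative only; assumes timestamps never go backwards.
    """

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # key -> timestamp of first sighting, oldest entry first
        self._seen = OrderedDict()

    def _evict(self, now: float) -> None:
        # Forget keys whose deduplication window has expired.
        while self._seen:
            key, ts = next(iter(self._seen.items()))
            if now - ts < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def accept(self, key: str, timestamp: float) -> bool:
        """Return True if the event is new, False if it is a duplicate."""
        self._evict(timestamp)
        if key in self._seen:
            return False
        self._seen[key] = timestamp
        return True


def temporal_join(left, right, window_seconds):
    """Pair left and right events that share a key and whose timestamps
    differ by at most `window_seconds`.

    Batch version for clarity; a streaming engine would keep this
    state incrementally instead of holding both inputs in memory.
    """
    by_key = {}
    for key, ts, payload in right:
        by_key.setdefault(key, []).append((ts, payload))

    joined = []
    for key, ts, payload in left:
        for r_ts, r_payload in by_key.get(key, []):
            if abs(ts - r_ts) <= window_seconds:
                joined.append((key, payload, r_payload))
    return joined
```

With a 60-second window, a second event carrying the same key is rejected until the first sighting is at least 60 seconds old. The join pairs only events that share a key and fall within the window, so a late right-side event for the same key is left unmatched.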

⚡️ Quick Start

To get started with GlassFlow, you can:

  1. Try the Live Demo: Experience GlassFlow running on a live cluster at demo.glassflow.dev
  2. Install on Kubernetes: Follow our Kubernetes Installation Guide for production deployment
  3. Learn More: Explore our Usage Guide to start creating pipelines

🧭 Installation Options

GlassFlow is open source and can be self-hosted on Kubernetes. It works with any managed Kubernetes service, such as AWS EKS, GKE, and AKS.

Method                  | Use Case                              | Docs Link
☸️ Kubernetes with Helm | Production and development deployment | Kubernetes Helm Guide

🎥 Demo

Live Preview

Log in and see a working demo of GlassFlow running on a GCP cluster at demo.glassflow.dev. The demo includes a Grafana dashboard and the setup that we used.

Demo Video

GlassFlow Overview Video

📚 Documentation

For detailed documentation, visit docs.glassflow.dev.

🗺️ Roadmap

Check out our public roadmap to see what's coming next in GlassFlow. We're actively working on new features and improvements based on community feedback.

Want to suggest a feature? We'd love to hear from you! Please use our GitHub Discussions to share your ideas and help shape the future of GlassFlow.

✨ Features

  • Streaming deduplication and joins over time windows of up to 7 days, backed by a built-in state store
  • ClickHouse sink that uses the native protocol for high performance
  • Built-in Kafka connector with support for SASL, SSL, and other options, compatible with nearly all Kafka providers
  • Dead-Letter Queue for handling failed events
  • Field mapping from your Kafka topic's fields to ClickHouse table columns
  • Prometheus metrics and OpenTelemetry logs for comprehensive observability
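As a rough illustration of how field mapping and a dead-letter queue fit together, the sketch below maps JSON events to column-named rows and sets aside any event that fails. It is a conceptual example, not GlassFlow's configuration format or API: `FIELD_MAP`, `map_event`, `process`, and the field and column names are all hypothetical.

```python
import json

# Hypothetical mapping: Kafka JSON field -> (ClickHouse column, converter).
FIELD_MAP = {
    "user_id": ("user_id", str),
    "amount": ("amount_usd", float),
    "ts": ("event_time", int),
}


def map_event(raw: bytes) -> dict:
    """Parse one JSON message and rename/convert its fields."""
    event = json.loads(raw)
    return {
        column: convert(event[field])
        for field, (column, convert) in FIELD_MAP.items()
    }


def process(messages):
    """Split messages into mapped rows and dead letters.

    Events that cannot be parsed or mapped are kept with the error
    that caused the failure, so they can be inspected or replayed.
    """
    rows, dead_letters = [], []
    for raw in messages:
        try:
            rows.append(map_event(raw))
        except (ValueError, KeyError, TypeError) as exc:
            dead_letters.append({"payload": raw, "error": repr(exc)})
    return rows, dead_letters
```

Keeping the original payload next to the error means a failed event can be reprocessed after the mapping or the upstream producer is fixed, rather than being silently dropped.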

🆘 Support

⚖️ License

This project is licensed under the Apache License 2.0.
