GlassFlow for ClickHouse Streaming ETL

GlassFlow is an open-source ETL tool that enables real-time data processing from Kafka to ClickHouse with features like deduplication and temporal joins.

⚡️ Quick Start

This guide walks you through a local installation using Docker Compose, which is well suited to development, testing, or trying out GlassFlow on your machine.

  1. Clone the repository:
git clone https://github.com/glassflow/clickhouse-etl.git
cd clickhouse-etl
  2. Start the services:
docker compose up
  3. Access the web interface at http://localhost:8080 to configure your pipeline.

  4. View the logs:

# Follow logs in real time for all containers
docker compose logs -f

# Follow logs for the backend API
docker compose logs -f api

# Follow logs for the UI
docker compose logs -f ui

🧭 Installation Options

GlassFlow is open source and can be self-hosted on Kubernetes. It works with any managed Kubernetes service, such as AWS EKS, GKE, or AKS. For local testing or a small proof of concept, you can also run GlassFlow on your local machine with Docker Compose.

Method | Use Case | Docs Link
☸️ Kubernetes with Helm | Kubernetes deployment | Kubernetes Helm Guide
🐳 Local with Docker Compose | Quick evaluation and local testing | Local Docker Guide
☁️ AWS EC2 with Docker Compose | Lightweight cloud deployment for testing | AWS EC2 Guide

🎥 Demo

Live Demo

See a working demo of GlassFlow in action at demo.glassflow.dev.

[Figure: GlassFlow Pipeline Data Flow. Real-time streaming from Kafka through GlassFlow to ClickHouse.]

Demo Video

GlassFlow Overview Video

📚 Documentation

For detailed documentation, visit docs.glassflow.dev.

🗺️ Roadmap

Check out our public roadmap to see what's coming next in GlassFlow. We're actively working on new features and improvements based on community feedback.

Want to suggest a feature? We'd love to hear from you! Please use our GitHub Discussions to share your ideas and help shape the future of GlassFlow.

✨ Features

  • Real-time data processing from Kafka to ClickHouse
  • Deduplication with configurable time windows
  • Temporal joins between multiple Kafka topics
  • Scalable and robust architecture built for Kubernetes
  • Web-based UI for pipeline management
  • Docker version for local testing and evaluation
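
To make the deduplication and temporal join features above more concrete, here is a minimal conceptual sketch in Python. It is not GlassFlow's implementation or API: the `Event` type, the `deduplicate` and `temporal_join` functions, and the `window_s` parameter are hypothetical names chosen for illustration. The sketch works on in-memory lists rather than Kafka topics and assumes events arrive roughly in timestamp order.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Tuple


@dataclass
class Event:
    """Hypothetical event shape used only for this illustration."""
    key: str    # deduplication / join key, e.g. an order ID
    ts: float   # event time in seconds
    payload: Dict[str, Any] = field(default_factory=dict)


def deduplicate(events: Iterable[Event], window_s: float) -> Iterator[Event]:
    """Yield an event unless the same key was emitted within the last window_s seconds."""
    last_emitted: Dict[str, float] = {}
    for ev in events:
        prev = last_emitted.get(ev.key)
        if prev is not None and ev.ts - prev <= window_s:
            continue  # same key seen inside the window: treat as duplicate
        last_emitted[ev.key] = ev.ts
        yield ev


def temporal_join(
    left: Iterable[Event], right: Iterable[Event], window_s: float
) -> List[Tuple[Event, Event]]:
    """Pair left and right events that share a key and lie within window_s seconds."""
    right_by_key: Dict[str, List[Event]] = {}
    for r in right:
        right_by_key.setdefault(r.key, []).append(r)

    pairs: List[Tuple[Event, Event]] = []
    for l in left:
        for r in right_by_key.get(l.key, []):
            if abs(l.ts - r.ts) <= window_s:
                pairs.append((l, r))
    return pairs


if __name__ == "__main__":
    orders = [
        Event("o1", 0.0),
        Event("o1", 5.0),    # duplicate of o1 within 60 s, dropped
        Event("o2", 7.0),
        Event("o1", 120.0),  # outside the 60 s window, kept
    ]
    payments = [Event("o1", 2.0), Event("o2", 500.0)]

    unique_orders = list(deduplicate(orders, window_s=60.0))
    print([(e.key, e.ts) for e in unique_orders])

    matched = temporal_join(unique_orders, payments, window_s=30.0)
    print([(l.key, l.ts, r.ts) for l, r in matched])
```

In this sketch the deduplication window is measured from the last emitted event for a key, and the join buffers the whole right-hand side in memory. A streaming system would instead keep bounded, expiring state per key; the sketch only illustrates the matching rules.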

🆘 Support

⚖️ License

This project is licensed under the Apache License 2.0.
