GraphGen Engine Refactor with Ray Data #115
ChenZiHong-Gavin announced in Announcements
We're thrilled to announce a groundbreaking refactor of GraphGen's pipeline engine, now powered by Ray Data for truly distributed, scalable data processing! This architectural evolution transforms GraphGen from a multi-threaded local engine into a production-ready distributed system that unifies heterogeneous resource management and enables high-performance streaming pipelines.
✨ What's New
1. Ray Data-Powered Execution Engine
2. New Operator Framework
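As a rough illustration of the per-operator log files mentioned in this section, the sketch below builds a path of the form `cache/logs/OperatorName_workerID.log` and attaches a dedicated file handler to it. The helper names `operator_log_path` and `make_operator_logger` are hypothetical and are not taken from GraphGen; only the file naming pattern comes from the announcement.

```python
import logging
import tempfile
from pathlib import Path


def operator_log_path(log_dir: Path, operator_name: str, worker_id: int) -> Path:
    """Build <log_dir>/<OperatorName>_<workerID>.log (naming pattern from the announcement)."""
    return log_dir / f"{operator_name}_{worker_id}.log"


def make_operator_logger(log_dir: Path, operator_name: str, worker_id: int) -> logging.Logger:
    """Return a logger that writes only to this operator worker's own file.

    Hypothetical helper: GraphGen's real logging setup may differ.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"{operator_name}.{worker_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep worker output out of the root logger
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.FileHandler(operator_log_path(log_dir, operator_name, worker_id))
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "cache" / "logs"
        log = make_operator_logger(log_dir, "ChunkOperator", 0)
        log.info("processed one batch")
        print(operator_log_path(log_dir, "ChunkOperator", 0).name)
        for handler in list(log.handlers):  # release the file before cleanup
            handler.close()
            log.removeHandler(handler)
```

Keeping one file per operator and worker means that interleaved output from parallel workers never has to be untangled after the fact.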
… cache/logs/OperatorName_workerID.log) for easier debugging in distributed environments.
3. Distributed Storage Layer with Actor Isolation
4. LLM Serviceization for Resource Reuse
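For the `SYNTHESIZER_NUM_GPUS` and `TRAINEE_NUM_GPUS` environment variables that this section relies on, a minimal sketch of reading per-role GPU counts might look like the following. The function names, the default of `0`, and the idea of returning a simple role-to-count mapping are all assumptions made for illustration; the announcement does not show how GraphGen parses these values.

```python
import os
from typing import Dict, Mapping


def gpus_from_env(var_name: str, env: Mapping[str, str], default: float = 0.0) -> float:
    """Parse a GPU count from an environment-style mapping.

    Assumption: an unset or empty variable means "no GPUs requested".
    """
    raw = env.get(var_name, "").strip()
    if not raw:
        return default
    value = float(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{var_name} must be non-negative, got {value}")
    return value


def llm_resource_plan(env: Mapping[str, str] = os.environ) -> Dict[str, float]:
    """Map each LLM role to the GPU count requested for it (hypothetical helper)."""
    return {
        "synthesizer": gpus_from_env("SYNTHESIZER_NUM_GPUS", env),
        "trainee": gpus_from_env("TRAINEE_NUM_GPUS", env),
    }


if __name__ == "__main__":
    print(llm_resource_plan({"SYNTHESIZER_NUM_GPUS": "2", "TRAINEE_NUM_GPUS": "0.5"}))
```

Values parsed this way could then be handed to Ray as the `num_gpus` resource request for each LLM actor, which is how Ray decides which node an actor can be placed on.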
SYNTHESIZER_NUM_GPUS and TRAINEE_NUM_GPUS environment variables control resource assignment, with automatic actor placement on appropriate nodes.
🏗️ Architecture Deep Dive
We've restructured GraphGen into a modular, maintainable architecture:
- bases/ (Core Abstractions): BaseOperator, BaseLLMWrapper, BaseStorage, Chunk, Node, Config, Community
- models/ (Atomic Capabilities)
- operators/ (Ray-Ready Tasks): ReadOperator, ChunkOperator, BuildKGOperator, GenerateOperator
- common/ (Global Services): init_llm() and init_storage() manage actor lifecycles
- engine.py (DAG Orchestrator)
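To make the "DAG Orchestrator" role a little more concrete, here is a small sketch of how operator dependencies could be resolved into an execution order using Python's standard `graphlib`. The linear read, chunk, build-KG, generate ordering is an assumption inferred from the operator names, and `execution_order` is a hypothetical helper; the real `engine.py` schedules work on Ray Data and is not shown in this announcement.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each operator maps to the set of operators whose output it consumes.
# Assumed ordering, inferred only from the operator names.
PIPELINE: Dict[str, Set[str]] = {
    "ReadOperator": set(),
    "ChunkOperator": {"ReadOperator"},
    "BuildKGOperator": {"ChunkOperator"},
    "GenerateOperator": {"BuildKGOperator"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return the operators in an order that respects every dependency.

    Raises graphlib.CycleError if the graph is not a valid DAG.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(PIPELINE)))
    try:
        execution_order({"A": {"B"}, "B": {"A"}})
    except CycleError:
        print("cycle detected")
```

Declaring the pipeline as data rather than as hard-coded call sequences is what lets an orchestrator validate the graph up front and reject cyclic configurations before any work is scheduled.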