
kination/vine


Vine - Datalake Format based on Rust (WIP)

Status: Work in Progress

This project aims to provide a datalake table format optimized for streaming data writes, built on Rust.

Quick Start

Build

./build.sh

This builds:

  • vine-core: Rust library for Vine
  • vine-spark: Spark DataSource V2 connector

Usage with Spark

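// Write streaming data
spark.readStream
  .format("vine")
  .load("input-path")
  .writeStream
  .format("vine")
  .option("path", "/data/my-table")
  // Spark streaming sinks generally need a checkpoint location;
  // this path is a placeholder, not something Vine prescribes.
  .option("checkpointLocation", "/tmp/vine-checkpoints")
  .start()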

// Read with Spark SQL
val df = spark.read.format("vine").load("/data/my-table")
df.show()

Architecture

┌─────────────────────────────────────┐
│   Query Engines (Spark, Trino)      │
└──────────────┬──────────────────────┘
               │ DataSource API
┌──────────────▼──────────────────────┐
│  Connectors (vine-spark/vine-trino) │
└──────────────┬──────────────────────┘
               │ JNI
┌──────────────▼──────────────────────┐
│  Rust Core (vine-core)              │
│  - Fast Parquet writes              │
│  - Date-based partitioning          │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│  Storage (Parquet files)            │
│  2024-12-26/data_143025.parquet     │
│  2024-12-27/data_091500.parquet     │
└─────────────────────────────────────┘
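Because the storage layer names each directory after a date, a reader can skip whole partitions by looking at paths alone. The sketch below illustrates that idea in plain Rust. It is not taken from vine-core: the function names `partition_date` and `prune_partitions` are hypothetical, and paths are assumed to be relative to the table root.

```rust
/// Extract the date directory from a path relative to the table root,
/// e.g. "2024-12-26/data_143025.parquet" -> Some("2024-12-26").
/// Returns None when the first component is not shaped like YYYY-MM-DD.
/// (Hypothetical helper, not part of vine-core.)
fn partition_date(path: &str) -> Option<&str> {
    let (dir, _file) = path.split_once('/')?;
    let bytes = dir.as_bytes();
    let well_formed = bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        });
    if well_formed {
        Some(dir)
    } else {
        None
    }
}

/// Keep only the files whose partition date lies in [start, end], inclusive.
/// YYYY-MM-DD strings sort in calendar order, so plain string comparison
/// is enough and no date parsing is needed.
fn prune_partitions<'a>(paths: &[&'a str], start: &str, end: &str) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| match partition_date(p) {
            Some(date) => date >= start && date <= end,
            None => false,
        })
        .collect()
}
```

With the two files from the diagram, pruning to the range 2024-12-27 through 2024-12-31 keeps only `2024-12-27/data_091500.parquet`. Files outside a date directory, such as a metadata file at the table root, are ignored.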

Components

Component   Language  Status   Purpose
vine-core   Rust      WIP      Write-optimized datalake table format
vine-spark  Scala     WIP      Spark DataSource V2 connector
vine-trino  Java      Planned  Trino connector (not started)

Storage Format

  • Files: Apache Parquet (columnar)
  • Partitioning: Date-based directories (YYYY-MM-DD/data_HHMMSS.parquet)
  • Metadata: JSON schema file (vine_meta.json)
  • Types: integer, string, boolean, double
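The layout above can be sketched in a few lines of Rust: one function that builds the `YYYY-MM-DD/data_HHMMSS.parquet` path for a write, and one enum covering the four listed type names. This is a minimal illustration under assumptions, not vine-core's actual code. `WriteTime`, `data_file_path`, and `ColumnType` are hypothetical names, and the real structure of `vine_meta.json` is not shown here.

```rust
use std::str::FromStr;

/// Wall-clock time of a write, already split into calendar fields.
/// (Hypothetical type; the caller is assumed to supply valid values.)
struct WriteTime {
    year: u16,
    month: u8,
    day: u8,
    hour: u8,
    minute: u8,
    second: u8,
}

/// Build the relative path "YYYY-MM-DD/data_HHMMSS.parquet".
/// Every field is zero-padded so paths sort in chronological order.
fn data_file_path(t: &WriteTime) -> String {
    format!(
        "{:04}-{:02}-{:02}/data_{:02}{:02}{:02}.parquet",
        t.year, t.month, t.day, t.hour, t.minute, t.second
    )
}

/// The four column types listed in the storage format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Integer,
    Str,
    Boolean,
    Double,
}

impl FromStr for ColumnType {
    type Err = String;

    /// Parse a type name as it is spelled in the list above.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "integer" => Ok(ColumnType::Integer),
            "string" => Ok(ColumnType::Str),
            "boolean" => Ok(ColumnType::Boolean),
            "double" => Ok(ColumnType::Double),
            other => Err(format!("unsupported column type: {other}")),
        }
    }
}
```

A write at 14:30:25 on 2024-12-26 maps to `2024-12-26/data_143025.parquet`, and `"double".parse::<ColumnType>()` yields `Ok(ColumnType::Double)`. Rejecting unknown type names early keeps a schema with an unsupported type from reaching the Parquet writer.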

Documentation

Development

Build Components Individually

Rust Core

cd vine-core
cargo build --release
cargo test

Spark Connector

cd vine-spark
sbt clean assembly

Requirements

  • Rust 1.70+
  • Scala 2.13, sbt 1.x
  • Java 11

About

(PoC) Another datalake table format, for research
