
(PoC) Another datalake table format, for research


kination/vine


Vine - Datalake Format Based on Rust (WIP)

Status: Work in Progress

This project aims to provide a datalake table format optimized for streaming data writes, built on Rust.

Quick Start

Build

./build.sh

This builds:

  • vine-core: Rust library for Vine
  • vine-spark: Spark DataSource V2 connector

Usage with Spark

// Write streaming data
spark.readStream
  .format("vine")
  .load("input-path")
  .writeStream
  .format("vine")
  .option("path", "/data/my-table")
  .start()

// Read with Spark SQL
val df = spark.read.format("vine").load("/data/my-table")
df.show()

Architecture

┌─────────────────────────────────────┐
│   Query Engines (Spark, Trino)      │
└──────────────┬──────────────────────┘
               │ DataSource API
┌──────────────▼──────────────────────┐
│  Connectors (vine-spark/vine-trino) │
└──────────────┬──────────────────────┘
               │ JNI
┌──────────────▼──────────────────────┐
│  Rust Core (vine-core)              │
│  - Fast 'vortex' writes             │
│  - Date-based partitioning          │
└──────────────┬──────────────────────┘
               │
┌──────────────▼──────────────────────┐
│  Storage (vortex files)             │
│  2024-12-26/data_143025.vtx         │
│  2024-12-27/data_091500.vtx         │
└─────────────────────────────────────┘
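
One practical consequence of the date-based layout in the diagram is that a reader can skip whole directories when a query only touches a date range. The sketch below illustrates that idea on plain relative paths. It is not taken from vine-core: the function names `partition_date` and `prune_by_date` are hypothetical, and the `2024-12-25` path in the usage note is made up for illustration.

```rust
/// Return the `YYYY-MM-DD` directory of a relative path such as
/// "2024-12-26/data_143025.vtx", or `None` if the path does not
/// follow that shape. (Hypothetical helper, not part of vine-core.)
pub fn partition_date(path: &str) -> Option<&str> {
    let (dir, file) = path.split_once('/')?;
    let bytes = dir.as_bytes();
    // Expect exactly ten bytes: digits with '-' at positions 4 and 7.
    let well_formed = bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| {
            if i == 4 || i == 7 {
                *c == b'-'
            } else {
                c.is_ascii_digit()
            }
        });
    if well_formed && file.ends_with(".vtx") {
        Some(dir)
    } else {
        None
    }
}

/// Keep only the files whose date directory lies in `start..=end`.
/// ISO dates sort lexicographically, so string comparison is enough.
pub fn prune_by_date<'a>(paths: &[&'a str], start: &str, end: &str) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| matches!(partition_date(p), Some(d) if d >= start && d <= end))
        .collect()
}
```

Given the two files from the diagram plus a hypothetical `2024-12-25/data_235959.vtx`, calling `prune_by_date(&paths, "2024-12-26", "2024-12-27")` keeps only the two files shown above. Non-data entries such as `vine_meta.json` are ignored.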

Components

Component   Language  Status   Purpose
vine-core   Rust      WIP      Write-optimized datalake table format
vine-spark  Scala     WIP      Spark DataSource V2 connector
vine-trino  Java      Planned  Trino connector (not started)

Storage Format

  • Files: Vortex
  • Partitioning: Date-based directories (YYYY-MM-DD/data_HHMMSS.vtx)
  • Metadata: JSON schema file (vine_meta.json)
  • Types: integer, string, boolean, double
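
To make the naming scheme and the type list above concrete, here is a minimal Rust sketch. It assumes the type names appear in `vine_meta.json` spelled exactly as listed, which the README does not confirm. `ColumnType`, `parse_column_type`, and `partition_path` are hypothetical names for illustration, not the vine-core API.

```rust
/// The four column types listed in the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Integer,
    String,
    Boolean,
    Double,
}

/// Map a type name to a `ColumnType`.
/// Assumption: the metadata file uses these lowercase spellings.
pub fn parse_column_type(name: &str) -> Option<ColumnType> {
    match name {
        "integer" => Some(ColumnType::Integer),
        "string" => Some(ColumnType::String),
        "boolean" => Some(ColumnType::Boolean),
        "double" => Some(ColumnType::Double),
        _ => None,
    }
}

/// Build the relative file path `YYYY-MM-DD/data_HHMMSS.vtx`
/// from already-validated date and time components.
pub fn partition_path(
    year: u32,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
) -> std::string::String {
    format!(
        "{:04}-{:02}-{:02}/data_{:02}{:02}{:02}.vtx",
        year, month, day, hour, minute, second
    )
}
```

For example, `partition_path(2024, 12, 26, 14, 30, 25)` yields `2024-12-26/data_143025.vtx`, matching the first file in the architecture diagram. Zero-padding every component keeps the paths sortable as plain strings.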

Documentation

Development

Build Components Individually

Rust Core

cd vine-core
cargo build --release
cargo test

Spark Connector

cd vine-spark
sbt clean assembly

Requirements

  • Rust 1.70+
  • Scala 2.13, sbt 1.x
  • Java 11
