Make your pandas processes flow with ease, your functions weaveable, and your data processing refineable. Create beautiful, readable, standardized, and visual data pipelines.
weaveflow is a Python library designed to bring clarity, structure, and visibility to your pandas data processing workflows. It transforms complex sequences of operations into a declarative, dependency-aware pipeline that is easy to read, maintain, and visualize.
Stop wrestling with tangled scripts and start weaving elegant data stories.
weaveflow introduces a few simple but powerful concepts to structure your data pipelines:
- 🧵 Weaving: Make your functions weaveable. A @weave decorator turns any Python function that operates on pandas Series into a node in a dependency graph. It automatically tracks inputs (from DataFrame columns) and outputs (to new DataFrame columns), building a clear feature engineering lineage.
- 🔪 Refining: Make your data refineable. A @refine decorator marks classes or functions that perform larger, sequential transformations on the entire DataFrame, such as cleaning, filtering, dropping rows, or grouping. These steps form a clear, linear processing chain.
- 🛢️ Spooling: Externalize your parameters effortlessly. The @spool_asset decorator loads constants, configurations, and even small data files via customized engines (like CSVs) into dataclasses, making your pipeline's parameters transparent and easy to manage outside your code.
- 🧶 Loom: The Loom is the heart of weaveflow. It's the orchestrator that takes your initial DataFrame and a list of weaveable and refineable tasks, and executes them in the correct order, managing all dependencies automatically.
- 📊 Visualization: weaveflow automatically generates intuitive graphs of your pipeline.
  - The WeaveGraph shows the dependency network of your feature engineering (@weave) steps.
  - The RefineGraph shows the sequential flow of your data refinement (@refine) steps.
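To make the weave, refine, and loom ideas concrete, here is a minimal sketch of dependency-aware execution using only the standard library. It is not weaveflow's implementation or API: `toy_weave`, `run_toy_loom`, and the example columns are hypothetical names, and plain dicts of lists stand in for DataFrame columns so the sketch runs without pandas.

```python
import inspect
from graphlib import TopologicalSorter


def toy_weave(outputs):
    """Hypothetical stand-in for a weave-style decorator: records which
    columns a function reads (its argument names) and which it writes."""
    def decorator(func):
        func.inputs = list(inspect.signature(func).parameters)
        func.outputs = list(outputs)
        return func
    return decorator


@toy_weave(outputs=["margin"])
def margin(revenue, cost):
    return [r - c for r, c in zip(revenue, cost)]


@toy_weave(outputs=["margin_pct"])
def margin_pct(margin, revenue):
    return [m / r for m, r in zip(margin, revenue)]


def drop_unprofitable(table):
    """Refine-style step: sees the whole table and filters rows."""
    keep = [i for i, m in enumerate(table["margin"]) if m > 0]
    return {col: [vals[i] for i in keep] for col, vals in table.items()}


def run_toy_loom(table, weave_tasks, refine_tasks):
    """Run weave tasks in dependency order, then refine steps in list order."""
    table = dict(table)
    producers = {out: task for task in weave_tasks for out in task.outputs}
    # Map each task to the tasks that produce its inputs (its predecessors).
    graph = {
        task: {producers[name] for name in task.inputs if name in producers}
        for task in weave_tasks
    }
    order = list(TopologicalSorter(graph).static_order())
    for task in order:
        # Simplification: each toy task writes exactly one output column.
        table[task.outputs[0]] = task(*(table[name] for name in task.inputs))
    for step in refine_tasks:
        table = step(table)
    return table, [task.__name__ for task in order]


start = {"revenue": [100.0, 50.0, 80.0], "cost": [60.0, 70.0, 40.0]}
# Tasks are deliberately listed out of order; the sorter fixes that.
result, order = run_toy_loom(start, [margin_pct, margin], [drop_unprofitable])
```

The point of the design is that the task list can be written in any order: the column names in each function signature are enough to recover the execution order, while the refine-style steps simply run in the sequence given.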
- Declarative Pipelines: Define what you want to do, not how. weaveflow handles the execution order.
- Automatic Dependency Graph: Understand at a glance how your features are derived. No more guessing which function created which column.
- Clear Separation of Concerns: A clean distinction between column-wise feature creation (@weave) and table-wise transformations (@refine).
- Effortless Parameterization: Decouple configuration from logic using @spool_asset with YAML, JSON, TOML, and even custom file types.
- Stunning Visualizations: Generate graphviz diagrams of your entire workflow to share with your team, document your process, or debug complex flows.
- Reproducibility: By structuring your code and externalizing parameters, weaveflow pipelines are easier to reproduce and validate.
- Code as Configuration: Your pipeline is defined by a simple list of functions and classes, making it self-documenting.
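The parameterization idea in the list above, configuration kept in a file and surfaced to the code as a dataclass, can be sketched with the standard library alone. This is an illustration of the pattern, not the @spool_asset API: `PricingParams`, `load_params`, and the field names are hypothetical, and an in-memory JSON string stands in for a config file.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PricingParams:
    """Hypothetical parameter set; the field names are illustrative only."""
    tax_rate: float
    min_margin: float
    currency: str = "EUR"


def load_params(cls, text):
    """Parse JSON text and keep only the keys that match dataclass fields."""
    raw = json.loads(text)
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in raw.items() if key in known})


def apply_tax(net_prices, params):
    """Business logic reads parameters from the dataclass, not from literals."""
    return [round(price * (1 + params.tax_rate), 2) for price in net_prices]


# In a real project this text would come from a JSON file on disk.
config_text = '{"tax_rate": 0.19, "min_margin": 0.1, "unused": true}'
params = load_params(PricingParams, config_text)
gross = apply_tax([100.0, 50.0], params)
```

Freezing the dataclass keeps the loaded parameters read-only, so a pipeline step cannot silently change a value that other steps rely on.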
If you want to contribute to weaveflow or use the absolute latest, unreleased version, you should install it from a local clone of the repository. This project uses uv for high-performance package management and pygraphviz for graph visualization. Make sure these dependencies are installed before proceeding.
Install uv and Graphviz (the system package that pygraphviz builds on):

    # Install uv
    pip install uv
    # For Debian/Ubuntu
    sudo apt-get update && sudo apt-get install -y graphviz
    # For macOS (using Homebrew)
    brew install graphviz

Set up your local development environment:

    git clone https://github.com/kopib/weaveflow.git
    cd weaveflow
    uv pip install -e .

Now you're ready to develop and test weaveflow locally.
To see weaveflow in action, run the quickstart.py script:

    uv run quickstart.py

This generates two beautiful graphs of the data pipelines:
- WeaveGraph: shows how your columns are created and what they depend on.
- RefineGraph: shows the high-level, sequential stages of your data transformation.
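For a feel of what a column-dependency graph looks like in Graphviz terms, the sketch below turns a mapping of columns to their source columns into DOT text. It does not use weaveflow or pygraphviz: `to_dot` and the example columns are hypothetical, and the output is only a rough analogue of the graphs the quickstart produces.

```python
def to_dot(dependencies, name="weave_sketch"):
    """Render {column: [columns it depends on]} as Graphviz DOT text."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for target in sorted(dependencies):
        for source in sorted(dependencies[target]):
            # One edge per dependency, pointing from input to derived column.
            lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical lineage: two derived columns built from two raw columns.
deps = {"margin": ["revenue", "cost"], "margin_pct": ["margin", "revenue"]}
dot_text = to_dot(deps)
print(dot_text)
```

Saving the printed text to a file such as graph.dot and running `dot -Tpng graph.dot -o graph.png` with the Graphviz CLI renders it as an image.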
This project is licensed under the MIT License.

