FormBD-Analytics

OLAP analytics layer for FormBD: columnar aggregations and time-series analysis.

Overview

FormBD-Analytics provides high-performance analytical queries over FormBD documents. FormBD itself prioritizes auditability and reversibility, whereas analytics workloads call for different optimizations such as columnar storage and pre-computed aggregations. Keeping analytics in a separate service lets FormBD maintain its principles while still enabling fast aggregations, rollups, and time-series analysis.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    FormBD (Source of Truth)                  │
│              Documents with PROMPT scores                    │
└─────────────────────────────────┬───────────────────────────┘
                                  │ HTTP API
                                  ▼
┌─────────────────────────────────────────────────────────────┐
│                    FormBD-Analytics                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Ingester   │  │   Columnar   │  │    Query     │       │
│  │              │──▶│   Store      │──▶│   Engine     │       │
│  │ (ETL from    │  │ (Arrow/      │  │ (DataFrames, │       │
│  │  FormBD)     │  │  Parquet)    │  │  Aggregates) │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                                  │
                                  ▼ HTTP API
┌─────────────────────────────────────────────────────────────┐
│        Consumers (formbd-studio, dashboards, reports)        │
└─────────────────────────────────────────────────────────────┘

Key Features

  • Columnar Storage: Arrow/Parquet for analytical query patterns

  • PROMPT Score Analytics: Aggregations over epistemological dimensions

  • Time-Series Analysis: Document creation/modification trends

  • Rollups: Pre-computed aggregations for common queries

  • Provenance Tracking: Analyze who contributed what, when
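
To make the time-series feature concrete, here is a minimal sketch that counts documents per calendar day using only Julia's `Dates` standard library. The helper name `count_by_day` and the plain vector of timestamps are assumptions for illustration; they are not part of the FormBD-Analytics API, which may bucket data quite differently.

```julia
using Dates

# Hypothetical helper: bucket document creation timestamps by calendar day.
function count_by_day(timestamps::Vector{DateTime})
    counts = Dict{Date,Int}()
    for ts in timestamps
        day = Date(ts)                         # drop the time-of-day part
        counts[day] = get(counts, day, 0) + 1
    end
    # Return day => count pairs in chronological order.
    return sort(collect(counts); by = first)
end

# Made-up timestamps, purely for demonstration.
created = [DateTime(2024, 1, 1, 9), DateTime(2024, 1, 1, 17), DateTime(2024, 1, 2, 8)]
daily = count_by_day(created)
```

A rollup in this sense is the same computation run ahead of time and stored, so that a dashboard query reads the per-day counts instead of rescanning every document.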

Why Julia?

FormBD-Analytics uses Julia because:

  1. Native columnar operations: DataFrames.jl stores data column-wise, which suits analytical workloads

  2. Arrow integration: Arrow.jl provides zero-copy interop

  3. Performance: JIT-compiled Julia code can approach C performance

  4. Scientific computing: Strong ecosystem for statistical analysis

  5. Hyperpolymath policy: Julia is the approved language for data/batch processing

API Endpoints

Analytics Queries

GET /analytics/health
    Health check

GET /analytics/stats
    Overall statistics about indexed data

POST /analytics/query
    Execute analytical query
    Body: { "query": "...", "params": {...} }

GET /analytics/prompt-scores?collection=X&groupBy=Y
    PROMPT score aggregations

GET /analytics/time-series?collection=X&field=Y&interval=day
    Time-series analysis

GET /analytics/contributors?collection=X
    Contributor/provenance analysis
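
The GET endpoints above take their arguments as query-string parameters. The following sketch shows one way a client might assemble such a URL with no third-party packages. The helpers `escape_component` and `analytics_url` are hypothetical names, and the base address is taken from the `[server]` values in the sample configuration; in practice a client would more likely rely on an HTTP package for both escaping and the request itself.

```julia
# Percent-encode everything except RFC 3986 unreserved characters.
function escape_component(s::AbstractString)
    io = IOBuffer()
    for b in codeunits(s)
        c = Char(b)
        if isascii(c) && (isletter(c) || isdigit(c) || c in "-._~")
            print(io, c)
        else
            print(io, '%', uppercase(string(b, base = 16, pad = 2)))
        end
    end
    return String(take!(io))
end

# Hypothetical helper: join base URL, endpoint path, and keyword arguments.
function analytics_url(base::AbstractString, path::AbstractString; params...)
    isempty(params) && return string(base, path)
    query = join(
        (string(escape_component(string(k)), "=", escape_component(string(v)))
         for (k, v) in params),
        "&",
    )
    return string(base, path, "?", query)
end

url = analytics_url("http://127.0.0.1:8082", "/analytics/prompt-scores";
                    collection = "evidence", groupBy = "source")
```

Keyword arguments are emitted in the order they are written, so the resulting URL mirrors the `collection=X&groupBy=Y` shape listed above.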

Data Management

POST /analytics/sync
    Sync data from FormBD (incremental or full)
    Body: { "collection": "...", "mode": "incremental|full" }

GET /analytics/collections
    List synced collections with stats
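
As a rough sketch, a client could build and validate the body for `POST /analytics/sync` before sending it. The function `sync_body` is a hypothetical helper, and the hand-written JSON escaping below covers only backslashes and double quotes; a real client would normally use a JSON package instead.

```julia
# The two modes named in the endpoint description.
const SYNC_MODES = ("incremental", "full")

# Hypothetical helper: build the JSON body for POST /analytics/sync.
function sync_body(collection::AbstractString; mode::AbstractString = "incremental")
    mode in SYNC_MODES ||
        throw(ArgumentError("mode must be one of $(SYNC_MODES), got \"$mode\""))
    # Minimal escaping: backslashes and double quotes only.
    escaped = replace(collection, "\\" => "\\\\", "\"" => "\\\"")
    return "{\"collection\": \"$escaped\", \"mode\": \"$mode\"}"
end

body = sync_body("evidence"; mode = "full")
```

Rejecting an unknown mode on the client side is a design choice of this sketch; how the server itself responds to an invalid mode is not specified here.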

Configuration

[formbd]
api_url = "http://localhost:8080"
collections = ["evidence", "claims"]

[server]
host = "127.0.0.1"
port = 8082

[storage]
# Path for Parquet files
data_dir = "./data"
# Retention in days (0 = forever)
retention_days = 0

[sync]
# Auto-sync interval in minutes (0 = manual only)
auto_sync_minutes = 60
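
For reference, the sample above can be read with Julia's `TOML` standard library. This is a minimal sketch of how the values might be interpreted, not the loader FormBD-Analytics actually uses; the variable names are assumptions, and the "0 means disabled/forever" handling simply follows the comments in the sample file.

```julia
using TOML

# Inline copy of the sample configuration; a real program would
# call TOML.parsefile("config.toml") instead.
config_text = """
[formbd]
api_url = "http://localhost:8080"
collections = ["evidence", "claims"]

[server]
host = "127.0.0.1"
port = 8082

[storage]
data_dir = "./data"
retention_days = 0

[sync]
auto_sync_minutes = 60
"""

config = TOML.parse(config_text)

# Interpret the sentinel values described in the sample comments.
auto_sync_enabled = config["sync"]["auto_sync_minutes"] > 0
keep_forever = config["storage"]["retention_days"] == 0
listen_address = string(config["server"]["host"], ":", config["server"]["port"])
```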

PROMPT Score Analytics

FormBD documents may include PROMPT epistemological scores:

  • Provenance - Source traceability

  • Replicability - Can findings be reproduced?

  • Objective - Methodological rigor

  • Methodology - Analytical approach quality

  • Publication - Peer review status

  • Transparency - Data/method openness

FormBD-Analytics provides aggregations over these dimensions:

# Average PROMPT scores for a collection, grouped by source
prompt_stats(collection="evidence", groupby=:source)

# Score distribution histograms
prompt_distribution(collection="evidence", dimension=:provenance)

# Correlation between dimensions
prompt_correlations(collection="evidence")
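
To give a feel for what such aggregations compute, the sketch below reproduces a grouped mean and a correlation on toy data using only the `Statistics` standard library. It is not the implementation behind `prompt_stats` or `prompt_correlations`; the helper `mean_by`, the record layout, the sample values, and the 0-100 score scale are all assumptions made for illustration.

```julia
using Statistics

# Hypothetical records: one NamedTuple per document, scores on an assumed 0-100 scale.
docs = [
    (source = "journal", provenance = 90.0, transparency = 80.0),
    (source = "journal", provenance = 70.0, transparency = 60.0),
    (source = "blog",    provenance = 40.0, transparency = 30.0),
]

# Mean of one score dimension per value of the grouping field.
function mean_by(docs, group::Symbol, dimension::Symbol)
    groups = Dict{Any,Vector{Float64}}()
    for d in docs
        scores = get!(() -> Float64[], groups, getproperty(d, group))
        push!(scores, getproperty(d, dimension))
    end
    return Dict(k => mean(v) for (k, v) in groups)
end

by_source = mean_by(docs, :source, :provenance)

# Pearson correlation between two dimensions across all documents.
r = cor([d.provenance for d in docs], [d.transparency for d in docs])
```

On real data the same grouping would more naturally be expressed with DataFrames.jl, which the project cites as a reason for choosing Julia.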

Development

Prerequisites

  • Julia 1.10+

  • FormBD instance running

Setup

cd formbd-analytics
julia --project=. -e 'using Pkg; Pkg.instantiate()'

Running

julia --project=. src/main.jl --config config.toml

Testing

julia --project=. -e 'using Pkg; Pkg.test()'

License

AGPL-3.0-or-later

Part of the FormBD ecosystem.
