Scriptum

Copy-on-write branching for Apache Lucene. Git-like snapshot and branch semantics on full-text search indices with structural sharing.

Built on Lucene 10.3.2. Forking a branch takes 3-5 ms regardless of index size, because a fork shares the existing immutable segment files instead of copying them.

Core Concepts

Branch: A COW overlay directory sharing base segments with trunk. Each branch has its own commit history.
Snapshot: An immutable reader at a specific commit generation. All commits are retained until explicit GC.
Fork: Creates a new branch by copying segment metadata only (not data). Near-instant regardless of index size.
GC: Explicit garbage collection of old snapshots, respecting branch references to shared segments.
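To make the fork semantics above concrete, here is a minimal toy model in plain Java. It is only a sketch of the idea and not Scriptum's implementation: the class `CowBranchSketch` and its methods are hypothetical names, and segments are represented by file names alone. A fork copies the list of segment names (metadata), so its cost depends on the number of segments rather than on the bytes they contain.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of copy-on-write forking (hypothetical, not Scriptum's code). */
final class CowBranchSketch {
    final String name;
    // Segment names inherited at fork time; never modified afterwards.
    private final List<String> sharedSegments;
    // Segment names written by this branch only.
    private final List<String> ownSegments = new ArrayList<>();

    CowBranchSketch(String name, List<String> sharedSegments) {
        this.name = name;
        this.sharedSegments = List.copyOf(sharedSegments);
    }

    /** Records a new segment written by this branch. */
    void addSegment(String segmentName) {
        ownSegments.add(segmentName);
    }

    /** Everything a reader on this branch would see. */
    List<String> visibleSegments() {
        List<String> all = new ArrayList<>(sharedSegments);
        all.addAll(ownSegments);
        return all;
    }

    /** Fork: copy the segment name list only, never the segment data. */
    CowBranchSketch fork(String branchName) {
        return new CowBranchSketch(branchName, visibleSegments());
    }

    public static void main(String[] args) {
        CowBranchSketch trunk = new CowBranchSketch("main", List.of());
        trunk.addSegment("_0.cfs");
        CowBranchSketch feature = trunk.fork("experiment");
        feature.addSegment("_10000.cfs");
        System.out.println(trunk.visibleSegments());   // [_0.cfs]
        System.out.println(feature.visibleSegments()); // [_0.cfs, _10000.cfs]
    }
}
```

Writes on either side after the fork stay invisible to the other side, which is the isolation property the Branch and Fork definitions describe.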

API Layers

Layer	Namespace	Use Case
Java	`org.replikativ.scriptum.BranchIndexWriter`	Direct Java usage
Core	`scriptum.core`	Low-level Clojure wrapper
Yggdrasil	`scriptum.yggdrasil`	High-level protocols

For Clojure users: use scriptum.yggdrasil for the high-level API and scriptum.core for lower-level control.

For Java users: use BranchIndexWriter directly.

Getting Started

Dependencies

Add to deps.edn:

{:deps {org.replikativ/scriptum {:mvn/version "0.1.1"}}}

For Maven (Gradle users can use the same coordinates):

<dependency>
  <groupId>org.replikativ</groupId>
  <artifactId>scriptum</artifactId>
  <version>0.1.1</version>
</dependency>

Build from Source

Java sources must be compiled before use:

clj -T:build compile-java

Quick Start (Clojure)

(require '[scriptum.core :as sc])

;; Create an index
(def writer (sc/create-index "/tmp/my-index"))

;; Add documents
(sc/add-doc writer {:title {:type :text :value "Hello World"}
                    :id    {:type :string :value "doc-1"}})
(sc/commit! writer "Initial commit")

;; Search
(sc/search writer {:match-all {}} 10)
;; => [{:title "Hello World", :id "doc-1", :score 1.0}]

;; Fork a branch
(def feature (sc/fork writer "experiment"))

;; Add to branch (doesn't affect main)
(sc/add-doc feature {:title {:type :text :value "Branch only"}
                     :id    {:type :string :value "doc-2"}})
(sc/commit! feature "Added experimental doc")

;; Main still has 1 doc, branch has 2
(count (sc/search writer {:match-all {}} 100))    ;; => 1
(count (sc/search feature {:match-all {}} 100))   ;; => 2

;; Merge branch back
(sc/merge-from! writer feature)
(sc/commit! writer "Merged experiment")

;; Cleanup
(sc/close! feature)
(sc/close! writer)

API Reference

Lifecycle

(sc/create-index path)              ; create new index at path
(sc/open-branch path branch-name)   ; open existing branch
(sc/fork writer "branch-name")      ; fast fork from writer
(sc/close! writer)                  ; close writer and release resources
(sc/discover-branches path)         ; => ["main" "feature" ...]

;; Accessors
(sc/num-docs writer)                ; document count (excluding deletions)
(sc/max-doc writer)                 ; document count (including deletions)
(sc/branch-name writer)             ; current branch name
(sc/base-path writer)               ; index base path
(sc/main-branch? writer)            ; true if this is the main branch

Document Operations

Field types: :text (analyzed, searchable), :string (exact match), :vector (float array for KNN).

(sc/add-doc writer {:title {:type :text :value "Searchable text"}
                    :tag   {:type :string :value "exact-match"}
                    :embed {:type :vector :value (float-array [0.1 0.2 0.3])
                            :dims 3}})

(sc/delete-docs writer :id "doc-1")           ; delete by field+value
(sc/update-doc writer :id "doc-1" new-fields) ; atomic delete+add

Commit & History

(sc/commit! writer "commit message")    ; persist changes
(sc/flush! writer)                      ; flush without new commit point
(sc/merge-from! writer source-writer)   ; merge segments from another branch

(sc/list-snapshots writer)
;; => [{:generation 1 :uuid "..." :timestamp "..." :message "..." :branch "main"}
;;     {:generation 2 :uuid "..." :timestamp "..." :message "..." :branch "main"}]

Search

;; Term query
(sc/search writer {:term {:field "tag" :value "exact-match"}} 10)

;; Match-all
(sc/search writer {:match-all {}} 100)

;; Custom Lucene query object
(sc/search writer my-lucene-query 10)

;; Returns: [{:field1 "val" :field2 "val" :score 1.0} ...]

Time Travel

;; Get snapshot at specific generation
(def reader (sc/open-reader-at writer 1))

;; Check if a generation still exists (may be GC'd)
(sc/commit-available? writer 1)  ; => true/false

;; Get current immutable snapshot
(def snap (sc/snapshot writer))

;; Execute with auto-closing snapshot
(sc/with-snapshot [reader writer]
  (sc/search reader {:match-all {}} 10))

;; Close the reader opened with open-reader-at once it is no longer needed
(.close reader)
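Time travel needs a generation number, but callers often start from a point in time. The following sketch shows one way to pick the generation to open from snapshot metadata. It is an illustration under assumptions: `SnapshotInfo` is a hypothetical record mirroring the fields shown for list-snapshots, the timestamp is assumed to be parseable into a `java.time.Instant`, and generations are assumed to increase with commit order.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class SnapshotPickerSketch {
    /** Hypothetical mirror of one snapshot entry (generation, timestamp, message). */
    record SnapshotInfo(long generation, Instant timestamp, String message) {}

    /**
     * Returns the highest generation committed at or before asOf,
     * or an empty Optional if every snapshot is newer than asOf.
     */
    static Optional<Long> generationAsOf(List<SnapshotInfo> snapshots, Instant asOf) {
        return snapshots.stream()
                .filter(s -> !s.timestamp().isAfter(asOf))
                .max(Comparator.comparingLong(SnapshotInfo::generation))
                .map(SnapshotInfo::generation);
    }
}
```

The returned generation could then be checked with commit-available? before opening a reader, since the commit may already have been removed by GC.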

Garbage Collection

;; Remove commits older than 1 hour, respecting branch references
(sc/gc! writer)

GC only runs on the main branch and protects all segment files referenced by any branch.
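The branch protection rule can be sketched as a set computation. This is a simplified model, not Scriptum's GC code: the `Commit` record and `deletableFiles` are hypothetical names, and the sketch additionally assumes that the newest commit of every branch is always retained, whatever its age.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class BranchAwareGcSketch {
    /** Hypothetical commit point: the branch it belongs to and the files it references. */
    record Commit(String branch, long generation, Instant timestamp, Set<String> files) {}

    /** Files referenced only by expired commits, i.e. safe to delete. */
    static Set<String> deletableFiles(List<Commit> commits, Instant cutoff) {
        // Newest generation per branch; assumed to be kept regardless of age.
        Map<String, Long> headGeneration = new HashMap<>();
        for (Commit c : commits) {
            headGeneration.merge(c.branch(), c.generation(), Long::max);
        }
        Set<String> protectedFiles = new HashSet<>();
        Set<String> candidates = new HashSet<>();
        for (Commit c : commits) {
            boolean isHead = headGeneration.get(c.branch()) == c.generation();
            boolean expired = c.timestamp().isBefore(cutoff) && !isHead;
            (expired ? candidates : protectedFiles).addAll(c.files());
        }
        // A file survives if any retained commit on any branch still references it.
        candidates.removeAll(protectedFiles);
        return candidates;
    }
}
```

In this model a trunk segment that an old main commit and a live branch both reference is never deleted, while a segment referenced only by expired commits is.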

Java API

For Java users, BranchIndexWriter provides the complete API:

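import org.replikativ.scriptum.BranchIndexWriter;
import org.apache.lucene.document.*;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.Set;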

// Create an index
BranchIndexWriter main = BranchIndexWriter.create(Path.of("/tmp/my-index"), "main");

// Add documents
Document doc = new Document();
doc.add(new TextField("title", "Hello World", Field.Store.YES));
doc.add(new StringField("id", "doc-1", Field.Store.YES));
main.addDocument(doc);
main.commit("Initial commit");

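// Fork a branch (3-5ms regardless of index size)
BranchIndexWriter feature = main.fork("experiment");
Document anotherDoc = new Document();
anotherDoc.add(new TextField("title", "Branch only", Field.Store.YES));
anotherDoc.add(new StringField("id", "doc-2", Field.Store.YES));
feature.addDocument(anotherDoc);
feature.commit("Feature work");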

// Search
DirectoryReader reader = main.openReader();
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs results = searcher.search(new MatchAllDocsQuery(), 10);
reader.close();

// Merge branch back
main.mergeFrom(feature);

// Time travel - open reader at specific generation
DirectoryReader historical = main.openReaderAt(1);
// ... query the historical state, then release the reader
historical.close();

// Garbage collect old commits
main.gc(Instant.now().minus(Duration.ofHours(1)));

// Discover branches
Set<String> branches = BranchIndexWriter.discoverBranches(Path.of("/tmp/my-index"));

// Cleanup
feature.close();
main.close();

Key Java Methods

Method	Description
`create(path, branchName)`	Create new index
`open(path, branchName)`	Open existing branch
`fork(branchName)`	Fast fork (copies metadata only)
`addDocument(doc)`	Add a document
`deleteDocuments(terms...)`	Delete by terms
`updateDocument(term, doc)`	Atomic delete+add
`commit()` / `commit(message)`	Persist changes
`openReader()`	NRT reader (sees uncommitted)
`openCommittedReader()`	Reader on committed state
`openReaderAt(generation)`	Time travel to specific commit
`isCommitAvailable(generation)`	Check if commit still exists
`listSnapshots()`	Get all commit points
`mergeFrom(source)`	Merge another branch
`gc(beforeInstant)`	Garbage collect old commits
`numDocs()` / `maxDoc()`	Document counts
`getBranchName()`	Current branch name
`isMainBranch()`	Check if main branch

Yggdrasil Integration

Scriptum implements the Yggdrasil protocol stack (Snapshotable, Branchable, Graphable, Mergeable):

(require '[scriptum.yggdrasil :as sy]
         '[yggdrasil.protocols :as p])

(def sys (sy/create "/tmp/my-index" {:system-name "search-index"}))

(p/branches sys)         ; => #{:main}
(p/branch! sys :feature)
(p/checkout sys :feature)
;; ... add docs, commit ...
(p/merge! sys :main)
(p/history sys {:limit 10})

(sy/close! sys)

Scriptum passes the full Yggdrasil compliance test suite (22 tests, 203 assertions).

Performance

Typical results:

Fork latency: 3-5ms (independent of index size)
Indexing: ~50k docs/sec (text fields, SSD)
Search: sub-millisecond for simple queries

Directory Layout

On disk, Scriptum uses this structure:

basePath/                    -- trunk (main branch)
  _0.cfs, _1.cfs, ...       -- shared segment files
  segments_N                 -- main's commit points
  branches/
    feature/                 -- branch overlay
      _10000.cfs, ...        -- branch-specific segments
      segments_N             -- branch's commit points

Branches share base segments via read-only references. Only new writes create branch-specific segment files.
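The read side of such an overlay can be sketched with plain `java.nio.file` calls. This is an assumption about how lookups might work given the layout above, not the actual BranchedDirectory code; `OverlayResolveSketch` and `resolve` are hypothetical names. A file is looked up in the branch directory first and in the trunk directory second.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

final class OverlayResolveSketch {
    /**
     * Resolves a segment file for a branch: the branch overlay wins,
     * otherwise the lookup falls back to the shared trunk directory.
     */
    static Optional<Path> resolve(Path basePath, String branch, String fileName) {
        Path overlay = basePath.resolve("branches").resolve(branch).resolve(fileName);
        if (Files.exists(overlay)) {
            return Optional.of(overlay); // branch-specific segment
        }
        Path shared = basePath.resolve(fileName);
        if (Files.exists(shared)) {
            return Optional.of(shared);  // shared base segment, read-only for the branch
        }
        return Optional.empty();         // unknown file
    }
}
```

In this model writes would always go to the overlay path, so the trunk directory is never modified through a branch.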

Technical Documentation

See docs/LUCENE_EXTENSION.md for a deep dive into how Scriptum extends Lucene. It covers:

How Lucene segments and commit points work
BranchedDirectory: overlay pattern for COW reads/writes
BranchDeletionPolicy: retaining all commits until explicit GC
BranchAwareMergePolicy: preventing merge of shared segments
Fork operation mechanics and performance analysis
GC with branch protection
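As a rough illustration of the merge policy topic in the list above, the sketch below filters merge candidates so that segments shared with trunk are never selected. It is a guess at the principle only and does not use Lucene's MergePolicy API; `MergeCandidateSketch` and `mergeable` are hypothetical names, and segments are again represented by name.

```java
import java.util.List;
import java.util.Set;

final class MergeCandidateSketch {
    /**
     * Keeps only branch-owned segments as merge candidates.
     * Shared segments are skipped so that other branches can keep reading them.
     */
    static List<String> mergeable(List<String> candidates, Set<String> sharedSegments) {
        return candidates.stream()
                .filter(segment -> !sharedSegments.contains(segment))
                .toList();
    }
}
```

For example, with candidates `_0`, `_1`, `_10000`, `_10001` and shared segments `_0` and `_1`, only `_10000` and `_10001` remain eligible for merging.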

Project Structure

src/
  clojure/scriptum/
    core.clj                 # Low-level COW branching API
    yggdrasil.clj            # Yggdrasil protocol adapter
  java/org/replikativ/scriptum/
    BranchIndexWriter.java   # Branch-aware Lucene writer (main Java API)
    BranchedDirectory.java   # COW directory overlay
    BranchAwareMergePolicy.java  # Prevents merging shared segments
    BranchDeletionPolicy.java    # Retains all commits until GC
docs/
  LUCENE_EXTENSION.md        # Technical deep-dive
test/scriptum/
  core_test.clj              # Unit tests
  yggdrasil_test.clj         # Compliance tests

Requirements

Java 21+
Clojure 1.12.0+
Apache Lucene 10.3.2 (pulled from Maven Central)

Development

# Compile Java sources
clj -T:build compile-java

# Run tests
clj -T:build compile-java && clj -M:test

# Start nREPL
clj -T:build compile-java && clj -M:repl

License

Licensed under the Eclipse Public License 2.0.
