Copy-on-write branching for Apache Lucene. Git-like snapshot and branch semantics on full-text search indices with structural sharing.
Built on Lucene 10.3.2. Forking a branch takes 3-5ms regardless of index size by sharing immutable segment files.
- Branch: A COW overlay directory sharing base segments with trunk. Each branch has its own commit history.
- Snapshot: An immutable reader at a specific commit generation. All commits are retained until explicit GC.
- Fork: Creates a new branch by copying segment metadata only (not data). Near-instant regardless of index size.
- GC: Explicit garbage collection of old snapshots, respecting branch references to shared segments.
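The four concepts above can be illustrated with a small in-memory toy model. This is only a sketch of the copy-on-write idea, assuming segments are identified by name and never modified once written; `ToyBranch` and its methods are hypothetical names for illustration and are not part of Scriptum's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of copy-on-write branching; NOT Scriptum's implementation. */
class ToyBranch {
    final String name;
    // Names of the immutable segments currently visible to this branch.
    private final Set<String> segments = new HashSet<>();
    // One entry per commit: the segment set as of that generation.
    private final List<Set<String>> commits = new ArrayList<>();

    ToyBranch(String name) { this.name = name; }

    /** Writing never mutates an existing segment; it only adds a new one. */
    void addSegment(String segmentName) { segments.add(segmentName); }

    /** Records an immutable snapshot and returns its 1-based generation. */
    int commit() {
        commits.add(Set.copyOf(segments));
        return commits.size();
    }

    /** Snapshot: the segment set as of a given generation. */
    Set<String> snapshotAt(int generation) { return commits.get(generation - 1); }

    /** Fork: copies segment names only; the segment data itself stays shared. */
    ToyBranch fork(String newName) {
        ToyBranch child = new ToyBranch(newName);
        child.segments.addAll(segments);
        return child;
    }

    Set<String> visibleSegments() { return Set.copyOf(segments); }

    public static void main(String[] args) {
        ToyBranch main = new ToyBranch("main");
        main.addSegment("_0");
        main.commit();

        ToyBranch feature = main.fork("experiment");
        feature.addSegment("_10000");   // branch-only write
        feature.commit();

        System.out.println("main: " + main.visibleSegments().size());          // main: 1
        System.out.println("experiment: " + feature.visibleSegments().size()); // experiment: 2
    }
}
```

The cost of `fork` in this model depends on the number of segment names, not on how much data the segments hold, which is the intuition behind a fork latency that does not grow with index size.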
| Layer | Namespace | Use Case |
|---|---|---|
| Java | org.replikativ.scriptum.BranchIndexWriter | Direct Java usage |
| Core | scriptum.core | Low-level Clojure wrapper |
| Yggdrasil | scriptum.yggdrasil | High-level protocols |
For Clojure users: scriptum.yggdrasil for high-level API, scriptum.core for lower-level control.
For Java users: use BranchIndexWriter directly.
For Maven/Gradle:
<dependency>
  <groupId>org.replikativ</groupId>
  <artifactId>scriptum</artifactId>
  <version>0.1.1</version>
</dependency>
Java sources must be compiled before use:
clj -T:build compile-java
(require '[scriptum.core :as sc])
;; Create an index
(def writer (sc/create-index "/tmp/my-index"))
;; Add documents
(sc/add-doc writer {:title {:type :text :value "Hello World"}
                    :id {:type :string :value "doc-1"}})
(sc/commit! writer "Initial commit")
;; Search
(sc/search writer {:match-all {}} 10)
;; => [{:title "Hello World", :id "doc-1", :score 1.0}]
;; Fork a branch
(def feature (sc/fork writer "experiment"))
;; Add to branch (doesn't affect main)
(sc/add-doc feature {:title {:type :text :value "Branch only"}
                     :id {:type :string :value "doc-2"}})
(sc/commit! feature "Added experimental doc")
;; Main still has 1 doc, branch has 2
(count (sc/search writer {:match-all {}} 100)) ;; => 1
(count (sc/search feature {:match-all {}} 100)) ;; => 2
;; Merge branch back
(sc/merge-from! writer feature)
(sc/commit! writer "Merged experiment")
;; Cleanup
(sc/close! feature)
(sc/close! writer)
(sc/create-index path) ; create new index at path
(sc/open-branch path branch-name) ; open existing branch
(sc/fork writer "branch-name") ; fast fork from writer
(sc/close! writer) ; close writer and release resources
(sc/discover-branches path) ; => ["main" "feature" ...]
;; Accessors
(sc/num-docs writer) ; document count (excluding deletions)
(sc/max-doc writer) ; document count (including deletions)
(sc/branch-name writer) ; current branch name
(sc/base-path writer) ; index base path
(sc/main-branch? writer) ; true if this is the main branch
Field types: :text (analyzed, searchable), :string (exact match), :vector (float array for KNN).
(sc/add-doc writer {:title {:type :text :value "Searchable text"}
                    :tag {:type :string :value "exact-match"}
                    :embed {:type :vector :value (float-array [0.1 0.2 0.3])
                            :dims 3}})
(sc/delete-docs writer :id "doc-1") ; delete by field+value
(sc/update-doc writer :id "doc-1" new-fields) ; atomic delete+add
(sc/commit! writer "commit message") ; persist changes
(sc/flush! writer) ; flush without new commit point
(sc/merge-from! writer source-writer) ; merge segments from another branch
(sc/list-snapshots writer)
;; => [{:generation 1 :uuid "..." :timestamp "..." :message "..." :branch "main"}
;;     {:generation 2 :uuid "..." :timestamp "..." :message "..." :branch "main"}]
;; Term query
(sc/search writer {:term {:field "tag" :value "exact-match"}} 10)
;; Match-all
(sc/search writer {:match-all {}} 100)
;; Custom Lucene query object
(sc/search writer my-lucene-query 10)
;; Returns: [{:field1 "val" :field2 "val" :score 1.0} ...]
;; Get snapshot at specific generation
(def reader (sc/open-reader-at writer 1))
;; Check if a generation still exists (may be GC'd)
(sc/commit-available? writer 1) ; => true/false
;; Get current immutable snapshot
(def snap (sc/snapshot writer))
;; Execute with auto-closing snapshot
(sc/with-snapshot [reader writer]
  (sc/search reader {:match-all {}} 10))
(.close reader)
;; Remove commits older than 1 hour, respecting branch references
(sc/gc! writer)
GC only runs on the main branch and protects all segment files referenced by any branch.
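The protection rule can be sketched as a set computation: files referenced only by expired trunk commits are deletable, unless a retained commit or any branch still references them. This is a toy model under stated assumptions, not Scriptum's actual GC; `ToyGc`, `Commit`, and `deletableFiles` are hypothetical names, and always retaining the latest commit is an assumption of this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of GC with branch protection; NOT Scriptum's actual algorithm. */
class ToyGc {
    /** A commit point: when it was made and which segment files it references. */
    static final class Commit {
        final Instant timestamp;
        final Set<String> files;

        Commit(Instant timestamp, Set<String> files) {
            this.timestamp = timestamp;
            this.files = files;
        }
    }

    /**
     * Returns files referenced only by trunk commits older than the cutoff.
     * Files used by a retained commit or by any branch are never returned.
     */
    static Set<String> deletableFiles(List<Commit> trunkCommits, Instant cutoff,
                                      Collection<Set<String>> branchReferences) {
        Set<String> candidates = new HashSet<>();
        Set<String> protectedFiles = new HashSet<>();
        for (int i = 0; i < trunkCommits.size(); i++) {
            Commit c = trunkCommits.get(i);
            boolean isLatest = i == trunkCommits.size() - 1; // assumption: keep the newest commit
            if (isLatest || !c.timestamp.isBefore(cutoff)) {
                protectedFiles.addAll(c.files);   // retained commit
            } else {
                candidates.addAll(c.files);       // expired commit
            }
        }
        for (Set<String> refs : branchReferences) {
            protectedFiles.addAll(refs);          // branch protection
        }
        candidates.removeAll(protectedFiles);
        return candidates;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<Commit> trunk = List.of(
            new Commit(now.minus(Duration.ofHours(3)), Set.of("_0.cfs", "_1.cfs")),
            new Commit(now, Set.of("_2.cfs")));
        // A branch forked from the old commit still reads _1.cfs.
        Collection<Set<String>> branches = List.of(Set.of("_1.cfs"));
        System.out.println(deletableFiles(trunk, now.minus(Duration.ofHours(1)), branches));
        // prints [_0.cfs]
    }
}
```

In the example, `_1.cfs` belongs to an expired commit but survives because a branch still references it; only `_0.cfs` is reclaimed.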
For Java users, BranchIndexWriter provides the complete API:
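import org.replikativ.scriptum.BranchIndexWriter;
import org.apache.lucene.document.*;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.Set;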
// Create an index
BranchIndexWriter main = BranchIndexWriter.create(Path.of("/tmp/my-index"), "main");
// Add documents
Document doc = new Document();
doc.add(new TextField("title", "Hello World", Field.Store.YES));
doc.add(new StringField("id", "doc-1", Field.Store.YES));
main.addDocument(doc);
main.commit("Initial commit");
// Fork a branch (3-5ms regardless of index size)
BranchIndexWriter feature = main.fork("experiment");
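// Add a document to the branch only (does not affect main)
Document anotherDoc = new Document();
anotherDoc.add(new TextField("title", "Branch only", Field.Store.YES));
anotherDoc.add(new StringField("id", "doc-2", Field.Store.YES));
feature.addDocument(anotherDoc);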
feature.commit("Feature work");
// Search
DirectoryReader reader = main.openReader();
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs results = searcher.search(new MatchAllDocsQuery(), 10);
reader.close();
// Merge branch back
main.mergeFrom(feature);
// Time travel - open reader at specific generation (close it when done)
DirectoryReader historical = main.openReaderAt(1);
historical.close();
// Garbage collect old commits
main.gc(Instant.now().minus(Duration.ofHours(1)));
// Discover branches
Set<String> branches = BranchIndexWriter.discoverBranches(Path.of("/tmp/my-index"));
// Cleanup
feature.close();
main.close();
| Method | Description |
|---|---|
| create(path, branchName) | Create new index |
| open(path, branchName) | Open existing branch |
| fork(branchName) | Fast fork (copies metadata only) |
| addDocument(doc) | Add a document |
| deleteDocuments(terms...) | Delete by terms |
| updateDocument(term, doc) | Atomic delete+add |
| commit() / commit(message) | Persist changes |
| openReader() | NRT reader (sees uncommitted) |
| openCommittedReader() | Reader on committed state |
| openReaderAt(generation) | Time travel to specific commit |
| isCommitAvailable(generation) | Check if commit still exists |
| listSnapshots() | Get all commit points |
| mergeFrom(source) | Merge another branch |
| gc(beforeInstant) | Garbage collect old commits |
| numDocs() / maxDoc() | Document counts |
| getBranchName() | Current branch name |
| isMainBranch() | Check if main branch |
Scriptum implements the Yggdrasil protocol stack (Snapshotable, Branchable, Graphable, Mergeable):
(require '[scriptum.yggdrasil :as sy]
'[yggdrasil.protocols :as p])
(def sys (sy/create "/tmp/my-index" {:system-name "search-index"}))
(p/branches sys) ; => #{:main}
(p/branch! sys :feature)
(p/checkout sys :feature)
;; ... add docs, commit ...
(p/merge! sys :main)
(p/history sys {:limit 10})
(sy/close! sys)
Passes the full yggdrasil compliance test suite (22 tests, 203 assertions).
Typical results:
- Fork latency: 3-5ms (independent of index size)
- Indexing: ~50k docs/sec (text fields, SSD)
- Search: sub-millisecond for simple queries
On disk, scriptum uses this structure:
basePath/                    -- trunk (main branch)
  _0.cfs, _1.cfs, ...        -- shared segment files
  segments_N                 -- main's commit points
  branches/
    feature/                 -- branch overlay
      _10000.cfs, ...        -- branch-specific segments
      segments_N             -- branch's commit points
Branches share base segments via read-only references. Only new writes create branch-specific segment files.
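The read side of this layout can be sketched as a two-step lookup: check the branch overlay first, then fall back to the shared base directory. The sketch below assumes only the directory layout shown above; `OverlayLookup`, `resolve`, and `demoLayout` are hypothetical names, and the real BranchedDirectory may resolve files differently (for example, by pinning which base files a branch is allowed to see).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.NoSuchElementException;

/** Toy overlay lookup over the on-disk layout; NOT Scriptum's BranchedDirectory. */
class OverlayLookup {
    /** A branch-specific file wins; otherwise fall back to the shared base file. */
    static Path resolve(Path basePath, String branch, String fileName) {
        Path overlayFile = basePath.resolve("branches").resolve(branch).resolve(fileName);
        if (Files.exists(overlayFile)) {
            return overlayFile;
        }
        Path baseFile = basePath.resolve(fileName);
        if (Files.exists(baseFile)) {
            return baseFile;
        }
        throw new NoSuchElementException("not found in overlay or base: " + fileName);
    }

    /** Builds a throwaway directory tree mirroring the layout, with empty files. */
    static Path demoLayout() {
        try {
            Path base = Files.createTempDirectory("toy-index");
            Path overlay = Files.createDirectories(base.resolve("branches").resolve("feature"));
            Files.createFile(base.resolve("_0.cfs"));          // shared base segment
            Files.createFile(overlay.resolve("_10000.cfs"));   // branch-specific segment
            return base;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path base = demoLayout();
        // Shared segment resolves to the base directory.
        System.out.println(base.relativize(resolve(base, "feature", "_0.cfs")));
        // Branch-only segment resolves inside branches/feature.
        System.out.println(base.relativize(resolve(base, "feature", "_10000.cfs")));
    }
}
```

Writes in such a scheme would always go to the overlay directory, so the base files are never modified on behalf of a branch.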
See docs/LUCENE_EXTENSION.md for a deep-dive into how Scriptum extends Lucene:
- How Lucene segments and commit points work
- BranchedDirectory: overlay pattern for COW reads/writes
- BranchDeletionPolicy: retaining all commits until explicit GC
- BranchAwareMergePolicy: preventing merge of shared segments
- Fork operation mechanics and performance analysis
- GC with branch protection
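As a rough illustration of why shared segments must be kept out of merges: merging rewrites its inputs into a new segment and retires the old files, which would break any other branch still reading them. A minimal sketch of the filtering idea follows; `ToyMergeFilter` and `mergeCandidates` are hypothetical names, and this is not how BranchAwareMergePolicy is necessarily implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy filter for merge candidates; NOT Scriptum's BranchAwareMergePolicy. */
class ToyMergeFilter {
    /** Only segments owned by the branch itself are eligible for merging. */
    static List<String> mergeCandidates(List<String> visibleSegments, Set<String> sharedSegments) {
        List<String> eligible = new ArrayList<>();
        for (String segment : visibleSegments) {
            if (!sharedSegments.contains(segment)) {
                eligible.add(segment);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<String> visible = List.of("_0", "_1", "_10000", "_10001");
        Set<String> shared = Set.of("_0", "_1");   // inherited from trunk at fork time
        System.out.println(mergeCandidates(visible, shared)); // [_10000, _10001]
    }
}
```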
src/
  clojure/scriptum/
    core.clj                       # Low-level COW branching API
    yggdrasil.clj                  # Yggdrasil protocol adapter
  java/org/replikativ/scriptum/
    BranchIndexWriter.java         # Branch-aware Lucene writer (main Java API)
    BranchedDirectory.java         # COW directory overlay
    BranchAwareMergePolicy.java    # Prevents merging shared segments
    BranchDeletionPolicy.java      # Retains all commits until GC
docs/
  LUCENE_EXTENSION.md              # Technical deep-dive
test/scriptum/
  core_test.clj                    # Unit tests
  yggdrasil_test.clj               # Compliance tests
- Java 21+
- Clojure 1.12.0+
- Apache Lucene 10.3.2 (pulled from Maven Central)
# Compile Java sources
clj -T:build compile-java
# Run tests
clj -T:build compile-java && clj -M:test
# Start nREPL
clj -T:build compile-java && clj -M:repl
Copyright (c) 2026 Christian Weilbach
Licensed under the Eclipse Public License 2.0.