Add ALP (Adaptive Lossless floating-Point) encoding support #3390

Conversation
Implements ALP encoding for FLOAT and DOUBLE types, which converts floating-point values to integers using decimal scaling, then applies Frame of Reference (FOR) encoding and bit-packing for compression.

New files:
- AlpConstants.java: constants for ALP encoding
- AlpEncoderDecoder.java: core encoding/decoding logic
- AlpValuesWriter.java: writer implementation
- AlpValuesReaderForFloat/Double.java: reader implementations

Includes comprehensive unit tests and interop test infrastructure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restore original comment indentation that was accidentally changed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Escape <= characters as &lt;= in javadoc comments to avoid malformed HTML errors during documentation generation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ALP encoding is not yet part of the parquet-format Thrift specification, so it cannot be converted to org.apache.parquet.format.Encoding. Skip it in the testEnumEquivalence test and add a clear error message in the converter for when ALP conversion is attempted. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,136 @@
/*
There was a recent comment on AI-generated code on Arrow; if we aren't doing a lot of edits, I'm not sure how it should be licensed.
// ========== Vector Constants ==========

/** Default number of elements per compressed vector (2^10 = 1024) */
public static final int ALP_VECTOR_SIZE = 1024;
nit: this is the default, but it should be configurable.
public static final int ALP_VECTOR_SIZE = 1024;

/** Log2 of the default vector size */
public static final int ALP_VECTOR_SIZE_LOG = 10;
same comment on configurable.
I think this is less important to make configurable than vector size. Since we are specifically calling it out as configurable in the spec, we should make sure we generate some data at different bit-widths.
// ========== Sampling Constants ==========

/** Number of values sampled per vector */
public static final int SAMPLER_SAMPLES_PER_VECTOR = 256;
Wonder if these should be configurable somehow? Probably OK if not. Can these be package-private?
// Try encoding and check for round-trip failure
float multiplier = FLOAT_POW10[exponent];
if (factor > 0) {
Not sure I understand the > 0 check; I need to examine the encoding logic in more detail. Is this just an optimization to avoid dividing by 1?
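For context, here is a toy version of the round-trip test as I understand it from the ALP paper; the class, table, and values are illustrative, not the PR's actual code:

    // Toy round-trip check, assuming FLOAT_POW10[i] == 10^i.
    public class AlpRoundTripSketch {
      static final float[] FLOAT_POW10 = {1f, 10f, 100f, 1000f, 10000f};

      // Returns true if (exponent, factor) encodes `value` losslessly.
      static boolean roundTrips(float value, int exponent, int factor) {
        int encoded = Math.round(value * FLOAT_POW10[exponent] / FLOAT_POW10[factor]);
        float decoded = (float) encoded * FLOAT_POW10[factor] / FLOAT_POW10[exponent];
        // Compare bit patterns so -0.0f and NaN are not conflated
        return Float.floatToIntBits(decoded) == Float.floatToIntBits(value);
      }

      public static void main(String[] args) {
        System.out.println(roundTrips(1.25f, 2, 0));   // true: 125 decodes back to 1.25f
        System.out.println(roundTrips(1f / 3f, 2, 0)); // false: 33 decodes to 0.33f
      }
    }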
// Try encoding and check for round-trip failure
double multiplier = DOUBLE_POW10[exponent];
if (factor > 0) {
I think we can just call the encode method here?
multiplier /= DOUBLE_POW10[factor];
}

double scaled = value * multiplier;
Maybe just call decode. I imagine JIT inlining should be able to do the optimizations for the lookup.
*/
public static long encodeDouble(double value, int exponent, int factor) {
double multiplier = DOUBLE_POW10[exponent];
if (factor > 0) {
Micro-optimization: would it be faster to just have 1.0 in the array at index 0? (Also, just to check: I believe this is numerically stable, but not 100% sure.)
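A minimal sketch of that suggestion, with the table rebuilt here for illustration: if index 0 holds 1.0, the factor > 0 branch can go away, since dividing by 1.0 is exact in IEEE 754.

    public class Pow10Sketch {
      static final double[] DOUBLE_POW10 = new double[19];
      static {
        double p = 1.0;
        for (int i = 0; i < DOUBLE_POW10.length; i++) {
          DOUBLE_POW10[i] = p; // DOUBLE_POW10[0] == 1.0
          p *= 10.0;
        }
      }

      static long encodeDouble(double value, int exponent, int factor) {
        // Branch-free: for factor == 0 this divides by exactly 1.0, a no-op
        double multiplier = DOUBLE_POW10[exponent] / DOUBLE_POW10[factor];
        return Math.round(value * multiplier);
      }

      public static void main(String[] args) {
        System.out.println(encodeDouble(1.23, 2, 0)); // 123
      }
    }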
* Calculate the bit width needed to store unsigned values up to maxDelta.
*
* @param maxDelta the maximum delta value (unsigned)
* @return the number of bits needed (0-32 for int, 0-64 for long)
long is not covered here.
if (maxDelta == 0) {
return 0;
}
return 32 - Integer.numberOfLeadingZeros(maxDelta);
nit: Integer.SIZE?
if (maxDelta == 0) {
return 0;
}
return 64 - Long.numberOfLeadingZeros(maxDelta);
nit: Long.SIZE
* @param maxDelta the maximum delta value (unsigned)
* @return the number of bits needed (0-32 for int, 0-64 for long)
*/
public static int bitWidth(int maxDelta) {
I worry about accidental type coercion that might call this method instead of the long overload. Maybe give this one an explicit name.
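A sketch of the explicit-name idea; bitWidthForInt and bitWidthForLong are hypothetical names, not the PR's:

    public class BitWidths {
      static int bitWidthForInt(int maxDelta) {
        return maxDelta == 0 ? 0 : Integer.SIZE - Integer.numberOfLeadingZeros(maxDelta);
      }

      static int bitWidthForLong(long maxDelta) {
        return maxDelta == 0L ? 0 : Long.SIZE - Long.numberOfLeadingZeros(maxDelta);
      }

      public static void main(String[] args) {
        // With distinct names, an int argument still widens when passed to the
        // long variant, but a call can no longer silently bind to the wrong overload.
        System.out.println(bitWidthForInt(7));        // 3
        System.out.println(bitWidthForLong(1L << 40)); // 41
      }
    }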
* @param value the float value to check
* @return true if the value is an exception
*/
public static boolean isFloatException(float value) {
Please audit the visibility of these methods; it seems most could be at least package-private?
bestFactor = f;
bestExceptions = exceptions;
// Early exit if we found perfect encoding
if (bestExceptions == 0) {
Need to read the paper, but does the order in which we are exploring the exceptions and factors make sense? (Is there any benefit to exploring from largest to smallest?)
if (integerEncoding != ALP_INTEGER_ENCODING_FOR) {
throw new ParquetDecodingException("Unsupported ALP integer encoding: " + integerEncoding);
}
Let's check 0 <= logVectorSize <= 16 (see the combined validation sketch after the next comment below).
int compressionMode = headerBuf.get() & 0xFF;
int integerEncoding = headerBuf.get() & 0xFF;
int logVectorSize = headerBuf.get() & 0xFF;
int numElements = headerBuf.getInt();
check numElements <= valuesCount ?
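A combined sketch of this check and the logVectorSize bound suggested above; the field names follow the diff, the method itself is hypothetical:

    import org.apache.parquet.io.ParquetDecodingException;

    class AlpHeaderValidation {
      static void validate(int logVectorSize, int numElements, int valuesCount) {
        if (logVectorSize < 0 || logVectorSize > 16) {
          throw new ParquetDecodingException("Invalid ALP log vector size: " + logVectorSize);
        }
        if (numElements < 0 || numElements > valuesCount) {
          throw new ParquetDecodingException(
              "ALP header claims " + numElements + " elements but the page holds " + valuesCount);
        }
      }
    }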
LOG.debug("init from page at offset {} for length {}", stream.position(), stream.available());

// Read and validate header
ByteBuffer headerBuf = stream.slice(ALP_HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
It looks like this call does a copy? I wonder if we are better off using DataInputStream and reversing the bytes for numElements?
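Roughly what the DataInputStream variant could look like; the class and method here are hypothetical:

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Stream the header fields instead of slicing a buffer. DataInputStream
    // reads big-endian, so the little-endian int is byte-reversed on the way in.
    class AlpHeaderReadSketch {
      static void readHeader(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        int compressionMode = din.readUnsignedByte();
        int integerEncoding = din.readUnsignedByte();
        int logVectorSize = din.readUnsignedByte();
        int numElements = Integer.reverseBytes(din.readInt()); // little-endian on disk
        // these mirror the headerBuf reads in the diff
      }
    }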
this.currentIndex = 0;

// Read AlpInfo array
ByteBuffer alpInfoBuf = stream.slice(ALP_INFO_SIZE * numVectors).order(ByteOrder.LITTLE_ENDIAN);
same comment as above on avoiding the slice.
}

// Read ForInfo array
ByteBuffer forInfoBuf = stream.slice(DOUBLE_FOR_INFO_SIZE * numVectors).order(ByteOrder.LITTLE_ENDIAN);
We might as well slice this together with AlpInfo if we keep the slices.
}
// else: all deltas are 0 (bitWidth == 0)

// Reverse FOR and decode
nit: spell out FOR?
* Note: This method reads from the buffer's current position, using absolute indexing
* relative to that position.
*/
private void unpackLongs(ByteBuffer packed, long[] values, int count, int bitWidth) {
shouldn't there already be a method someplace that does this?
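parquet-column ships generated byte packers in org.apache.parquet.column.values.bitpacking; from memory (worth verifying the exact signatures), reusing them would look something like:

    import org.apache.parquet.column.values.bitpacking.BytePackerForLong;
    import org.apache.parquet.column.values.bitpacking.Packer;

    // Assumes `count` is padded to a multiple of 8, since the packers
    // work in groups of 8 values.
    class UnpackSketch {
      static void unpackLongs(byte[] packed, long[] values, int count, int bitWidth) {
        BytePackerForLong packer = Packer.LITTLE_ENDIAN.newBytePackerForLong(bitWidth);
        int inPos = 0;
        for (int outPos = 0; outPos < count; outPos += 8) {
          packer.unpack8Values(packed, inPos, values, outPos);
          inPos += bitWidth; // 8 values * bitWidth bits == bitWidth bytes
        }
      }
    }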
bitWidths[v] = forInfoBuf.get() & 0xFF;
}

// Decode each vector
Do we really want to do this all up-front? Don't we only want to decode one vector at a time?
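A hypothetical shape for per-vector lazy decoding; all names and the stub are illustrative:

    // Materialize only the vector the current read position falls into.
    class LazyDecodeSketch {
      private static final int ALP_VECTOR_SIZE_LOG = 10;
      private static final int ALP_VECTOR_SIZE = 1 << ALP_VECTOR_SIZE_LOG;

      private double[] currentVector;     // decoded values of one vector
      private int loadedVectorIndex = -1; // which vector currentVector holds

      double readValue(int index) {
        int vector = index >>> ALP_VECTOR_SIZE_LOG;
        if (vector != loadedVectorIndex) {
          currentVector = decodeVector(vector); // unpack + reverse FOR + decode one vector
          loadedVectorIndex = vector;
        }
        return currentVector[index & (ALP_VECTOR_SIZE - 1)];
      }

      // Stub standing in for the per-vector decode path.
      private double[] decodeVector(int vector) {
        return new double[ALP_VECTOR_SIZE];
      }
    }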
@@ -0,0 +1,201 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
Not reviewing this directly, but all comments on the double version apply here too, I think.
public FloatAlpValuesWriter(int initialCapacity, int pageSize, ByteBufferAllocator allocator) {
super(initialCapacity, pageSize, allocator);
// Initial buffer size - will grow as needed
this.buffer = new float[Math.max(ALP_VECTOR_SIZE, initialCapacity / Float.BYTES)];
I don't think I understand this logic.
This should just be pageSize?
@Override
public void skip(int n) {
if (n < 0 || currentIndex + n > totalCount) {
When we move to incremental decoding, this should skip by fast-forwarding, avoiding decoding vector data for intermediate results.
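Building on the lazy-decoding sketch above, skip() could then be as cheap as moving the cursor; the fields here are assumed:

    import org.apache.parquet.io.ParquetDecodingException;

    class SkipSketch {
      private int currentIndex;
      private int totalCount;

      void skip(int n) {
        if (n < 0 || currentIndex + n > totalCount) {
          throw new ParquetDecodingException("cannot skip " + n + " values at index " + currentIndex);
        }
        currentIndex += n; // decoding happens on the next actual read, if any
      }
    }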
@Override
public long getBufferedSize() {
// Estimate: each float value contributes roughly 2-4 bytes after compression
I think we should be flushing incrementally to byte buffers for the encoded data, to get a more accurate value. For unflushed data the size heuristic is maybe OK for an initial version.
@Override
public void writeFloat(float v) {
if (count >= buffer.length) {
I don't think we should be growing here to hold all floats. Once we reach vector size we should be encoding the page.
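A sketch of bounding the buffer at one vector and flushing eagerly; encodeAndFlushVector is hypothetical:

    class VectorFlushSketch {
      private static final int ALP_VECTOR_SIZE = 1024;
      private final float[] buffer = new float[ALP_VECTOR_SIZE];
      private int count;

      void writeFloat(float v) {
        buffer[count++] = v;
        if (count == ALP_VECTOR_SIZE) {
          encodeAndFlushVector(buffer, count); // encode this full vector now
          count = 0;                           // reuse the fixed-size buffer
        }
      }

      private void encodeAndFlushVector(float[] values, int n) {
        // hypothetical: find params, scale, FOR-encode, bit-pack, append bytes
      }
    }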
int vectorLen = vectorEnd - vectorStart;

// Find best encoding parameters
AlpEncoderDecoder.EncodingParams params =
Don't we want to cache encodings and use those first (and stop searching after a while)?
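One possible shape for such a cache; the threshold and helper methods are made up for illustration:

    // Try the previous vector's (exponent, factor) pair first and re-run the
    // full search only when its exception rate degrades.
    class ParamCacheSketch {
      private static final double MAX_EXCEPTION_RATE = 0.05; // assumed threshold
      private int cachedExponent = -1;
      private int cachedFactor;

      int[] chooseParams(double[] vector, int len) {
        if (cachedExponent >= 0
            && countExceptions(vector, len, cachedExponent, cachedFactor) <= len * MAX_EXCEPTION_RATE) {
          return new int[] {cachedExponent, cachedFactor}; // cache hit: skip the search
        }
        int[] best = fullSearch(vector, len); // the existing exhaustive search
        cachedExponent = best[0];
        cachedFactor = best[1];
        return best;
      }

      private int countExceptions(double[] v, int len, int e, int f) { return 0; } // stub
      private int[] fullSearch(double[] v, int len) { return new int[] {0, 0}; }   // stub
    }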
emkornfield left a comment:
Haven't looked at the tests, but I left a bunch of comments. The most important one is that we should be reading and writing incrementally, not decoding everything at once, I think.
For incremental reads, I left a comment on the spec doc, but I think we should probably be interleaving the headers on writes to avoid unnecessary buffering.
Following Micah's suggestion yesterday, I took a stab at using Claude to produce a Java implementation of ALP based on Prateek's spec and C++ implementation.
Bear in mind that I haven't closely reviewed it yet; it is fairly experimental, but it seems promising.
Implements ALP encoding for FLOAT and DOUBLE types, which converts floating-point values to integers using decimal scaling, then applies Frame of Reference (FOR) encoding and bit-packing for compression.
New files:
- AlpConstants.java: constants for ALP encoding
- AlpEncoderDecoder.java: core encoding/decoding logic
- AlpValuesWriter.java: writer implementation
- AlpValuesReaderForFloat/Double.java: reader implementations
Includes comprehensive unit tests and interop test infrastructure.
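For readers unfamiliar with ALP, a toy end-to-end sketch of the pipeline described above (decimal scaling, then Frame of Reference, then computing the packed bit width); all names and values are illustrative:

    public class AlpPipelineToy {
      public static void main(String[] args) {
        double[] values = {1.23, 4.56, 7.89, 2.46};

        // 1. Decimal scaling by 10^2 (exponent=2, factor=0): doubles become integers.
        long[] ints = new long[values.length];
        for (int i = 0; i < values.length; i++) {
          ints[i] = Math.round(values[i] * 100.0); // 123, 456, 789, 246
        }

        // 2. Frame of Reference: subtract the minimum so deltas are small.
        long min = Long.MAX_VALUE;
        for (long x : ints) min = Math.min(min, x);
        long maxDelta = 0;
        for (int i = 0; i < ints.length; i++) {
          ints[i] -= min; // 0, 333, 666, 123
          maxDelta = Math.max(maxDelta, ints[i]);
        }

        // 3. Bit-pack width: 666 needs 10 bits instead of 64 per value.
        int bitWidth = Long.SIZE - Long.numberOfLeadingZeros(maxDelta);
        System.out.println("reference=" + min + " bitWidth=" + bitWidth);
      }
    }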
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?