@tisonkun tisonkun commented Jan 20, 2026

This refers to #37.

I plan to implement the following steps:

  1. (DONE) PairTable (for storing sparse data)
  2. (DONE) CpcSketch without union
    1. Empty state with empty data
    2. Sparse state with pair table
    3. Hybrid state with dense vector
    4. Pinned state with dense vector (with ICON estimator)
    5. Sliding state with dense vector
  3. (DONE) Union
  4. Serde
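
As a rough illustration of step 2, the five states can be modelled as an enum selected by the number of coupons collected so far. This is a sketch under assumptions, not the code in this PR: `Flavor` and `determine_flavor` are hypothetical names, and the thresholds (3K/32, K/2, 27K/8) are recalled from the Java implementation and should be checked against it.

```rust
/// Hypothetical state tag for a CPC sketch; variants follow the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Flavor {
    Empty,
    Sparse,
    Hybrid,
    Pinned,
    Sliding,
}

/// Pick the state from `lg_k` and the number of coupons collected so far.
/// The thresholds are an assumption taken from the Java implementation.
/// `lg_k` is expected to be small (well below 58), so the shift and the
/// multiplications below cannot overflow a u64 for realistic inputs.
fn determine_flavor(lg_k: u8, num_coupons: u64) -> Flavor {
    let k = 1u64 << lg_k;
    let c = num_coupons;
    if c == 0 {
        Flavor::Empty // C == 0
    } else if c * 32 < 3 * k {
        Flavor::Sparse // 1 <= C < 3K/32
    } else if c * 2 < k {
        Flavor::Hybrid // 3K/32 <= C < K/2
    } else if c * 8 < 27 * k {
        Flavor::Pinned // K/2 <= C < 27K/8
    } else {
        Flavor::Sliding // 27K/8 <= C
    }
}
```

Comparing scaled integers (`c * 32 < 3 * k`) avoids the rounding that integer division of the thresholds would introduce.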

Signed-off-by: tison <wander4096@gmail.com>
@tisonkun tisonkun marked this pull request as draft January 20, 2026 15:14
Member Author

We may not implement the merge function for PairTable the way the Java/C++ impls do, but find another way to do the two-way merge. This is because in Rust it's impossible to hold a mutable reference while an immutable reference to the same data is still in use, which is how PairTable::merge is used in practice:

    PairTable.merge(srcPairArr, 0, srcNumPairs,
        allPairs, srcNumPairs, numPairsFromArray,
        allPairs, 0);  // note the overlapping subarray trick

The real effect here is a two-way merge of allPairs[srcNumPairs..numPairsFromArray] and srcPairArr, written back into allPairs starting at index 0. There should be a more idiomatic way to do this in Rust.
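
One possible shape for this in Rust, as a minimal sketch rather than the PR's implementation: take the source run as `&[u32]` and the destination buffer as a single `&mut [u32]` whose tail already holds the second sorted run, then merge forward by index. The function name `merge_into_front` and the buffer layout are assumptions made for illustration.

```rust
/// Merge the sorted slice `src` with the sorted run stored in the tail of
/// `buf` (everything from index `src.len()` on), writing the result into
/// `buf` from index 0. The first `src.len()` slots of `buf` are scratch space.
///
/// Only one mutable borrow (`buf`) and one unrelated shared borrow (`src`)
/// exist, so the borrow checker accepts this.
fn merge_into_front(src: &[u32], buf: &mut [u32]) {
    let n = src.len();
    assert!(buf.len() >= n, "buf must have room for src in front of its run");

    let mut i = 0; // next unread element of src
    let mut j = n; // next unread element of the run in buf's tail
    let mut k = 0; // next write position in buf

    // Invariant: k == i + (j - n), so k < j while i < n. A write therefore
    // never clobbers an element of the tail run that has not been read yet.
    while i < n && j < buf.len() {
        if src[i] <= buf[j] {
            buf[k] = src[i];
            i += 1;
        } else {
            buf[k] = buf[j];
            j += 1;
        }
        k += 1;
    }

    // Copy whatever is left of src. If src ran out first, k == j and the
    // remaining tail elements are already in their final positions.
    while i < n {
        buf[k] = src[i];
        i += 1;
        k += 1;
    }
}
```

For example, merging `[1, 4, 9]` into a buffer `[0, 0, 0, 2, 3, 10]` leaves the buffer as `[1, 2, 3, 4, 9, 10]`.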

@tisonkun tisonkun marked this pull request as ready for review January 29, 2026 15:56
tisonkun commented Jan 29, 2026

This PR is now ready for review.

It's mainly ported from the datasketches-cpp impl, so I tag @AlexanderSaydakov as a potential reviewer.

Union and serde (compression) will be implemented in follow-ups. The current state is a reviewable and mergeable minimal feature set.

// specific language governing permissions and limitations
// under the License.

#![allow(dead_code)]
Member Author

To be removed when Union and Serde get implemented.

dbg_macro = "deny"

too_many_arguments = "allow"
needless_range_loop = "allow"
Member Author

This lint gives false positives in places where iterating over an index is more expressive.
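
A hypothetical example of the kind of loop meant here (not taken from this PR): the indices carry meaning of their own, and Clippy would typically still suggest rewriting the loops with iterators or `enumerate()`. The function `column_sums` and its bit-matrix layout are made up for illustration.

```rust
/// Count the set bits in each of the first `num_cols` columns of a bit
/// matrix stored as one u64 per row. Requires `num_cols <= 64`.
fn column_sums(matrix: &[u64], num_cols: usize) -> Vec<u32> {
    assert!(num_cols <= 64);
    let mut sums = vec![0u32; num_cols];
    // `col` is both a shift amount and an output index, and `row` names the
    // matrix row, so plain index loops mirror the row/column notation.
    for col in 0..num_cols {
        for row in 0..matrix.len() {
            sums[col] += ((matrix[row] >> col) & 1) as u32;
        }
    }
    sums
}
```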

Comment on lines +283 to +303
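fn knuth_shell_sort3(a: &mut [u32]) {
    let len = a.len();

    // Pick the starting gap from the sequence 1, 4, 13, 40, ...
    // Start at 1 (not 0) so that slices shorter than 9 elements still get
    // the final gap-1 insertion-sort pass instead of being left unsorted.
    let mut h = 1;
    while h < len / 9 {
        h = 3 * h + 1;
    }

    while h > 0 {
        // Gapped insertion sort for the current gap h.
        for i in h..len {
            let v = a[i];
            let mut j = i;
            while j >= h && v < a[j - h] {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = v;
        }
        h /= 3;
    }
}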
Member Author

Java uses the standard Arrays.sort here. We could use [T]::sort (stable) or [T]::sort_unstable as well, but this is how the C++ impl does it.
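
For comparison, a minimal sketch of the standard-library alternative. The wrapper name `sort_pairs` is hypothetical and not part of this PR.

```rust
/// Hypothetical replacement for the hand-written shell sort:
/// sort the pair values with the standard library.
fn sort_pairs(a: &mut [u32]) {
    // Equal u32 values are indistinguishable, so the unstable sort gives the
    // same result as the stable `sort`, and it sorts in place without
    // allocating.
    a.sort_unstable();
}
```

Whether this is actually faster than the shell sort for the slice sizes seen in practice would need to be measured.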
