Skip to content

Conversation

@Eyizoha
Copy link
Contributor

@Eyizoha Eyizoha commented Jan 15, 2026

Purpose

  • feat(third_party): Add checksum verification for downloaded dependencies
    • Avoid redundant downloads of dependency packages
  • fix(build): Set Protobuf_ROOT for ORC
    • Prevent ORC from potentially using system protoc during compilation, which could lead to mismatch between generated code and library
  • fix(build): Set the expected output path
    • Replace undefined path variables and unify artifact placement paths
  • feat(build): Add option to disable C++11 ABI
    • Facilitate integration for projects not using C++11 ABI
  • feat(build): Add build and package script
    • Provide a simple automated build and packaging script

Tests

None

API and Format

No impact on API or format

Documentation

No new features introduced

@lxy-9602 lxy-9602 requested a review from zjw1111 January 15, 2026 08:44
@zjw1111 zjw1111 requested a review from lucasfang January 16, 2026 02:22
Comment on lines 96 to 103
mkdir -p "$OUTPUT_DIR"
PACKAGE_DIR="$OUTPUT_DIR/$BUILD_NAME"
rm -rf "$PACKAGE_DIR"
mkdir -p "$PACKAGE_DIR/include" "$PACKAGE_DIR/lib"

cp -r include/* "$PACKAGE_DIR/include/"
cp -r "$BUILD_DIR"/$BUILD_TYPE/*.a "$PACKAGE_DIR/lib/"
cp -r "$BUILD_DIR"/$BUILD_TYPE/*.so "$PACKAGE_DIR/lib/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add -DCMAKE_INSTALL_PREFIX=$OUTPUT_DIR in CMAKE_OPTIONS, and run make install after make -j to collect all include files and all .a/.so to $OUTPUT_DIR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"-DZSTD_HOME=${ORC_ZSTD_ROOT}"
"-DZLIB_HOME=${ORC_ZLIB_ROOT}"
"-DPROTOBUF_HOME=${ORC_PROTOBUF_ROOT}"
"-DProtobuf_ROOT=${ORC_PROTOBUF_ROOT}"
Copy link
Collaborator

@zjw1111 zjw1111 Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's already right with -DPROTOBUF_HOME. See orc's code, all PROTOBUF_HOMEPROTOBUF_ROOTProtobuf_ROOT are linked to PROTOBUF_HOME.

Is there any bad case in your development environment, which set -DPROTOBUF_HOME is not ok, but set -DProtobuf_ROOT is ok?

Image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in my development environment, the system has a higher incompatible version of protobuf installed (3.9.1). During the build process, ORC unexpectedly uses the system's protoc to generate code, but at the linking stage, it uses the project-specified version of libprotobuf, which leads to undefined reference errors.

Based on my analysis, the issue is that find_package(Protobuf CONFIG) in ORC's FindProtobuf.cmake will prioritize finding the system's protobuf and use the system's protoc. PROTOBUF_HOME is not among CMake's standard search variables, so if only PROTOBUF_HOME is set, find_package() will ignore it and search in system paths instead.

After I set Protobuf_ROOT, the problem was resolved. This issue should be reproducible if you install a higher version of protobuf on the system (e.g., 3.9.1).

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the build system's reliability and flexibility by adding checksum verification for downloaded dependencies, fixing build configuration issues, and providing a convenient build/package script.

Changes:

  • Added checksum verification with automatic skip of already-downloaded dependencies to avoid redundant downloads
  • Fixed build configuration by setting Protobuf_ROOT for ORC, replacing undefined output path variables with proper CMAKE_* variables, and adding C++11 ABI control
  • Added an automated build and packaging script for simplified workflows

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
third_party/versions.txt Added package name variables for each dependency to support checksum verification and local file detection
third_party/download_dependencies.sh Implemented SHA256 checksum verification with multi-platform support and automatic skip logic for existing valid files
cmake_modules/ThirdpartyToolchain.cmake Added THIRDPARTY_DIR variable, local file fallback logic, Protobuf_ROOT for ORC, and C++11 ABI flag propagation to external projects
cmake_modules/BuildUtils.cmake Replaced undefined OUTPUT_PATH variables with proper CMAKE_*_OUTPUT_DIRECTORY variables for consistent artifact placement
build_and_package.sh New script providing automated build and packaging workflow with configurable options
CMakeLists.txt Added PAIMON_USE_CXX11_ABI option with validation preventing Lance/Lumina from being enabled when ABI is off, and added CMAKE_RUNTIME_OUTPUT_DIRECTORY definition
.gitignore Added output directory to ignore list

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

fi
mkdir -p "$BUILD_DIR"
cd "$BUILD_DIR"
cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_INSTALL_PREFIX="$PACKAGE_DIR" $CMAKE_OPTIONS ..
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CMAKE_OPTIONS variable should be quoted when passed to cmake to properly handle options with spaces or special characters. Without quotes, word splitting could cause issues.

Copilot uses AI. Check for mistakes.
mkdir -p "$BUILD_DIR"
cd "$BUILD_DIR"
cmake -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_INSTALL_PREFIX="$PACKAGE_DIR" $CMAKE_OPTIONS ..
make -j"$(nproc 2>/dev/null || echo 4)"
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script uses 'nproc' without quoting in the make command. While the fallback to 4 is provided, it would be more robust to quote the entire command substitution to avoid potential issues.

Suggested change
make -j"$(nproc 2>/dev/null || echo 4)"
JOBS=$(nproc 2>/dev/null || echo 4)
make -j"$JOBS"

Copilot uses AI. Check for mistakes.

echo "Downloading ${url} to ${out}..."
wget --continue --output-document="${out}" "${url}" || \
(echo "Failed to download ${url}" 1>&2; exit 1)
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message says "Failed to download" but should be "Failed downloading" to match the original error message style and maintain consistency with the rest of the codebase.

Suggested change
(echo "Failed to download ${url}" 1>&2; exit 1)
(echo "Failed downloading ${url}" 1>&2; exit 1)

Copilot uses AI. Check for mistakes.
fi

echo "Downloading ${url} to ${out}..."
wget --continue --output-document="${out}" "${url}" || \
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --quiet flag was removed from wget. This will make the download output more verbose. Consider whether this was intentional or if the --quiet flag should be retained for cleaner build logs.

Suggested change
wget --continue --output-document="${out}" "${url}" || \
wget --quiet --continue --output-document="${out}" "${url}" || \

Copilot uses AI. Check for mistakes.
Comment on lines 38 to 42
BUILD_TYPE="release"
shift
;;
-d|--debug)
BUILD_TYPE="debug"
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BUILD_TYPE variable should use proper case ('Release' or 'Debug') instead of lowercase ('release' or 'debug') to follow CMake conventions and ensure consistent behavior across all platforms and generators.

Copilot uses AI. Check for mistakes.
openssl dgst -sha256 "${file}" | cut -d' ' -f2
else
# sha256sum and shasum have similar output format
${checksum_cmd} "${file}" | cut -d' ' -f1
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unquoted expansion of checksum_cmd relies on word splitting to work correctly. While this works in this specific context, it's fragile and could break if the command path contains spaces. Consider using an array or restructuring the code to call commands directly for better robustness.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,129 @@
#!/usr/bin/env bash
#
# Copyright 2024-present Alibaba Inc.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2026-present

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


PAIMON_ZLIB_BUILD_VERSION=1.3.1
PAIMON_ZLIB_BUILD_SHA256_CHECKSUM=9a93b2b7dfdac77ceba5a558a580e74667dd6fede4585b91eefb60f03b72df23
PAIMON_ZLIB_PKG_NAME=zlib-1.3.1.tar.gz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce redundancy, consider deriving *_PKG_NAME from *_BUILD_VERSION (e.g., zlib- $ {PAIMON_ZLIB_BUILD_VERSION}.tar.gz) instead of hardcoding the version twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants