Conversation

@andygrove (Member) commented Jan 29, 2026

Summary

Closes #2611

  • Add a SQL-file-based test framework for expression testing with CometSqlFileTestSuite and SqlFileTestParser
  • Mark queries that expose native engine bugs with ignore directives linked to GitHub issues
  • Mark expected fallback cases with expect_fallback directives

How the SQL file test framework works

Overview

CometSqlFileTestSuite automatically discovers .sql test files under spark/src/test/resources/sql-tests/expressions/ and registers each one as a ScalaTest test case. This allows expression tests to be written as plain SQL files rather than Scala code.

SQL file format

Each .sql file contains a sequence of statements and queries, separated by blank lines:

-- Config: spark.comet.regexp.allowIncompatible=true
-- ConfigMatrix: parquet.enable.dictionary=false,true
-- MinSparkVersion: 3.5

statement
CREATE TABLE test(s string) USING parquet

statement
INSERT INTO test VALUES ('hello'), (NULL)

query
SELECT upper(s) FROM test

-- literal arguments
query tolerance=1e-6
SELECT ln(1.0), ln(2.718281828459045)

File-level directives (in comment headers):

  • -- Config: key=value — sets a Spark SQL config for all queries in the file
  • -- ConfigMatrix: key=val1,val2,... — runs the entire file once per value (generates combinatorial test cases)
  • -- MinSparkVersion: X.Y — skips the test on older Spark versions
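
The combinatorial expansion implied by ConfigMatrix can be sketched as follows. This is a minimal Python illustration of the idea (the real framework is Scala, and `config_combinations` is a hypothetical helper name):

```python
from itertools import product

def config_combinations(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Expand a ConfigMatrix spec into one config dict per combination."""
    keys = list(matrix)
    return [dict(zip(keys, values))
            for values in product(*(matrix[k] for k in keys))]

# A file with two ConfigMatrix directives yields the cross product:
matrix = {
    "parquet.enable.dictionary": ["false", "true"],
    "spark.sql.ansi.enabled": ["true", "false"],
}
combos = config_combinations(matrix)
# 2 x 2 = 4 test cases generated for the file
```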

Record types:

  • statement — executes DDL/DML (CREATE TABLE, INSERT, etc.) without checking results
  • query — executes a SELECT and compares Comet results against Spark, verifying both correctness and native execution coverage

Query assertion modes:

  • query — (default) checks that results match Spark AND that all operators ran natively in Comet
  • query spark_answer_only — checks results match Spark but does not verify native execution
  • query tolerance=1e-6 — checks results match within a floating-point tolerance
  • query expect_fallback(reason) — checks results match Spark AND verifies that Comet fell back to Spark with a reason string containing the given text
  • query ignore(reason) — skips the query entirely (used for known bugs, with a link to the GitHub issue)
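
Parsing a `query` header line into one of these assertion modes might look like the following Python sketch (hedged: the actual parser is `SqlFileTestParser` in Scala, and the mode names here are illustrative):

```python
import re

def parse_query_mode(header: str):
    """Parse a 'query ...' header line into (mode, argument).

    The argument is a float tolerance, a reason string, or None.
    """
    rest = header.removeprefix("query").strip()
    if rest == "":
        return ("check_coverage_and_answer", None)
    if rest == "spark_answer_only":
        return ("spark_answer_only", None)
    m = re.fullmatch(r"tolerance=([0-9.eE+-]+)", rest)
    if m:
        return ("tolerance", float(m.group(1)))
    m = re.fullmatch(r"(expect_fallback|ignore)\((.*)\)", rest)
    if m:
        return (m.group(1), m.group(2))
    raise ValueError(f"unrecognized query mode: {rest}")
```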

Test runner (CometSqlFileTestSuite)

The test runner:

  1. Discovers all .sql files recursively under sql-tests/
  2. Parses each file using SqlFileTestParser
  3. Generates test case combinations from ConfigMatrix (e.g., parquet.enable.dictionary=false,true produces two test cases per file)
  4. For each test case, creates tables, runs statements, and validates queries using the specified assertion mode
  5. Disables Spark's ConstantFolding optimizer rule so that literal-only expressions are evaluated by Comet's native engine rather than being folded away at plan time
  6. Cleans up tables after each test

Test file organization

Test files are organized into category subdirectories:

sql-tests/expressions/
├── array/           (array_contains, flatten, size, ...)
├── bitwise/         (bitwise operations)
├── cast/            (type casting)
├── conditional/     (if, case_when, coalesce, predicates)
├── datetime/        (hour, date_add, trunc, ...)
├── decimal/         (decimal arithmetic)
├── hash/            (md5, sha1, sha2, hash, xxhash64)
├── map/             (get_map_value, map_from_arrays)
├── math/            (abs, sin, log, round, ...)
├── misc/            (width_bucket, scalar_subquery, ...)
├── string/          (substring, concat, like, ...)
└── struct/          (create_named_struct, json_to_structs, ...)

Native engine issues discovered

These issues were found by testing literal argument combinations with constant folding disabled:

Issue Title
#3336 Native engine panics on all-scalar inputs for hour, minute, second, unix_timestamp
#3337 Native engine panics on all-scalar inputs for Substring and StringSpace
#3338 Native engine panics with 'index out of bounds' on literal array expressions
#3339 Native engine crashes on concat_ws with literal NULL separator
#3340 Native engine crashes on literal sha2() with 'Unsupported argument types'
#3341 Native engine panics on scalar bit_count()
#3342 Native engine crashes on literal DateTrunc and TimestampTrunc
#3343 Native engine crashes on all-literal RLIKE expression
#3344 Native replace() returns wrong result for empty-string search
#3345 array_contains returns wrong result for literal array with NULL cast
#3346 array_contains returns null instead of false for empty array with literal value
#3326 space() with negative input causes native crash
#3327 map_from_arrays() with NULL inputs causes native crash
#3331 width_bucket fails: Int32 downcast to Int64Array
#3332 GetArrayItem returns incorrect results with dynamic index
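
The "literal argument combinations" mentioned above refer to testing every literal/column mix for an N-argument function. A minimal Python sketch of that enumeration (hypothetical helper; the actual combinations were written into the .sql files by hand):

```python
from itertools import product

def arg_combinations(columns: list[str], literals: list[str]) -> list[tuple[str, ...]]:
    """For an N-argument function, enumerate every literal/column mix:
    (col, col), (col, lit), (lit, col), (lit, lit), ..."""
    # For each argument position, the choice is (column_ref, literal_value).
    return list(product(*zip(columns, literals)))

# Hypothetical two-argument example, e.g. for concat(a, b):
combos = arg_combinations(["a", "b"], ["'x'", "'y'"])
# -> [('a', 'b'), ('a', "'y'"), ("'x'", 'b'), ("'x'", "'y'")]
```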

andygrove and others added 2 commits January 29, 2026 07:50
Introduce a sqllogictest-inspired framework for writing Comet tests as
plain .sql files instead of Scala code. Convert 11 tests from
CometExpressionSuite to demonstrate the approach.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add ~115 SQL test files covering edge cases for every Comet-supported
expression, using the SQL file-based test framework. Tests cover NULL
inputs, boundary values, special values (NaN, Infinity, empty strings),
and type variety.

Adds two new query modes to the test framework:
- `query expect_fallback(reason)` - verifies fallback reason AND results
- `query ignore(reason)` - skips query (for known bugs with issue links)

Includes happy-path tests with configs enabled for incompatible
expressions (case conversion, regexp, date_format, from_unix_time,
init_cap).

Known bugs filed:
- apache#3326 (space(-1) crash)
- apache#3327 (map_from_arrays(NULL) crash)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@codecov-commenter commented Jan 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.07%. Comparing base (f09f8af) to head (b79459f).
⚠️ Report is 906 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3328      +/-   ##
============================================
+ Coverage     56.12%   60.07%   +3.94%     
- Complexity      976     1477     +501     
============================================
  Files           119      175      +56     
  Lines         11743    16167    +4424     
  Branches       2251     2682     +431     
============================================
+ Hits           6591     9712    +3121     
- Misses         4012     5111    +1099     
- Partials       1140     1344     +204     

@andygrove andygrove marked this pull request as ready for review January 29, 2026 16:27
@andygrove andygrove changed the title tests: Add ~115 SQL test files covering edge cases for every Comet-supported expression [WIP] tests: Add SQL test files covering edge cases for every Comet-supported expression [WIP] Jan 29, 2026
@mbutrovich (Contributor) commented:
Are we able to use the standard .slt format or do we require additional features? There's an MIT-licensed parser from the Apache Calcite team here: https://github.com/hydromatic/sql-logic-test


- **`SqlFileTestParser.scala`** — Parses `.sql` test files into a structured `SqlTestFile` representation. Each file can contain:
- `-- Config: key=value` — Spark SQL configs to set for the entire file
- `-- ConfigMatrix: key=value1,value2` — Generates one test per combination of values (e.g., dictionary encoding on/off)
Contributor:
IIUC, we can toggle ANSI mode by ConfigMatrix: spark.sql.ansi.enabled=true,false?

andygrove (Member Author):
Correct. We can also (later) add config parameters when running the whole suite, maybe via Maven profiles.

@andygrove (Member Author) commented Jan 29, 2026

Are we able to use the standard .slt format or do we require additional features? There's an MIT-licensed parser from the Apache Calcite team here: https://github.com/hydromatic/sql-logic-test

This is inspired by slt but there are some key differences.

  • We do not record expected results in the file. Instead, we compare to Spark (and we run against multiple Spark versions)
  • There are specific features to support a minimum Spark version and to support setting Spark configs, including combinatorial matrices (so that we can migrate some existing tests to this approach)
  • It creates individual Scala unit tests per file, fitting into the existing testing approach
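
Because expected results are not recorded in the files, the correctness check reduces to a cell-by-cell comparison of Comet's output against Spark's, with optional float tolerance. A minimal Python sketch of that idea (the real runner is Scala; names here are illustrative):

```python
import math

def rows_match(spark_rows, comet_rows, tolerance=0.0):
    """Compare two result sets cell by cell, treating floats specially."""
    if len(spark_rows) != len(comet_rows):
        return False
    for s_row, c_row in zip(spark_rows, comet_rows):
        for s, c in zip(s_row, c_row):
            if isinstance(s, float) and isinstance(c, float):
                # NaN == NaN is False in IEEE 754, but two NaN cells should match.
                if math.isnan(s) and math.isnan(c):
                    continue
                if abs(s - c) > tolerance:
                    return False
            elif s != c:
                return False
    return True
```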

@hsiang-c (Contributor) left a comment:
This is awesome!

@andygrove (Member Author) commented Jan 29, 2026
It is also worth pointing out that the test runner is ~300 LOC and extends CometTestBase, so it is tightly integrated with the existing test infrastructure.

Example output:

CometSqlFileTestSuite:
26/01/29 10:52:24 INFO core/src/lib.rs: Comet native library version 0.14.0 initialized
- sql-file: expressions/isnan.sql [parquet.enable.dictionary=false] (4 seconds, 677 milliseconds)
- sql-file: expressions/isnan.sql [parquet.enable.dictionary=true] (315 milliseconds)
- sql-file: expressions/length.sql [parquet.enable.dictionary=false] (318 milliseconds)
- sql-file: expressions/length.sql [parquet.enable.dictionary=true] (244 milliseconds)
- sql-file: expressions/ends_with.sql [parquet.enable.dictionary=false] (256 milliseconds)
- sql-file: expressions/ends_with.sql [parquet.enable.dictionary=true] (200 milliseconds)

andygrove and others added 2 commits January 29, 2026 10:49
…expression tests

Add 8 new SQL test files for previously untested expressions (GetArrayItem,
GetStructField, GetMapValue, GetArrayStructFields, ArrayFilter, ScalarSubquery,
decimal ops, WidthBucket). Reorganize all 131 SQL test files into 13 category
subdirectories (math, string, array, map, struct, aggregate, datetime,
conditional, bitwise, hash, cast, decimal, misc). Add MinSparkVersion directive
support to the test framework for version-gated tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove force-pushed the sql-file-tests-all-expressions branch from de72ab5 to 53fc897 Compare January 29, 2026 17:49
@andygrove andygrove changed the title tests: Add SQL test files covering edge cases for every Comet-supported expression [WIP] tests: Add SQL test files covering edge cases for (almost) every Comet-supported expression [WIP] Jan 29, 2026
@andygrove andygrove changed the title tests: Add SQL test files covering edge cases for (almost) every Comet-supported expression [WIP] tests: Add SQL test files covering edge cases for (almost) every Comet-supported expression Jan 29, 2026
}

// Discover and register all .sql test files
discoverTestFiles(testResourceDir).foreach { file =>
Contributor:
we can parallelize the foreach and run tests in parallel

andygrove (Member Author):
I will try this in a follow-on PR, if that is ok.

}

/** Generate all config combinations from a ConfigMatrix specification. */
private def configCombinations(
Contributor:
Would it be good to name it configMatrix?

/** A SQL query whose results are compared between Spark and Comet. */
case class SqlQuery(sql: String, mode: QueryMode = CheckOperator) extends SqlTestRecord

sealed trait QueryMode
Contributor:
Suggested change:
- sealed trait QueryMode
+ sealed trait QueryAssertionMode

case class SqlQuery(sql: String, mode: QueryMode = CheckOperator) extends SqlTestRecord

sealed trait QueryMode
case object CheckOperator extends QueryMode
Contributor:
checkCoverageAndAnswer?

val records = Seq.newBuilder[SqlTestRecord]
val tables = Seq.newBuilder[String]

var i = 0
Contributor:
can we name i better?

andygrove (Member Author):
Thanks. I addressed this and your other comments.

…Matrix

Ignore get_array_item dynamic index test (apache#3332) and width_bucket tests
(apache#3331) with links to tracking issues. Rename configCombinations method
to configMatrix for clarity.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove force-pushed the sql-file-tests-all-expressions branch from caa7aeb to 9a711b4 Compare January 29, 2026 18:02
- Rename QueryMode to QueryAssertionMode
- Rename CheckOperator to CheckCoverageAndAnswer
- Rename loop variable i to lineIdx in SqlFileTestParser

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove (Member Author) commented:
@shehabgamin you may be interested in this

-- specific language governing permissions and limitations
-- under the License.

-- ConfigMatrix: parquet.enable.dictionary=false,true
andygrove (Member Author):
Many of these tests run with dictionary encoding both on and off because that is what the existing Scala-based tests do, so I left it for now. However, it is pointless here because most of these tests only insert a few rows. I will fix this in a future PR.

Add literal value test coverage for all expression test files to ensure
Comet's native engine handles literal arguments correctly, not just
column references. For N-argument functions, test all combinations of
literal/column arguments (col+lit, lit+col, lit+lit, etc.).

Disable Spark's ConstantFolding optimizer rule in the test runner so
literal-only expressions reach Comet's native engine rather than being
folded away at plan time.

Mark queries that expose native engine bugs with ignore directives
linked to filed GitHub issues:
- apache#3336: datetime scalar panics (hour, minute, second, unix_timestamp)
- apache#3337: Substring/StringSpace scalar panics
- apache#3338: array literal index out of bounds
- apache#3339: concat_ws NULL separator crash
- apache#3340: sha2 literal crash
- apache#3341: bit_count scalar panic
- apache#3342: DateTrunc/TimestampTrunc literal crash
- apache#3343: all-literal RLIKE crash
- apache#3344: replace() empty-string search wrong result
- apache#3345: array_contains literal result mismatch
- apache#3346: array_contains empty array returns null instead of false

Mark expected fallback cases with expect_fallback directives:
- lpad/rpad: scalar str argument not supported
- initcap: moved literal tests to init_cap_enabled.sql

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove (Member Author) commented:
I will add ANSI tests in a follow-on PR.

Successfully merging this pull request may close these issues:

[EPIC] Improve Comet Correctness Testing for Expressions