Summary
~130 tests in SchemaPruningSuite (both "Spark vectorized reader" and "Non-vectorized reader" variants, with and without partition data columns) fail because CometNativeScan is not recognized as a file source scan node.
Error Pattern
All failures have the same error:
0 did not equal 1 Found 0 file sources in dataframe, but expected ArraySeq(struct<...>)
The test infrastructure looks for FileSourceScanExec or BatchScanExec nodes in the query plan to verify schema pruning. CometNativeScan is neither of these, so the tests find 0 file sources and fail.
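The mismatch can be reproduced in a toy model. The sketch below is not the Spark or Comet code: every class in it is a simplified stand-in named after the nodes mentioned above, and `ScanInspection`, `collect`, and `fileSourceSchemata` are hypothetical helpers that only imitate the shape of a type-based plan check.

```scala
// Toy model of the plan inspection. None of these are the real Spark or Comet classes.
sealed trait PlanNode { def children: Seq[PlanNode] }

// Stand-ins for the scan nodes named in this issue; each carries its pruned schema.
case class FileSourceScanExec(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
case class BatchScanExec(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
case class CometNativeScan(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
// A non-scan operator sitting above the scan.
case class Project(child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

object ScanInspection {
  // Walk the plan top-down and gather every value the partial function matches.
  def collect[A](plan: PlanNode)(pf: PartialFunction[PlanNode, A]): Seq[A] =
    pf.lift(plan).toSeq ++ plan.children.flatMap(c => collect(c)(pf))

  // Only the two known scan types are matched; any other scan node is skipped.
  def fileSourceSchemata(plan: PlanNode): Seq[String] =
    collect(plan) {
      case scan: FileSourceScanExec => scan.requiredSchema
      case scan: BatchScanExec      => scan.requiredSchema
    }
}
```

In this model, `ScanInspection.fileSourceSchemata(Project(FileSourceScanExec("struct<a:int>")))` yields one schema, while the same plan built on `CometNativeScan` yields an empty sequence, which is the situation the "Found 0 file sources" assertion reports.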
Failing Tests
- All SchemaPruningSuite tests, including: select complex fields, nested field pruning, correlated subqueries, case-insensitive schema, generator output, Expand/Sort/Window, etc.
- Case-insensitive parser variants from the same suite
- SPARK-37450: Prunes unnecessary fields from Explode for count aggregation
Root Cause
CometNativeScan does not extend either node type, and the plan inspection utilities that look for file source scan nodes do not match it. The tests verify that schema pruning pushes the correct pruned schema down to the scan, but they cannot find a scan node to inspect.
This is both a test infrastructure issue (tests don't know about CometNativeScan) and potentially a functional concern (schema pruning may not be happening the same way in native_datafusion).
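For the test infrastructure side, one possible direction is to widen the match so that the native scan counts as a file source. The sketch below is a self-contained toy, not a proposed patch: `SparkFileScan`, `NativeScan`, `Filter`, and `FileSourceLike` are hypothetical names, and it assumes the native scan exposes its pruned schema, which this issue does not confirm.

```scala
// Toy model only: these classes are hypothetical stand-ins, not the Spark or Comet API.
sealed trait Node { def children: Seq[Node] }
case class SparkFileScan(requiredSchema: String) extends Node {
  val children: Seq[Node] = Nil
}
case class NativeScan(requiredSchema: String) extends Node {
  val children: Seq[Node] = Nil
}
case class Filter(child: Node) extends Node {
  val children: Seq[Node] = Seq(child)
}

object WidenedInspection {
  // Extractor that treats either scan type as a file source and yields its pruned schema.
  object FileSourceLike {
    def unapply(node: Node): Option[String] = node match {
      case SparkFileScan(schema) => Some(schema)
      case NativeScan(schema)    => Some(schema) // assumption: the native scan exposes its schema
      case _                     => None
    }
  }

  // Collect the pruned schema of every scan in the plan, whichever scan type it is.
  def scanSchemata(plan: Node): Seq[String] = {
    val here = plan match {
      case FileSourceLike(schema) => Seq(schema)
      case _                      => Seq.empty[String]
    }
    here ++ plan.children.flatMap(scanSchemata)
  }
}
```

A check built this way would still compare the collected schema against the expected pruned schema, so it would also surface the functional concern: if pruning behaves differently in native_datafusion, the scan is found but the schema comparison fails.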
Related
Discovered in CI for #3307 (enable native_datafusion in auto scan mode).