
[native_datafusion] [Spark SQL Tests] Nested schema pruning tests fail — CometNativeScan not recognized as file source #3318

@andygrove

Summary

~130 tests in SchemaPruningSuite (both "Spark vectorized reader" and "Non-vectorized reader" variants, with and without partition data columns) fail because CometNativeScan is not recognized as a file source scan node.

Error Pattern

All failures have the same error:

0 did not equal 1 Found 0 file sources in dataframe, but expected ArraySeq(struct<...>)

The test infrastructure looks for FileSourceScanExec or BatchScanExec nodes in the query plan to verify schema pruning. CometNativeScan is neither of these, so the tests find 0 file sources and fail.
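To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use the real Spark or Comet classes: `PlanNode`, `FileSourceScan`, `BatchScan`, `NativeScan`, `Project`, and the `collect` helper are all hypothetical stand-ins, and the schema string is made up for the example. The assumption is that the suite collects scan schemata with a pattern match over the executed plan, so a node type that is not listed in the match contributes nothing.

```scala
// Toy stand-ins for physical plan nodes. These are NOT the real Spark/Comet classes.
sealed trait PlanNode { def children: Seq[PlanNode] }

final case class FileSourceScan(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class BatchScan(readSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
// Hypothetical stand-in for CometNativeScan: a leaf scan of a different type.
final case class NativeScan(requiredSchema: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}
final case class Project(child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

object ScanSchemata {
  // Pre-order traversal: apply the partial function wherever it is defined.
  def collect[A](plan: PlanNode)(pf: PartialFunction[PlanNode, A]): Seq[A] =
    pf.lift(plan).toSeq ++ plan.children.flatMap(c => collect(c)(pf))

  // Matcher that only knows the two scan types the tests look for.
  val stockMatcher: PartialFunction[PlanNode, String] = {
    case FileSourceScan(schema) => schema
    case BatchScan(schema)      => schema
  }

  // Extra case for the native scan (an assumed fix, not the project's actual patch).
  val nativeMatcher: PartialFunction[PlanNode, String] = {
    case NativeScan(schema) => schema
  }

  val extendedMatcher: PartialFunction[PlanNode, String] =
    stockMatcher.orElse(nativeMatcher)
}

object PlanInspectionSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative pruned schema; not taken from the failing tests.
    val plan = Project(NativeScan("struct<name:struct<first:string>>"))

    val stock = ScanSchemata.collect(plan)(ScanSchemata.stockMatcher)
    val extended = ScanSchemata.collect(plan)(ScanSchemata.extendedMatcher)

    println(s"stock matcher found ${stock.size} file sources")
    println(s"extended matcher found ${extended.size} file sources")
  }
}
```

With the stock matcher the native scan is skipped, which mirrors the "Found 0 file sources" message; adding a case for the native scan makes its schema visible to the assertion. Whether the real fix belongs in the test helper or in how the scan node is exposed is an open question in this issue.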

Failing Tests

  • All SchemaPruningSuite tests including: select complex fields, nested field pruning, correlated subqueries, case-insensitive schema, generator output, Expand/Sort/Window, etc.
  • Case-insensitive parser variants from the same suite
  • SPARK-37450: Prunes unnecessary fields from Explode for count aggregation

Root Cause

CometNativeScan neither extends the scan node types that the plan inspection utilities look for nor is matched by them explicitly. The tests verify that schema pruning pushes the correct pruned schema down to the scan, but they cannot find the scan node to inspect.

This is both a test infrastructure issue (the tests don't know about CometNativeScan) and potentially a functional concern (schema pruning may not happen the same way with native_datafusion).

Related

Discovered in CI for #3307 (enable native_datafusion in auto scan mode).
