Add options to control hash join dynamic filter pushdown #19932
This adds two new configuration options to independently control hash join dynamic filter pushdown behavior:

- `hash_join_map_pushdown` (default: true): Controls whether to push down hash table references for membership checks when InList thresholds are exceeded. When false, no membership filter is created if the build side is too large for InList pushdown.
- `hash_join_bounds_pushdown` (default: true): Controls whether to push down min/max bounds for join key columns. When false, only membership filters (InList or Map) are pushed down.

This enables flexible combinations:

- InList only (no bounds, no map)
- Bounds only (no inlist, no map)
- Map only (no bounds)
- InList + bounds (no map)
- Any combination thereof

The primary use case is allowing users to use InList + bounds without the map fallback, which can be beneficial for certain workloads where the hash table reference overhead outweighs its benefits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pull request overview
This PR adds two new configuration options to control hash join dynamic filter pushdown behavior independently. The hash_join_map_pushdown config controls whether to use hash table references for membership checks when InList thresholds are exceeded, while hash_join_bounds_pushdown controls whether to push down min/max bounds for join key columns. This provides fine-grained control over the types of dynamic filters used in hash joins.
Changes:
- Added `hash_join_map_pushdown` boolean config option (default: true) to control hash table reference pushdown
- Added `hash_join_bounds_pushdown` boolean config option (default: true) to control min/max bounds pushdown
- Introduced `PushdownStrategy::Disabled` enum variant to represent when map pushdown is disabled
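To make the role of the new `Disabled` variant concrete, here is a minimal sketch of how a membership strategy could be chosen. It is not the code from this PR: `MembershipStrategy`, `choose_membership`, and the single `inlist_max_values` threshold are hypothetical simplifications, and the real `PushdownStrategy` enum and InList thresholds may look different.

```rust
/// Hypothetical, simplified stand-in for the membership part of the dynamic filter.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MembershipStrategy {
    /// Build side is small enough: push the distinct keys down as an IN list.
    InList(Vec<i64>),
    /// Build side is too large for an IN list: push a hash table reference.
    Map,
    /// Build side is too large and the map fallback is switched off.
    Disabled,
}

/// Pick a membership strategy for the given build-side keys.
/// `inlist_max_values` is an assumed single threshold, used only for illustration.
pub fn choose_membership(
    build_keys: &[i64],
    inlist_max_values: usize,
    map_pushdown: bool,
) -> MembershipStrategy {
    // Collapse the build-side keys to their distinct values.
    let mut distinct = build_keys.to_vec();
    distinct.sort_unstable();
    distinct.dedup();

    if distinct.len() <= inlist_max_values {
        MembershipStrategy::InList(distinct)
    } else if map_pushdown {
        MembershipStrategy::Map
    } else {
        MembershipStrategy::Disabled
    }
}
```

In this sketch the flag only matters once the InList threshold is exceeded, which matches the description that small build sides can still produce an InList filter when `hash_join_map_pushdown` is false.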
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| datafusion/common/src/config.rs | Adds two new configuration options for controlling hash join dynamic filter pushdown behavior |
| datafusion/physical-plan/src/joins/hash_join/exec.rs | Reads the new config options and applies them when determining pushdown strategy and initializing SharedBuildAccumulator |
| datafusion/physical-plan/src/joins/hash_join/shared_bounds.rs | Adds bounds_pushdown_enabled field to SharedBuildAccumulator, introduces Disabled pushdown strategy variant, and conditionally creates bounds predicates based on config |
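The conditional creation of bounds predicates described for `shared_bounds.rs` can be pictured with a small sketch. Everything here is an assumption for illustration: `BuildSideSummary` and `build_filter` are hypothetical names, the filter is rendered as text rather than as a physical expression, and returning `None` when both parts are absent is a guess at one reasonable behavior, not something the PR states.

```rust
/// Hypothetical summary of what the build side collected for one join key.
pub struct BuildSideSummary {
    pub min: i64,
    pub max: i64,
    /// Rendered membership check, e.g. "k IN (1, 5, 9)"; `None` if none was created.
    pub membership: Option<String>,
}

/// Assemble the probe-side filter text for column `col`.
/// Returns `None` when neither bounds nor membership contribute anything.
pub fn build_filter(
    col: &str,
    summary: &BuildSideSummary,
    bounds_pushdown: bool,
) -> Option<String> {
    let mut parts: Vec<String> = Vec::new();

    // Bounds are only emitted when the (assumed) bounds flag is on.
    if bounds_pushdown {
        parts.push(format!(
            "{0} >= {1} AND {0} <= {2}",
            col, summary.min, summary.max
        ));
    }

    // The membership check, if any, is ANDed with the bounds.
    if let Some(m) = &summary.membership {
        parts.push(m.clone());
    }

    if parts.is_empty() {
        None
    } else {
        Some(parts.join(" AND "))
    }
}
```

With bounds disabled the membership check is the whole filter; with membership absent (for example, map pushdown off and the InList threshold exceeded) only the range check remains.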
/// When true, pushes down hash table references for membership checks in hash joins
/// when the build side is too large for InList pushdown.
/// When false, no membership filter is created when InList thresholds are exceeded.
Copilot AI, Jan 21, 2026
The documentation states "When false, no membership filter is created when InList thresholds are exceeded." This could be clearer. Consider rephrasing to: "When false, no hash table reference (Map) is used for membership checks. If the build side exceeds InList thresholds, no membership filter is created (only bounds, if enabled)." This makes it clearer that InList filters can still be created when the build side is small enough, and that this config specifically controls the Map fallback behavior.
- /// When true, pushes down hash table references for membership checks in hash joins
- /// when the build side is too large for InList pushdown.
- /// When false, no membership filter is created when InList thresholds are exceeded.
+ /// When true, pushes down hash table references (Map) for membership checks in hash joins
+ /// when the build side is too large for InList pushdown.
+ /// When false, no hash table reference (Map) is used for membership checks. If the build
+ /// side exceeds InList thresholds, no membership filter is created (only bounds, if enabled).
pub hash_join_map_pushdown: bool, default = true

/// When true, pushes down min/max bounds for join key columns.
/// This enables statistics-based pruning (e.g., Parquet row group skipping).
/// When false, only membership filters (InList or Map) are pushed down.
/// Default: true
pub hash_join_bounds_pushdown: bool, default = true
Copilot AI, Jan 21, 2026
The new configuration options `hash_join_map_pushdown` and `hash_join_bounds_pushdown` lack test coverage. While the PR description mentions existing hash join tests pass, there are no specific tests that validate the behavior when these options are set to false. Consider adding tests similar to the existing `test_hashjoin_hash_table_pushdown_partitioned` that verify:
- When `hash_join_map_pushdown` is false and build side exceeds InList thresholds, no membership filter is created (only bounds if enabled)
- When `hash_join_bounds_pushdown` is false, bounds predicates are not included in the filter
- When both are false but dynamic filter pushdown is enabled, appropriate behavior occurs
- Various combinations of these flags work correctly in both Partitioned and CollectLeft modes
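One way to keep the suggested coverage exhaustive is to generate the flag and mode combinations rather than write each case by hand. The sketch below only enumerates the matrix; `Mode`, `Case`, and `test_matrix` are hypothetical names, and the assertions each case would need against a real plan are deliberately left out.

```rust
/// Hypothetical labels for the two hash join modes named in the review comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Partitioned,
    CollectLeft,
}

/// One test case: a join mode plus a setting for each of the two new flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Case {
    pub mode: Mode,
    pub map_pushdown: bool,
    pub bounds_pushdown: bool,
}

/// Enumerate every (mode, map, bounds) combination so no case is skipped.
pub fn test_matrix() -> Vec<Case> {
    let mut cases = Vec::new();
    for mode in [Mode::Partitioned, Mode::CollectLeft] {
        for map_pushdown in [true, false] {
            for bounds_pushdown in [true, false] {
                cases.push(Case {
                    mode,
                    map_pushdown,
                    bounds_pushdown,
                });
            }
        }
    }
    cases
}
```

A real test would loop over `test_matrix()`, set the two options on the session config, and check the resulting dynamic filter for each case.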
Motivation
I wanted this to debug #19858 and thought this would be helpful. Opening a PR in case others think this would be useful. I do think we should be cautious of "config overload" so I'm not going to push to merge this unless others think they'd actually use this outside of the context of debugging.
Summary
- `hash_join_map_pushdown` config option to control hash table reference pushdown
- `hash_join_bounds_pushdown` config option to control min/max bounds pushdown

Details
This PR adds two new configuration options to independently control hash join dynamic filter pushdown behavior:
- `hash_join_map_pushdown` (default: true): Controls whether to push down hash table references for membership checks when InList thresholds are exceeded. When false, no membership filter is created if the build side is too large for InList pushdown.
- `hash_join_bounds_pushdown` (default: true): Controls whether to push down min/max bounds for join key columns. When false, only membership filters (InList or Map) are pushed down.

This enables flexible combinations:
- InList only (no bounds, no map)
- Bounds only (no inlist, no map)
- Map only (no bounds)
- InList + bounds (no map)
- Any combination thereof

The primary use case is allowing users to use InList + bounds without the map fallback, which can be beneficial for certain workloads where the hash table reference overhead outweighs its benefits.
Test plan
🤖 Generated with Claude Code