-
Notifications
You must be signed in to change notification settings - Fork 278
Feat: add support for elt expression
#3269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This reverts commit 768b3e9.
| object CometElt extends CometScalarFunction[Elt]("elt") { | ||
|
|
||
| override def getSupportLevel(expr: Elt): SupportLevel = { | ||
| if (expr.failOnError) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| object CometElt extends CometScalarFunction[Elt]("elt") { | ||
|
|
||
| override def getSupportLevel(expr: Elt): SupportLevel = { | ||
| if (expr.failOnError) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @kazantsev-maksim .Minor nitpick : We generally add ANSI mode not supported to make it clear to the user. I can work in enabling support for ANSI mode once this PR is merged as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed
|
|
||
| override def getSupportLevel(expr: Elt): SupportLevel = { | ||
| if (expr.failOnError) { | ||
| return Unsupported(Some("failOnError=true is not supported")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
| sql(s"SELECT elt(cast(null as int), ${schema.fieldNames.mkString(",")}) FROM t1")) | ||
| checkSparkAnswerMaybeThrows(sql("SELECT elt(1) FROM t1")) match { | ||
| case (Some(spark), Some(comet)) => | ||
| assert(spark.getMessage.contains("WRONG_NUM_ARGS.WITHOUT_SUGGESTION")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor : perhaps a good idea to capture the error msg "WRONG_NUM_ARGS.WITHOUT_SUGGESTION" in a string ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed
| StringExprConfig("trim", "select trim(c1) from parquetV1Table"), | ||
| StringExprConfig("upper", "select upper(c1) from parquetV1Table")) | ||
| StringExprConfig("upper", "select upper(c1) from parquetV1Table"), | ||
| StringExprConfig("elt", "select elt(2, c1, c1) from parquetV1Table")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for adding this in the benchmark setup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Left minor comments
| override def getSupportLevel(expr: Elt): SupportLevel = { | ||
| if (expr.failOnError) { | ||
| return Unsupported(Some("failOnError=true is not supported")) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also might want to check if the inputs are a number and an array to make sure we terminate early in case the user provides malformed input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relevant examples : CometLength , CometRPad , CometLPad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first argument can be a numeric string - '2', it seems we can't cover all cases here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do a selective support for now (say Int, String) inputs for now and file an issue for further enhancements down the line. WDYT @kazantsev-maksim ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added it. @coderfender Can you please see?
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3269 +/- ##
============================================
+ Coverage 56.12% 59.91% +3.78%
- Complexity 976 1462 +486
============================================
Files 119 175 +56
Lines 11743 16185 +4442
Branches 2251 2687 +436
============================================
+ Hits 6591 9697 +3106
- Misses 4012 5136 +1124
- Partials 1140 1352 +212 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
# Conflicts: # native/core/src/execution/jni_api.rs
|
|
||
| private val WRONG_NUM_ARGS_WITHOUT_SUGGESTION_EXCEPTION_MSG = | ||
| "[WRONG_NUM_ARGS.WITHOUT_SUGGESTION] The `elt` requires > 1 parameters but the actual number is 1." | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor : given that this message is only applicable for your expression test , may be we could move it inside the test itself ?
| .mkString(", ") | ||
| return Unsupported( | ||
| Some( | ||
| s"Parameters 2 and onwards require the string type, but contains: $unsupportedTypes")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor : we could use StringType.toSQL to get SQL test for StringType to make it easier for user's to parse error messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor : might be a good idea to not do distinct here ..
Some of the options I could think of are
-
Exhaustive error message :
elt(1 , "apple", "banana", 10 , true , "StringType")will produceParameters 2 and onwards require the string type, but contains: StringType, IntegerType , BooleanTypeand could keep the user guessing . However, if we remove distinct and preserve order , it might beParameters 2 and onwards require the string type, but datatypes supplied are StringType, IntegerType , BooleanType , StringType -
Break at first non string type and return unsupported to reduce redundancy
But we could do that in a follow up PR by raising an issue on github too
coderfender
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left minor comments
| .mkString(", ") | ||
| return Unsupported( | ||
| Some( | ||
| s"Parameters 2 and onwards require the string type, but contains: $unsupportedTypes")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor : might be a good idea to not do distinct here ..
Some of the options I could think of are
-
Exhaustive error message :
elt(1 , "apple", "banana", 10 , true , "StringType")will produceParameters 2 and onwards require the string type, but contains: StringType, IntegerType , BooleanTypeand could keep the user guessing . However, if we remove distinct and preserve order , it might beParameters 2 and onwards require the string type, but datatypes supplied are StringType, IntegerType , BooleanType , StringType -
Break at first non string type and return unsupported to reduce redundancy
But we could do that in a follow up PR by raising an issue on github too
| assert(comet.getMessage.contains(WRONG_NUM_ARGS_WITHOUT_SUGGESTION_EXCEPTION_MSG)) | ||
| case (spark, comet) => | ||
| fail( | ||
| s"Expected Spark and Comet to throw exception, but got\nSpark: $spark\nComet: $comet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be great if you can add more tests to cover other code paths
- Non string data types from pos 2
- ANSI mode should fall back successfully to Spark
- Index out of range and make sure we return NULL
- Verify that first arg being Int is being enforced properly (perhaps pass a string / long type input as a negative test)
| "idx", | ||
| lit(Random.shuffle(indexes ++ edgeCasesIndexes).headOption.getOrElse(-1))) | ||
| .createOrReplaceTempView("t1") | ||
| checkSparkAnswerAndOperator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be worth adding a literal test with constant folding disabled, something like:
withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
"org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
checkSparkAnswerAndOperator("SELECT elt(2, 'a', 'b', 'c')") It is totally fine to fall back to Spark if all inputs are literal.
Which issue does this PR close?
N/A
Rationale for this change
What changes are included in this PR?
How are these changes tested?
Microbenchmark result: