You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/20 02:02:11 UTC

[GitHub] [spark] bersprockets opened a new pull request, #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

bersprockets opened a new pull request, #38727:
URL: https://github.com/apache/spark/pull/38727

   ### What changes were proposed in this pull request?
   
   Change `TryToBinary`'s constructor to test whether the format expression is foldable before creating the replacement expression.
   
   ### Why are the changes needed?
   
   `try_to_binary`, when called with non-foldable format, throws an unhelpful error message:
   
   ```
   spark-sql> SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
   [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
   org.apache.spark.SparkException: [INTERNAL_ERROR] The Spark SQL phase analysis failed with an internal error. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace.
   	at org.apache.spark.SparkException$.internalError(SparkException.scala:88)
   ...
   Caused by: java.lang.AssertionError: assertion failed
   	at scala.Predef$.assert(Predef.scala:208)
   	at org.apache.spark.sql.catalyst.expressions.ToBinary.$anonfun$replacement$1(stringExpressions.scala:2597)
   	at scala.Option.map(Option.scala:230)
   	at org.apache.spark.sql.catalyst.expressions.ToBinary.replacement$lzycompute(stringExpressions.scala:2596)
   	at org.apache.spark.sql.catalyst.expressions.ToBinary.replacement(stringExpressions.scala:2596)
   ```
   `TryToBinary` creates an instance of `ToBinary` as a replacement expression. However, `ToBinary`'s default constructor does not check whether the format expression is foldable, so the code that creates `ToBinary`'s own replacement expression fails with an assertion failure.
   
   `ToBinary` is not typically instantiated using the default constructor. The function registry uses the second auxiliary constructor, which does check whether format is foldable. The code that creates `ToBinary`'s replacement expression assumes this second auxiliary constructor is always used.
   
   `TryToBinary` cannot use `ToBinary`'s second auxiliary constructor because it needs to set `nullOnInvalidFormat` to `true`.
   
   After this PR, the above example will throw a more useful error message:
   ```
   spark-sql> SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
   The 'format' parameter of function 'try_to_binary' needs to be a string literal.; line 1 pos 7
   spark-sql> 
   ```
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   New tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #38727:
URL: https://github.com/apache/spark/pull/38727#discussion_r1027239674


##########
sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out:
##########
@@ -1597,3 +1597,20 @@ org.apache.spark.sql.AnalysisException
     "invalidValue" : "invalidformat"
   }
 }
+
+
+-- !query
+SELECT to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2)
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.AnalysisException
+The 'format' parameter of function 'to_binary' needs to be a string literal.; line 1 pos 7

Review Comment:
   When you move the exception from constructor, you should see `AnalysisException` w/ an error class.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala:
##########
@@ -227,7 +228,8 @@ case class TryToBinary(
 
   def this(expr: Expression, formatExpression: Expression) =
     this(expr, Some(formatExpression),
-      TryEval(ToBinary(expr, Some(formatExpression), nullOnInvalidFormat = true)))
+      TryEval(ToBinary(expr, Some(TryToBinary.checkFormat(formatExpression)),

Review Comment:
   Could you perform the check in `checkInputDataTypes()` as we do that in other expression, please.
   
   BTW, exceptions from expressions constructors are wrapped by `AnalysisException` additionally which is not convenient to users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets commented on pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

Posted by GitBox <gi...@apache.org>.
bersprockets commented on PR #38727:
URL: https://github.com/apache/spark/pull/38727#issuecomment-1322818455

   I will wait on PR #38737.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets commented on pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

Posted by GitBox <gi...@apache.org>.
bersprockets commented on PR #38727:
URL: https://github.com/apache/spark/pull/38727#issuecomment-1324453979

   Tested PR https://github.com/apache/spark/pull/38737. That PR incidentally seems to fix this issue:
   ```
   SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
   [DATATYPE_MISMATCH.NON_FOLDABLE_INPUT] Cannot resolve "to_binary(col1, col2)" due to data type mismatch: the input fmt should be a foldable "STRING" expression; however, got "col2".; line 1 pos 7;
   'Project [unresolvedalias(try_to_binary(col1#0, col2#1), None)]
   +- SubqueryAlias data
      +- LocalRelation [col1#0, col2#1]
   
   spark-sql> 
   ```
   Therefore, closing this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets commented on a diff in pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

Posted by GitBox <gi...@apache.org>.
bersprockets commented on code in PR #38727:
URL: https://github.com/apache/spark/pull/38727#discussion_r1027172751


##########
sql/core/src/test/resources/sql-tests/inputs/try-string-functions.sql:
##########
@@ -45,4 +45,8 @@ select try_to_binary(null, null);
 select try_to_binary(null, cast(null as string));
 -- invalid format
 select try_to_binary('abc', 1);
-select try_to_binary('abc', 'invalidFormat');
\ No newline at end of file
+select try_to_binary('abc', 'invalidFormat');
+-- format must be foldable
+SELECT try_to_binary(col1, col2) from values ('abc', 'utf-8') as data(col1, col2);
+-- non-foldable input string
+SELECT try_to_binary(col1, 'utf-8') from values ('abc') as data(col1);

Review Comment:
   Not strictly relevant to this issue, but I noticed that there were no tests for non-foldable input values, so I added one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets closed pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`

Posted by GitBox <gi...@apache.org>.
bersprockets closed pull request #38727: [SPARK-41205][SQL] Check that format is foldable in `TryToBinary`
URL: https://github.com/apache/spark/pull/38727


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org