You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/09/01 22:34:48 UTC

[GitHub] [spark] bersprockets opened a new pull request, #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

bersprockets opened a new pull request, #37763:
URL: https://github.com/apache/spark/pull/37763

   ### What changes were proposed in this pull request?
   
   Remove the check for foldable delimiter arguments from `StringToMap#checkInputDataTypes`.
   
   Except for `checkInputDataTypes`, `StringToMap` is already able to handle non-foldable delimiter arguments (no other changes are required).
   
   ### Why are the changes needed?
   
   Hive 2.3.9 allows non-foldable delimiter arguments, e.g.:
   ```
   drop table if exists maptbl;
   create table maptbl as select ',' as del1, ':' as del2, 'a:1,b:2,c:3' as str;
   insert into table maptbl select '%' as del1, '-' as del2, 'a-1%b-2%c-3' as str;
   select str, str_to_map(str, del1, del2) from maptbl;
   ```
   This returns
   ```
   +--------------+----------------------------+
   |     str      |            _c1             |
   +--------------+----------------------------+
   | a:1,b:2,c:3  | {"a":"1","b":"2","c":"3"}  |
   | a-1%b-2%c-3  | {"a":"1","b":"2","c":"3"}  |
   +--------------+----------------------------+
   2 rows selected (0.13 seconds)
   ```
   However, Spark returns an error:
   ```
   str_to_map's delimiters must be foldable.; line 1 pos 12;
   ```
   
   The use-case is more likely to be something like this:
   ```
   select
     str,
     str_to_map(str, ',', if(region = 0, ':', '#')) as m
   from
     maptbl2;
   
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, users can now specify non-foldable delimiter arguments to `str_to_map`.
   
   Literals are still accepted, so the change is backwardly compatible.
   
   ### How was this patch tested?
   
   New unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets commented on a diff in pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

Posted by GitBox <gi...@apache.org>.
bersprockets commented on code in PR #37763:
URL: https://github.com/apache/spark/pull/37763#discussion_r961154793


##########
sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala:
##########
@@ -606,6 +606,36 @@ class StringFunctionsSuite extends QueryTest with SharedSparkSession {
       Seq(Row(Map("a" -> "1", "b" -> "2", "c" -> "3")))
     )
 
+    checkAnswer(

Review Comment:
   Not really a test of my change, but it was a missing test (the case where a literal pair delimiter is given but no key/value delimiter is given).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #37763:
URL: https://github.com/apache/spark/pull/37763#issuecomment-1236191274

   +1, LGTM. Merging to master.
   Thank you, @bersprockets and @HyukjinKwon for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #37763:
URL: https://github.com/apache/spark/pull/37763#discussion_r962107062


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala:
##########
@@ -548,11 +548,7 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E
   override def dataType: DataType = MapType(StringType, StringType)
 
   override def checkInputDataTypes(): TypeCheckResult = {
-    if (Seq(pairDelim, keyValueDelim).exists(! _.foldable)) {
-      TypeCheckResult.TypeCheckFailure(s"$prettyName's delimiters must be foldable.")
-    } else {
-      super.checkInputDataTypes()
-    }
+    super.checkInputDataTypes()

Review Comment:
   If you invoke only the parent method, let's remove `checkInputDataTypes()` then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #37763:
URL: https://github.com/apache/spark/pull/37763#issuecomment-1235748593

   @cloud-fan I wonder why did you insist in https://github.com/apache/spark/pull/13990#discussion_r70455921 that the delimiters must be foldable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function
URL: https://github.com/apache/spark/pull/37763


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org