You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "watfordkcf (via GitHub)" <gi...@apache.org> on 2023/07/25 13:40:28 UTC

[GitHub] [spark] watfordkcf opened a new pull request, #42153: Update concat and concat_ws documentation to point out unexpected behavior

watfordkcf opened a new pull request, #42153:
URL: https://github.com/apache/spark/pull/42153

   Add documentation covering unexpected behavior of concat and concat_ws with respect to null values.
   
   ### What changes were proposed in this pull request?
   Adds additional documentation to `concat` and `concat_ws`.
   
   ### Why are the changes needed?
   The behavior of `concat` and `concat_ws` were unexpected w.r.t. null values and the documentation did not help make their behavior clear.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn closed pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn closed pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior
URL: https://github.com/apache/spark/pull/42153


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] watfordkcf commented on pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "watfordkcf (via GitHub)" <gi...@apache.org>.
watfordkcf commented on PR #42153:
URL: https://github.com/apache/spark/pull/42153#issuecomment-1655618473

   GitHub would only let me retrigger the whole suite unfortunately.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on a diff in pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #42153:
URL: https://github.com/apache/spark/pull/42153#discussion_r1275658166


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##########
@@ -53,13 +53,17 @@ import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by `sep`.",
+  usage = "_FUNC_(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by `sep`, skipping null values.",

Review Comment:
   It looks like Pg got the wrong answer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #42153:
URL: https://github.com/apache/spark/pull/42153#issuecomment-1657393577

   thanks, merged to master/3.5/3/4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on a diff in pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #42153:
URL: https://github.com/apache/spark/pull/42153#discussion_r1274285285


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##########
@@ -53,13 +53,17 @@ import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by `sep`.",
+  usage = "_FUNC_(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by `sep`, skipping null values.",

Review Comment:
   It's a null intolerant expression. Nulls are not skipped and the behavior is expected as null propagation. FYI, https://spark.apache.org/docs/latest/sql-ref-null-semantics.html#null-intolerant-expressions-.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #42153:
URL: https://github.com/apache/spark/pull/42153#issuecomment-1655223727

   Would you mind retriggering the failed GitHub action jobs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] watfordkcf commented on a diff in pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "watfordkcf (via GitHub)" <gi...@apache.org>.
watfordkcf commented on code in PR #42153:
URL: https://github.com/apache/spark/pull/42153#discussion_r1274915443


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##########
@@ -53,13 +53,17 @@ import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
  */
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by `sep`.",
+  usage = "_FUNC_(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by `sep`, skipping null values.",

Review Comment:
   Burying the lede makes it unexpected (similarly the documentation for 'comb' and 'tomb' doesn't include that they are pronounced differently), also that there is no symmetry between `concat` and `concat_ws` is equally unexpected:
   ```
   spark-sql (default)> SELECT concat('a', null, 'b'), concat_ws('', 'a', null, 'b');
   NULL	ab
   Time taken: 1.843 seconds, Fetched 1 row(s)
   ```
   
   Coming from postgres (and even SQL Server) further added to the confusion:
   ```
   -- SELECT concat('a', null, 'b'), concat_ws('', 'a', null, 'b')
    concat | concat_ws 
   --------+-----------
    ab     | ab
   (1 row)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] yaooqinn commented on pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #42153:
URL: https://github.com/apache/spark/pull/42153#issuecomment-1652813575

   https://github.com/apache/spark/pull/42153/checks?check_run_id=15326884988 please follow the instructions here to enable the CI
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] watfordkcf commented on pull request #42153: [DOCS] Update concat and concat_ws documentation to point out unexpected behavior

Posted by "watfordkcf (via GitHub)" <gi...@apache.org>.
watfordkcf commented on PR #42153:
URL: https://github.com/apache/spark/pull/42153#issuecomment-1653546571

   It shows as being enabled on my end, but I guess GitHub wasn't ready when I pushed the commit, so I squashed and force pushed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org