You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "jameslamb (via GitHub)" <gi...@apache.org> on 2023/07/07 16:27:28 UTC

[GitHub] [spark] jameslamb opened a new pull request, #41892: [MINOR][DOCS] clarify array_position return value

jameslamb opened a new pull request, #41892:
URL: https://github.com/apache/spark/pull/41892

   ### What changes were proposed in this pull request?
   
   Proposes a slight modification to the docs for SQL function `array_position()`, to clarify the return value and that function's behavior when no match is found.
   
   ### Why are the changes needed?
   
   In my opinion, the docs at https://spark.apache.org/docs/latest/api/sql/index.html#array_position leave too much room for confusion.
   
   > *array_position(array, element) - Returns the (1-based) index of the first element of the array as long.*
   > Examples:
   > ```
   > SELECT array_position(array(3, 2, 1), 1);
   >  3
   > ```
   
   Because the return value also happens to be a return value in the example array, and because the doc says "first element" instead of "first matching element", I think it'd be easy to misunderstand this function as doing something like "given `array_position(a, i)`, return `arr[i]`".
   
   The code comment on this function is very clear, but I think the user-facing docs would benefit from similar clarification:
   
   https://github.com/apache/spark/blob/05bc73a83921a9e606609a12750932f95bd5b3f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L2134-L2141
   
   This PR proposes modifying the doc to remove such confusion, by:
   
   * providing an example array and search values which are unlikely to be confused with indices
   * modifying the example such that it shows the behavior when `array_position(a, v)` is called on an array `a` containing multiple instances of `v`
   * adding an example showing the behavior of this function when no match is found
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, minor user-facing documentation change.
   
   ### How was this patch tested?
   
   I did not test this patch at all, just followed the patterns I saw elsewhere in the same file for documentation strings.
   
   I did check that this was the only place this text came from, like this:
   
   ```shell
   git grep 'index of the first element of the array as long'
   ```
   
   ### Notes for Reviewers
   
   Thanks very much for your time and consideration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jameslamb commented on pull request #41892: [MINOR][DOCS] clarify array_position return value

Posted by "jameslamb (via GitHub)" <gi...@apache.org>.
jameslamb commented on PR #41892:
URL: https://github.com/apache/spark/pull/41892#issuecomment-1626867181

   Ah sure, sorry about that, did not notice that test.
   
   I just pushed a commit which I hope will fix it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #41892: [MINOR][DOCS] clarify array_position return value

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #41892:
URL: https://github.com/apache/spark/pull/41892#issuecomment-1627944964

   thanks, merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #41892: [MINOR][DOCS] clarify array_position return value

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #41892:
URL: https://github.com/apache/spark/pull/41892#issuecomment-1626859259

   Mind fixing the test too?
   
   ```
   [info] ExpressionsSchemaSuite:
   [info] - Check schemas for expression examples *** FAILED *** (891 milliseconds)
   [info]   "...ray_position(array(3[, 2, 1), 1])" did not equal "...ray_position(array(3[12, 773, 708, 708), 708])" SQL query did not match (ExpressionsSchemaSuite.scala:183)
   [info]   Analysis:
   [info]   "...ray_position(array(3[, 2, 1), 1])" -> "...ray_position(array(3[12, 773, 708, 708), 708])"
   [info]   org.scalatest.exceptions.TestFailedException:
   [info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
   [info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
   [info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
   [info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
   [info]   at org.apache.spark.sql.ExpressionsSchemaSuite.$anonfun$new$12(ExpressionsSchemaSuite.scala:183)
   [info]   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   [info]   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   [info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   [info]   at org.apache.spark.sql.ExpressionsSchemaSuite.$anonfun$new$1(ExpressionsSchemaSuite.scala:182)
   [info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   [info]   at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127)
   [info]   at org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #41892: [MINOR][DOCS] clarify array_position return value

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #41892: [MINOR][DOCS] clarify array_position return value
URL: https://github.com/apache/spark/pull/41892


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org