Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/05/03 12:31:37 UTC

[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21228

    [SPARK-24171] Adding a note for non-deterministic functions

    ## What changes were proposed in this pull request?
    
    I propose adding a clear statement to the documentation of functions like `collect_list()` about their non-deterministic behavior. Users must take this behavior into account when writing and running queries.
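
A minimal pure-Python model of the behavior described above (not Spark code; `collect_list`, `arrive` and `sorted_lists` here are hypothetical helpers): values are collected per key in the order rows arrive, and that order depends on which upstream partition happens to be read first. Sorting each collected list, which is roughly what wrapping the aggregate in Spark's `sort_array` does, removes the order dependence.

```python
from collections import defaultdict


def collect_list(rows):
    """Collect values per key in the order the (key, value) rows arrive."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def arrive(partitions, completion_order):
    """Concatenate upstream partitions in the order they happen to finish."""
    return [row for i in completion_order for row in partitions[i]]


def sorted_lists(result):
    """Make the result order-independent by sorting each collected list."""
    return {key: sorted(values) for key, values in result.items()}


partitions = [[("a", 1), ("b", 10)], [("a", 2), ("b", 20)], [("a", 3)]]

run1 = collect_list(arrive(partitions, [0, 1, 2]))  # {'a': [1, 2, 3], 'b': [10, 20]}
run2 = collect_list(arrive(partitions, [2, 1, 0]))  # {'a': [3, 2, 1], 'b': [20, 10]}

assert run1 != run2                              # same rows, different lists
assert sorted_lists(run1) == sorted_lists(run2)  # sorting removes the dependence
```

The model only captures arrival order; it makes no claim about how Spark schedules or reads shuffle blocks.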

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 deterministic-comments

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21228
    
----
commit c1e5ade5cd0401519bb7b798741a66e88fd8504a
Author: Maxim Gekk <ma...@...>
Date:   2018-05-03T11:24:34Z

    Adding a note for non-deterministic functions

commit 077da4ecf12fe6b3375ed52e5cf4743ca942a3c6
Author: Maxim Gekk <ma...@...>
Date:   2018-05-03T11:47:13Z

    Updating comments for PySpark

commit 1761ed9cc189e3a4593090b5eaa488312f8e76b2
Author: Maxim Gekk <ma...@...>
Date:   2018-05-03T11:52:43Z

    Updating comments for R

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185783283
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -151,13 +151,15 @@ def _():
     
     _collect_list_doc = """
         Aggregate function: returns a list of objects with duplicates.
    +    The function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    I'd use `.. note::`
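
For reference, a sketch of a docstring that uses the reStructuredText `.. note::` directive (two dots, a space, a double colon) in the style of `python/pyspark/sql/functions.py`; the wording is illustrative and the actual PySpark text may differ. `inspect.cleandoc` strips the common indentation, so the directive starts at column 0 and its continuation lines stay indented under it.

```python
import inspect

# Illustrative docstring; continuation lines of the note are indented
# under the directive so Sphinx renders them as part of the admonition.
_collect_list_doc = """
    Aggregate function: returns a list of objects with duplicates.

    .. note:: The function is non-deterministic because the order of collected
        results depends on the order of rows, which may be non-deterministic
        after a shuffle.
    """

cleaned = inspect.cleandoc(_collect_list_doc)
note_lines = [line for line in cleaned.splitlines() if line.startswith(".. note::")]
print(cleaned)
```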


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90124/


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185792840
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -3184,6 +3191,7 @@ setMethod("create_map",
     #' collect(select(df2, collect_list(df2$gear)))
     #' collect(select(df2, collect_set(df2$gear)))}
     #' @note collect_list since 2.3.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    For collect_list and collect_set, maybe word it as:
    "the function is non-deterministic, because the order of collected results depends on the order of rows, which may be non-deterministic after a shuffle"


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185792266
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -963,6 +964,7 @@ setMethod("kurtosis",
     #' last(df$c, TRUE)
     #' }
     #' @note last since 1.4.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    Some functions say indeterministic, but non-deterministic is the usual word. Shall we make the wording consistent while we are here?


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90268/


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90248/testReport)** for PR 21228 at commit [`f92e586`](https://github.com/apache/spark/commit/f92e586243ac4505e17ad1343ed7e6920b0e92dc).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90258/


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90258/testReport)** for PR 21228 at commit [`26e0a22`](https://github.com/apache/spark/commit/26e0a22c32d7d7f85d2e9ba6fc58c4d15f1babc0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185784205
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala ---
    @@ -94,9 +94,10 @@ object Rand {
     }
     
     /** Generate a random column with i.i.d. values drawn from the standard normal distribution. */
    -// scalastyle:off line.size.limit
     @ExpressionDescription(
    -  usage = "_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.",
    +  usage = "_FUNC_([seed]) - Returns a random value with independent and identically distributed" +
    +    " (i.i.d.) values drawn from the standard normal distribution." +
    +    " The function is non-deterministic",
    --- End diff --
    
    dot at the end.
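
Why a seeded `randn` can still be called non-deterministic can be modeled in plain Python. As we understand Spark's implementation, the generator for each partition is seeded with the seed plus the partition index, so the value a row receives depends on how rows are partitioned. The sketch below assumes exactly that and uses Python's `random.Random.gauss` as a stand-in for Spark's generator, so the numbers will not match Spark's; `randn_per_partition` is a hypothetical helper.

```python
import random


def randn_per_partition(partitions, seed):
    """Assign each row a standard-normal draw from a per-partition generator.

    Assumed model: partition i uses a generator seeded with seed + i, and
    rows consume draws in their order within that partition.
    """
    out = {}
    for idx, rows in enumerate(partitions):
        rng = random.Random(seed + idx)
        for row in rows:
            out[row] = rng.gauss(0.0, 1.0)
    return out


two_parts = [["r0", "r1"], ["r2", "r3"]]
one_part = [["r0", "r1", "r2", "r3"]]

a = randn_per_partition(two_parts, seed=42)
b = randn_per_partition(two_parts, seed=42)  # same seed, same partitioning
c = randn_per_partition(one_part, seed=42)   # same seed, different partitioning
```

With the same seed and partitioning the values repeat (`a == b`), but once `r2` moves to a different partition position it receives a different draw.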


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r186274561
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -152,13 +152,19 @@ def _():
     _collect_list_doc = """
         Aggregate function: returns a list of objects with duplicates.
     
    +    .. note:: The function is non-deterministic because the order of collected results depends
    +        on order of rows which may be non-deterministic after a shuffle.
    --- End diff --
    
    I feel that the non-determinism here is different from that of other non-deterministic functions like `monotonically_increasing_id` or `uuid`.
    
    Should we just say `The order of collected results is non-deterministic and depends on the order of rows, which may be non-deterministic after a shuffle`?
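
The distinction can be made concrete with a small Python sketch. The two lists stand in for hypothetical `collect_list` results from two runs, and Python's `uuid.uuid4` is used only as a stand-in for Spark's `uuid()`: order-level non-determinism leaves the multiset of values unchanged, whereas value-level non-determinism produces a new value on every call.

```python
import uuid
from collections import Counter

run1 = [1, 2, 3]  # collect_list result from one execution (hypothetical)
run2 = [3, 1, 2]  # same group, rows arrived in a different order

# Order-level non-determinism: the lists differ, the values do not.
same_values = Counter(run1) == Counter(run2)
same_order = run1 == run2

# Value-level non-determinism: every call yields a fresh identifier,
# rendered as a canonical 36-character string.
ids = [str(uuid.uuid4()) for _ in range(3)]

print(same_values, same_order)  # True False
```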


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90268/testReport)** for PR 21228 at commit [`0f86d8c`](https://github.com/apache/spark/commit/0f86d8c7e323452baf19614aea4efd0bbd914f7d).


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r186260381
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -963,6 +964,7 @@ setMethod("kurtosis",
     #' last(df$c, TRUE)
     #' }
     #' @note last since 1.4.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    I have seen "indeterministic" in a few other functions... I thought we could change those to non-deterministic to match.
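
If the wording were unified mechanically, a hypothetical helper like the one below could rewrite "indeterministic" as "non-deterministic" while preserving the initial capital. It is only a sketch and is not part of the Spark build.

```python
import re

# Match the whole word "indeterministic" or "Indeterministic".
_INDETERMINISTIC = re.compile(r"\b([Ii])ndeterministic\b")


def normalize_wording(text):
    """Replace 'indeterministic' with 'non-deterministic', keeping the capital."""
    def repl(match):
        prefix = "Non" if match.group(1) == "I" else "non"
        return prefix + "-deterministic"
    return _INDETERMINISTIC.sub(repl, text)


print(normalize_wording("Indeterministic in general; the order is indeterministic."))
# Non-deterministic in general; the order is non-deterministic.
```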


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185791719
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -963,6 +964,7 @@ setMethod("kurtosis",
     #' last(df$c, TRUE)
     #' }
     #' @note last since 1.4.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    For first/last, maybe word it as:
    "the function is non-deterministic, because its result depends on the order of rows, which may be non-deterministic after a shuffle"



---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90123/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185783395
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
    @@ -115,15 +115,15 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable {
       override def prettyName: String = "current_database"
     }
     
    -// scalastyle:off line.size.limit
     @ExpressionDescription(
    -  usage = "_FUNC_() - Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.",
    +  usage = "_FUNC_() - Returns an universally unique identifier (UUID) string." +
    +    " The value is returned as a canonical UUID 36-character string." +
    +    " The function is non-deterministic.",
    --- End diff --
    
    I'd use 
    ```
    note = """
      blabla
    """
    ```


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90124/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90248/testReport)** for PR 21228 at commit [`f92e586`](https://github.com/apache/spark/commit/f92e586243ac4505e17ad1343ed7e6920b0e92dc).


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90258/testReport)** for PR 21228 at commit [`26e0a22`](https://github.com/apache/spark/commit/26e0a22c32d7d7f85d2e9ba6fc58c4d15f1babc0).


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r186259641
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -963,6 +964,7 @@ setMethod("kurtosis",
     #' last(df$c, TRUE)
     #' }
     #' @note last since 1.4.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    @HyukjinKwon I am using `non-deterministic` everywhere. I don't see a big difference between `Indeterministic` and `non-deterministic`. Do you believe `Indeterministic` is a better fit here?


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    @juliuszsompolski please take a look.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90249/testReport)** for PR 21228 at commit [`46b67d2`](https://github.com/apache/spark/commit/46b67d28d625597c6705357d36d234dbcfd08468).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90268/testReport)** for PR 21228 at commit [`0f86d8c`](https://github.com/apache/spark/commit/0f86d8c7e323452baf19614aea4efd0bbd914f7d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    ok to test


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21228


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90123/


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r186276769
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala ---
    @@ -96,7 +96,8 @@ object Rand {
     /** Generate a random column with i.i.d. values drawn from the standard normal distribution. */
     // scalastyle:off line.size.limit
     @ExpressionDescription(
    -  usage = "_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.",
    +  usage = """_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.
    +    Note that the function is non-deterministic in general case.""",
    --- End diff --
    
    I mean to use a note like:
    
    https://github.com/apache/spark/blob/2ce37b50fc01558f49ad22f89c8659f50544ffec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L101-L103
    



---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r186252477
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -818,6 +818,7 @@ setMethod("factorial",
     #' first(df$c, TRUE)
     #' }
     #' @note first(characterOrColumn) since 1.4.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    Yes, for now it is.
    In this case, add a "Note:..." after L806, "value it sees when na.rm is set to true. If..."


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90249/testReport)** for PR 21228 at commit [`46b67d2`](https://github.com/apache/spark/commit/46b67d28d625597c6705357d36d234dbcfd08468).


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90124/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90248/


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r186252480
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -963,6 +964,7 @@ setMethod("kurtosis",
     #' last(df$c, TRUE)
     #' }
     #' @note last since 1.4.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    same here about `@note`
    ditto in all other cases


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r186281087
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -152,13 +152,19 @@ def _():
     _collect_list_doc = """
         Aggregate function: returns a list of objects with duplicates.
     
    +    .. note:: The function is non-deterministic because the order of collected results depends
    +        on order of rows which may be non-deterministic after a shuffle.
    --- End diff --
    
    The nature of the non-determinism can differ, but I believe it is important to state explicitly in the note that the function is `non-deterministic`. That would give the notes of all non-deterministic functions a common style. It may not be the best example, but even for `grep`ping or searching the docs it is pretty convenient.
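
As an illustration of the searchability argument, a hypothetical Python helper that scans a mapping of function names to docstrings for the common phrase. The docstrings below are abbreviated stand-ins, not the actual Spark text.

```python
# Abbreviated, illustrative docstrings keyed by function name.
DOCS = {
    "collect_list": "Aggregate function: returns a list of objects with duplicates. "
                    "Note: the function is non-deterministic.",
    "first": "Aggregate function: returns the first value in a group. "
             "Note: the function is non-deterministic.",
    "avg": "Aggregate function: returns the average of the values in a group.",
    "uuid": "Returns a universally unique identifier (UUID) string. "
            "The function is non-deterministic.",
}


def find_non_deterministic(docs, phrase="non-deterministic"):
    """Return the sorted names whose docstring mentions `phrase` (case-insensitive)."""
    phrase = phrase.lower()
    return sorted(name for name, doc in docs.items() if phrase in doc.lower())


print(find_non_deterministic(DOCS))  # ['collect_list', 'first', 'uuid']
```

A search like this only works if every note uses the same term, which is the point of keeping the wording uniform.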


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185784148
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -561,6 +571,7 @@ object functions {
        * The function by default returns the last values it sees. It will return the last non-null
        * value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
        *
    +   * @note the function is non-deterministic
    --- End diff --
    
    dot at the end.
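
The `last` doc in the diff above spells out the semantics: return the last value seen, skip nulls when `ignoreNulls` is true, and return null if all values are null. A plain-Python model of `first`/`last` under those semantics (hypothetical helpers, not Spark code) shows why the note is warranted: the same rows arriving in a different order give different results.

```python
def first(values, ignore_nulls=False):
    """Return the first value seen (the first non-None one if ignore_nulls)."""
    for value in values:
        if value is not None or not ignore_nulls:
            return value
    return None  # empty input, or all values None with ignore_nulls=True


def last(values, ignore_nulls=False):
    """Return the last value seen (the last non-None one if ignore_nulls)."""
    return first(list(reversed(values)), ignore_nulls)


order_a = [None, 1, 2]  # rows as they arrive in one run (hypothetical)
order_b = [2, None, 1]  # the same rows after a different shuffle

print(first(order_a), first(order_b))              # None 2
print(first(order_a, True), first(order_b, True))  # 1 2
print(last(order_a), last(order_b))                # 2 1
```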


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    add to whitelist


---



[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21228#discussion_r185783224
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -818,6 +818,7 @@ setMethod("factorial",
     #' first(df$c, TRUE)
     #' }
     #' @note first(characterOrColumn) since 1.4.0
    +#' @note the function is non-deterministic because its result depends on order of rows.
    --- End diff --
    
    I think `@note` is used for version specifications in R; just use `Note:` or `Note that blabla` instead.
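
A hypothetical lint, sketched in Python, that follows the convention described above: `@note` lines in roxygen comments are expected to carry only version information (e.g. `@note last since 1.4.0`), so any other `@note` line is flagged. The regular expression is an assumption about that format, not an existing SparkR check.

```python
import re

# Assumed format of a version note, e.g. "#' @note last since 1.4.0".
VERSION_NOTE = re.compile(r"^#'\s*@note\s+.+\bsince\s+\d+\.\d+\.\d+$")


def misused_note_tags(roxygen_lines):
    """Return '@note' lines that do not look like version notes."""
    flagged = []
    for line in roxygen_lines:
        stripped = line.strip()
        if stripped.startswith("#' @note") and not VERSION_NOTE.match(stripped):
            flagged.append(stripped)
    return flagged


lines = [
    "#' @note first(characterOrColumn) since 1.4.0",
    "#' @note the function is non-deterministic because its result depends on order of rows.",
    "#' Note: the function is non-deterministic.",
]
print(misused_note_tags(lines))
```

Only the second line is flagged; the plain `Note:` line is ordinary description text and passes.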


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90249/


---



[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21228
  
    **[Test build #90123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90123/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).


---
