Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/05/03 12:31:37 UTC
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21228
[SPARK-24171] Adding a note for non-deterministic functions
## What changes were proposed in this pull request?
I propose adding a clear statement to the documentation of functions like `collect_list()` about their non-deterministic behavior. Users must take this behavior into account when creating and running queries.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 deterministic-comments
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21228.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21228
----
commit c1e5ade5cd0401519bb7b798741a66e88fd8504a
Author: Maxim Gekk <ma...@...>
Date: 2018-05-03T11:24:34Z
Adding a note for non-deterministic functions
commit 077da4ecf12fe6b3375ed52e5cf4743ca942a3c6
Author: Maxim Gekk <ma...@...>
Date: 2018-05-03T11:47:13Z
Updating comments for PySpark
commit 1761ed9cc189e3a4593090b5eaa488312f8e76b2
Author: Maxim Gekk <ma...@...>
Date: 2018-05-03T11:52:43Z
Updating comments for R
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185783283
--- Diff: python/pyspark/sql/functions.py ---
@@ -151,13 +151,15 @@ def _():
_collect_list_doc = """
Aggregate function: returns a list of objects with duplicates.
+ The function is non-deterministic because its result depends on order of rows.
--- End diff --
I'd use `.. note::`
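For context on the docstring line in the diff above, here is a minimal plain-Python sketch (a model, not Spark code) of how an order-dependent aggregate can vary between runs. The `collect_list_model` helper and the partition layout are hypothetical; the only assumption is that a shuffle may deliver a group's partitions in any order.

```python
from itertools import permutations


def collect_list_model(partitions):
    """Concatenate rows in the order the partitions arrive (hypothetical model)."""
    return [row for part in partitions for row in part]


# Three partitions holding the rows of one group.
partitions = [[1, 2], [3], [4, 5]]

# Collect once for every possible arrival order of the partitions.
results = {tuple(collect_list_model(order)) for order in permutations(partitions)}

# The elements are always the same, but their order differs per arrival order.
distinct_orders = len(results)

# Sorting the collected list removes the dependence on arrival order.
sorted_results = {tuple(sorted(r)) for r in results}
```

In this model, sorting the collected values (or otherwise imposing an explicit order) is what makes the result repeatable.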
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90124/
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185792840
--- Diff: R/pkg/R/functions.R ---
@@ -3184,6 +3191,7 @@ setMethod("create_map",
#' collect(select(df2, collect_list(df2$gear)))
#' collect(select(df2, collect_set(df2$gear)))}
#' @note collect_list since 2.3.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
for collect_list, collect_set maybe word it:
"the function is non-deterministic, because the order of collected results depends on order of rows, which may be non-deterministic after a shuffle"
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185792266
--- Diff: R/pkg/R/functions.R ---
@@ -963,6 +964,7 @@ setMethod("kurtosis",
#' last(df$c, TRUE)
#' }
#' @note last since 1.4.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
Indeterministic, or more commonly non-deterministic. Shall we match the words while we are here?
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90268/
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Merged build finished. Test FAILed.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90248 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90248/testReport)** for PR 21228 at commit [`f92e586`](https://github.com/apache/spark/commit/f92e586243ac4505e17ad1343ed7e6920b0e92dc).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90258/
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90258/testReport)** for PR 21228 at commit [`26e0a22`](https://github.com/apache/spark/commit/26e0a22c32d7d7f85d2e9ba6fc58c4d15f1babc0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185784205
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala ---
@@ -94,9 +94,10 @@ object Rand {
}
/** Generate a random column with i.i.d. values drawn from the standard normal distribution. */
-// scalastyle:off line.size.limit
@ExpressionDescription(
- usage = "_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.",
+ usage = "_FUNC_([seed]) - Returns a random value with independent and identically distributed" +
+ " (i.i.d.) values drawn from the standard normal distribution." +
+ " The function is non-deterministic",
--- End diff --
dot at the end.
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186274561
--- Diff: python/pyspark/sql/functions.py ---
@@ -152,13 +152,19 @@ def _():
_collect_list_doc = """
Aggregate function: returns a list of objects with duplicates.
+ .. note:: The function is non-deterministic because the order of collected results depends
+ on order of rows which may be non-deterministic after a shuffle.
--- End diff --
I feel that non-deterministic here means something different from other non-deterministic functions like `monotonically_increasing_id` or `uuid`.
Should we just say `The order of collected results is non-deterministic and depends on order of rows which may be non-deterministic after a shuffle`?
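The distinction raised here can be sketched in plain Python. This is only a model under stated assumptions: `collect_after_shuffle` is a hypothetical stand-in for an order-dependent aggregate, and Python's `uuid.uuid4()` stands in for a function whose values themselves change on every call.

```python
import random
import uuid
from collections import Counter

rows = ["a", "b", "b", "c"]


def collect_after_shuffle(rows, rng):
    """Return the rows in an arbitrary order, as a shuffle might (hypothetical model)."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


run_1 = collect_after_shuffle(rows, random.Random(1))
run_2 = collect_after_shuffle(rows, random.Random(2))

# Order-only non-determinism: the contents match even if the order does not.
same_contents = Counter(run_1) == Counter(run_2)

# Value non-determinism: two calls produce two different values.
id_1 = str(uuid.uuid4())
id_2 = str(uuid.uuid4())
```

In the first case a consumer that ignores order sees a stable result; in the second case no post-processing recovers the earlier value.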
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90268/testReport)** for PR 21228 at commit [`0f86d8c`](https://github.com/apache/spark/commit/0f86d8c7e323452baf19614aea4efd0bbd914f7d).
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186260381
--- Diff: R/pkg/R/functions.R ---
@@ -963,6 +964,7 @@ setMethod("kurtosis",
#' last(df$c, TRUE)
#' }
#' @note last since 1.4.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
I have seen a few uses of indeterministic in other functions... I was thinking of matching them to non-deterministic.
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185791719
--- Diff: R/pkg/R/functions.R ---
@@ -963,6 +964,7 @@ setMethod("kurtosis",
#' last(df$c, TRUE)
#' }
#' @note last since 1.4.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
For first/last, maybe word it:
"the function is non-deterministic, because its result depends on the order of rows, which may be non-deterministic after a shuffle"
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90123/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185783395
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -115,15 +115,15 @@ case class CurrentDatabase() extends LeafExpression with Unevaluable {
override def prettyName: String = "current_database"
}
-// scalastyle:off line.size.limit
@ExpressionDescription(
- usage = "_FUNC_() - Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.",
+ usage = "_FUNC_() - Returns an universally unique identifier (UUID) string." +
+ " The value is returned as a canonical UUID 36-character string." +
+ " The function is non-deterministic.",
--- End diff --
I'd use
```
note = """
blabla
"""
```
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90124/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90248/testReport)** for PR 21228 at commit [`f92e586`](https://github.com/apache/spark/commit/f92e586243ac4505e17ad1343ed7e6920b0e92dc).
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90258/testReport)** for PR 21228 at commit [`26e0a22`](https://github.com/apache/spark/commit/26e0a22c32d7d7f85d2e9ba6fc58c4d15f1babc0).
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186259641
--- Diff: R/pkg/R/functions.R ---
@@ -963,6 +964,7 @@ setMethod("kurtosis",
#' last(df$c, TRUE)
#' }
#' @note last since 1.4.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
@HyukjinKwon I am using `non-deterministic` everywhere. I don't see a big difference between `Indeterministic` and `non-deterministic`. Do you believe `Indeterministic` is a better fit here?
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21228
@juliuszsompolski please take a look.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90249/testReport)** for PR 21228 at commit [`46b67d2`](https://github.com/apache/spark/commit/46b67d28d625597c6705357d36d234dbcfd08468).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Can one of the admins verify this patch?
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90268/testReport)** for PR 21228 at commit [`0f86d8c`](https://github.com/apache/spark/commit/0f86d8c7e323452baf19614aea4efd0bbd914f7d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Merged build finished. Test PASSed.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21228
ok to test
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21228
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90123/
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186276769
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala ---
@@ -96,7 +96,8 @@ object Rand {
/** Generate a random column with i.i.d. values drawn from the standard normal distribution. */
// scalastyle:off line.size.limit
@ExpressionDescription(
- usage = "_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.",
+ usage = """_FUNC_([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.
+ Note that the function is non-deterministic in general case.""",
--- End diff --
I mean to use a note like:
https://github.com/apache/spark/blob/2ce37b50fc01558f49ad22f89c8659f50544ffec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L101-L103
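One way to read "non-deterministic in general case" in the diff above is that a fixed seed does not pin down each row's value if the data is partitioned differently. The plain-Python sketch below models that reading; it is not Spark code. The `rand_model` helper is hypothetical, and the assumption that each partition seeds its own generator with `seed + partition index` is an illustration rather than a statement about the implementation.

```python
import random


def rand_model(num_rows, num_partitions, seed):
    """Generate one value per row, with one generator per partition (hypothetical model)."""
    values = []
    base, extra = divmod(num_rows, num_partitions)
    for index in range(num_partitions):
        # Assumption: each partition derives its generator from seed + index.
        rng = random.Random(seed + index)
        size = base + (1 if index < extra else 0)
        values.extend(rng.random() for _ in range(size))
    return values


one_partition = rand_model(6, 1, seed=42)
one_partition_again = rand_model(6, 1, seed=42)
three_partitions = rand_model(6, 3, seed=42)

# Same seed and same partitioning: identical values.
repeatable = one_partition == one_partition_again

# Same seed but different partitioning: the per-row values change.
partition_sensitive = one_partition != three_partitions
```

Under this model the seed gives repeatable output only while the partitioning stays the same.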
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186252477
--- Diff: R/pkg/R/functions.R ---
@@ -818,6 +818,7 @@ setMethod("factorial",
#' first(df$c, TRUE)
#' }
#' @note first(characterOrColumn) since 1.4.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
Yes, for now it is.
In this case, add a "Note:..." after L806 ("value it sees when na.rm is set to true. If...").
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90249/testReport)** for PR 21228 at commit [`46b67d2`](https://github.com/apache/spark/commit/46b67d28d625597c6705357d36d234dbcfd08468).
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90124/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90248/
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186252480
--- Diff: R/pkg/R/functions.R ---
@@ -963,6 +964,7 @@ setMethod("kurtosis",
#' last(df$c, TRUE)
#' }
#' @note last since 1.4.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
same here about `@note`
ditto in all other cases
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r186281087
--- Diff: python/pyspark/sql/functions.py ---
@@ -152,13 +152,19 @@ def _():
_collect_list_doc = """
Aggregate function: returns a list of objects with duplicates.
+ .. note:: The function is non-deterministic because the order of collected results depends
+ on order of rows which may be non-deterministic after a shuffle.
--- End diff --
The nature of the non-determinism can differ, but I believe it is important to state explicitly in the note that the function is `non-deterministic`. It would be a common style of note for all non-deterministic functions. Maybe it is not a good example, but even for `grep`ing or searching in the docs it is pretty convenient.
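The searchability argument can be illustrated with a small plain-Python sketch. The two example functions and the `find_non_deterministic` helper are hypothetical; the first docstring copies the note text from the diff above, and the sketch only assumes that every such note contains the literal phrase `non-deterministic`.

```python
import inspect


def collect_list_doc_example():
    """Aggregate function: returns a list of objects with duplicates.

    .. note:: The function is non-deterministic because the order of collected results depends
        on order of rows which may be non-deterministic after a shuffle.
    """


def abs_doc_example():
    """Computes the absolute value."""


def find_non_deterministic(funcs, marker="non-deterministic"):
    """Return the names of functions whose docstring contains the marker."""
    return [f.__name__ for f in funcs if marker in (inspect.getdoc(f) or "")]


flagged = find_non_deterministic([collect_list_doc_example, abs_doc_example])
```

A single consistent phrase makes this kind of scan (or a plain text search over the generated docs) reliable; mixed wording such as `indeterministic` would need a second pattern.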
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185784148
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -561,6 +571,7 @@ object functions {
* The function by default returns the last values it sees. It will return the last non-null
* value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
*
+ * @note the function is non-deterministic
--- End diff --
dot at the end.
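The behavior described in the Scaladoc above can be modeled in plain Python. This is only a sketch: `first_model` and `last_model` are hypothetical stand-ins, not Spark functions, and the two row orders stand for two runs in which a shuffle delivered the same rows differently.

```python
def first_model(rows, ignore_nulls=False):
    """Return the first value seen, optionally skipping None (hypothetical model)."""
    for value in rows:
        if value is not None or not ignore_nulls:
            return value
    # All values were null (or there were no rows).
    return None


def last_model(rows, ignore_nulls=False):
    """Return the last value seen by scanning the rows in reverse."""
    return first_model(list(reversed(rows)), ignore_nulls)


# The same three rows, delivered in two different orders.
run_a = [None, "x", "y"]
run_b = ["y", None, "x"]

last_a = last_model(run_a, ignore_nulls=True)
last_b = last_model(run_b, ignore_nulls=True)

first_a = first_model(run_a, ignore_nulls=True)
first_b = first_model(run_b, ignore_nulls=True)

all_null = last_model([None, None], ignore_nulls=True)
```

Both runs hold identical data, yet `last_a` and `last_b` differ, which is the dependence on row order that the note is meant to flag.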
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21228
add to whitelist
---
[GitHub] spark pull request #21228: [SPARK-24171] Adding a note for non-deterministic...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21228#discussion_r185783224
--- Diff: R/pkg/R/functions.R ---
@@ -818,6 +818,7 @@ setMethod("factorial",
#' first(df$c, TRUE)
#' }
#' @note first(characterOrColumn) since 1.4.0
+#' @note the function is non-deterministic because its result depends on order of rows.
--- End diff --
I think `@note` is used for version specification in R, so just use `Note:` or `Note that blabla`.
---
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21228
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90249/
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #21228: [SPARK-24171] Adding a note for non-deterministic functi...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21228
**[Test build #90123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90123/testReport)** for PR 21228 at commit [`1761ed9`](https://github.com/apache/spark/commit/1761ed9cc189e3a4593090b5eaa488312f8e76b2).
---