Posted to commits@spark.apache.org by gu...@apache.org on 2020/03/31 06:18:48 UTC
[spark] branch branch-2.4 updated: [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new e226f68 [SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound
e226f68 is described below
commit e226f687c172c63ce9ae6531772af9df124c9454
Author: Ben Ryves <be...@getyourguide.com>
AuthorDate: Tue Mar 31 15:16:17 2020 +0900
[SPARK-31306][DOCS] update rand() function documentation to indicate exclusive upper bound
### What changes were proposed in this pull request?
A small documentation change to clarify that the `rand()` function produces values in `[0.0, 1.0)`.
### Why are the changes needed?
`rand()` is implemented with the `Rand()` expression, which generates values in `[0.0, 1.0)` ([documented here](https://github.com/apache/spark/blob/a1dbcd13a3eeaee50cc1a46e909f9478d6d55177/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala#L71)). The existing documentation suggests that 1.0 is a possible return value: for a distribution written as `X ~ U(a, b)`, X can take the value a or b, so writing `U[0.0, 1.0]` implies that `rand()` could return 1.0.
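The half-open interval contract can be illustrated without a Spark cluster. This sketch uses Python's stdlib `random.random()`, which follows the same `[0.0, 1.0)` convention as Spark's `Rand` expression; it is an analogy for the documented contract, not Spark's actual implementation.

```python
import random

# random.random(), like Spark's Rand expression, draws i.i.d. samples
# uniformly from the half-open interval [0.0, 1.0):
# 0.0 is a possible result, but 1.0 is never returned.
samples = [random.random() for _ in range(10_000)]

# Every sample respects the exclusive upper bound.
assert all(0.0 <= x < 1.0 for x in samples)
```

This is exactly the distinction the doc fix encodes: `U[0.0, 1.0]` (square brackets on both ends) would mean both endpoints are attainable, while `[0.0, 1.0)` excludes the upper bound.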
### Does this PR introduce any user-facing change?
Only documentation changes.
### How was this patch tested?
Documentation changes only.
Closes #28071 from Smeb/master.
Authored-by: Ben Ryves <be...@getyourguide.com>
Signed-off-by: HyukjinKwon <gu...@apache.org>
---
R/pkg/R/functions.R | 2 +-
python/pyspark/sql/functions.py | 2 +-
sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index e914dd3..09b0a21 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -2614,7 +2614,7 @@ setMethod("lpad", signature(x = "Column", len = "numeric", pad = "character"),
#' @details
#' \code{rand}: Generates a random column with independent and identically distributed (i.i.d.)
-#' samples from U[0.0, 1.0].
+#' samples uniformly distributed in [0.0, 1.0).
#' Note: the function is non-deterministic in general case.
#'
#' @rdname column_nonaggregate_functions
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index b964980..c305529 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -553,7 +553,7 @@ def nanvl(col1, col2):
@since(1.4)
def rand(seed=None):
"""Generates a random column with independent and identically distributed (i.i.d.) samples
- from U[0.0, 1.0].
+ uniformly distributed in [0.0, 1.0).
.. note:: The function is non-deterministic in general case.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index f419a38..21ad1fd 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -1224,7 +1224,7 @@ object functions {
/**
* Generate a random column with independent and identically distributed (i.i.d.) samples
- * from U[0.0, 1.0].
+ * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*
@@ -1235,7 +1235,7 @@ object functions {
/**
* Generate a random column with independent and identically distributed (i.i.d.) samples
- * from U[0.0, 1.0].
+ * uniformly distributed in [0.0, 1.0).
*
* @note The function is non-deterministic in general case.
*