Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2016/06/17 17:05:05 UTC

[jira] [Updated] (SPARK-15660) Update RDD `variance/stdev` description and add popVariance/popStdev

     [ https://issues.apache.org/jira/browse/SPARK-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-15660:
----------------------------------
       Priority: Minor  (was: Major)
    Description: 
SPARK-11490 redefined `variance/stdev` as the sample variance/stdev instead of the population ones. This PR updates the doc comments so that users do not misread which statistic they are getting. It changes the following API docs.

- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter

This PR also adds explicit `popVariance` and `popStdev` functions so that the population statistics are clearly named.
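
To make the sample/population distinction concrete, here is a minimal plain-Scala sketch with no Spark dependency. The object `VarianceSketch` and its helper names are hypothetical and only mirror the naming discussed in this issue; this is not the Spark implementation. It uses the three values 1.0, 2.0, 3.0 from the earlier description's example.

```scala
// Illustrative sketch only: plain Scala, not Spark's StatCounter code.
// All names in this object are hypothetical helpers for this example.
object VarianceSketch {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Sum of squared deviations from the mean.
  private def sumSq(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum
  }

  // Population variance: divide by n.
  def popVariance(xs: Seq[Double]): Double = sumSq(xs) / xs.size

  // Sample variance: divide by n - 1 (NaN for a single element).
  def sampleVariance(xs: Seq[Double]): Double = sumSq(xs) / (xs.size - 1)

  def popStdev(xs: Seq[Double]): Double = math.sqrt(popVariance(xs))
  def sampleStdev(xs: Seq[Double]): Double = math.sqrt(sampleVariance(xs))

  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 3.0)
    println(s"population stdev = ${popStdev(data)}")
    println(s"sample stdev     = ${sampleStdev(data)}")
  }
}
```

For this input the population stdev is about 0.8165 and the sample stdev is 1.0, which are the two different values that `ds.rdd.stdev` and `ds.describe()` report in the earlier description.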

  was:
SPARK-11490 redefined `variance/stdev` as the **sample** `variance/stdev` instead of the population ones.

This issue addresses the one remaining legacy behavior, which is in RDD. Changing it may break existing code, but Spark 2.0 should be consistent if possible.
{code}
scala> val ds = spark.createDataset(Seq(1.0, 2.0, 3.0))
ds: org.apache.spark.sql.Dataset[Double] = [value: double]

scala> ds.describe().show()
+-------+-----+
|summary|value|
+-------+-----+
|  count|    3|
|   mean|  2.0|
| stddev|  1.0|
|    min|  1.0|
|    max|  3.0|
+-------+-----+

scala> ds.rdd.stdev
res1: Double = 0.816496580927726
{code}

     Issue Type: Improvement  (was: Bug)
        Summary: Update RDD `variance/stdev` description and add popVariance/popStdev  (was: RDD and Dataset should show the consistent value for variance/stdev.)

> Update RDD `variance/stdev` description and add popVariance/popStdev
> --------------------------------------------------------------------
>
>                 Key: SPARK-15660
>                 URL: https://issues.apache.org/jira/browse/SPARK-15660
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Dongjoon Hyun
>            Priority: Minor
>
> SPARK-11490 redefined `variance/stdev` as the sample variance/stdev instead of the population ones. This PR updates the doc comments so that users do not misread which statistic they are getting. It changes the following API docs.
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
> This PR also adds explicit `popVariance` and `popStdev` functions so that the population statistics are clearly named.


