Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2016/06/17 17:05:05 UTC
[jira] [Updated] (SPARK-15660) Update RDD `variance/stdev` description and add popVariance/popStdev
[ https://issues.apache.org/jira/browse/SPARK-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-15660:
----------------------------------
Priority: Minor (was: Major)
Description:
In SPARK-11490, `variance/stdev` were redefined as the sample variance/stdev instead of the population ones. This PR updates the comments so that users do not mistake one definition for the other. It will update the following API docs:
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
- http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
Also, this PR adds explicit popVariance and popStdev functions.
was:
In SPARK-11490, `variance/stdev` were redefined as the **sample** `variance/stdev` instead of the population ones.
This issue addresses the only remaining legacy behavior, which is in RDD. This may cause breaking changes, but we had better be consistent in Spark 2.0 if possible.
{code}
scala> val ds = spark.createDataset(Seq(1.0, 2.0, 3.0))
ds: org.apache.spark.sql.Dataset[Double] = [value: double]
scala> ds.describe().show()
+-------+-----+
|summary|value|
+-------+-----+
|  count|    3|
|   mean|  2.0|
| stddev|  1.0|
|    min|  1.0|
|    max|  3.0|
+-------+-----+
scala> ds.rdd.stdev
res1: Double = 0.816496580927726
{code}
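The two values in the session above (1.0 from `describe()` and 0.816496580927726 from `ds.rdd.stdev`) differ only in the divisor: the sample form divides the sum of squared deviations by n - 1, the population form by n. The following is a minimal plain-Scala sketch of that arithmetic for the same three values. It does not call Spark; `StdDevSketch`, `populationStdDev`, and `sampleStdDev` are hypothetical helper names used for illustration only, not Spark API.

```scala
object StdDevSketch {
  // Sum of squared deviations from the mean (shared by both definitions).
  private def sumOfSquaredDeviations(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum
  }

  // Hypothetical helper: population standard deviation, divides by n.
  def populationStdDev(xs: Seq[Double]): Double =
    math.sqrt(sumOfSquaredDeviations(xs) / xs.size)

  // Hypothetical helper: sample standard deviation, divides by n - 1.
  // Assumes xs has at least two elements.
  def sampleStdDev(xs: Seq[Double]): Double =
    math.sqrt(sumOfSquaredDeviations(xs) / (xs.size - 1))

  def main(args: Array[String]): Unit = {
    val data = Seq(1.0, 2.0, 3.0)
    // Squared deviations sum to 2.0, so the results are sqrt(2/3) and sqrt(2/2).
    println(s"population: ${populationStdDev(data)}") // about 0.8165
    println(s"sample:     ${sampleStdDev(data)}")     // 1.0
  }
}
```

For n = 3 the gap is large (about 0.8165 versus 1.0); it shrinks as n grows, which is why the difference is easy to miss on big datasets.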
Issue Type: Improvement (was: Bug)
Summary: Update RDD `variance/stdev` description and add popVariance/popStdev (was: RDD and Dataset should show the consistent value for variance/stdev.)
> Update RDD `variance/stdev` description and add popVariance/popStdev
> --------------------------------------------------------------------
>
> Key: SPARK-15660
> URL: https://issues.apache.org/jira/browse/SPARK-15660
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Dongjoon Hyun
> Priority: Minor
>
> In SPARK-11490, `variance/stdev` were redefined as the sample variance/stdev instead of the population ones. This PR updates the comments so that users do not mistake one definition for the other. It will update the following API docs:
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.api.java.JavaDoubleRDD
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.rdd.DoubleRDDFunctions
> - http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
> Also, this PR adds explicit popVariance and popStdev functions.