Posted to issues@spark.apache.org by "Shivaram Venkataraman (JIRA)" <ji...@apache.org> on 2015/04/10 04:09:12 UTC

[jira] [Commented] (SPARK-6841) Similar to `stats.py` in Python, add support for mean, median, stdev etc.

    [ https://issues.apache.org/jira/browse/SPARK-6841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14488757#comment-14488757 ] 

Shivaram Venkataraman commented on SPARK-6841:
----------------------------------------------

Comments from SparkR JIRA

[~shivaram] said:
I left this as an inline comment in the PR, but I think we can get most of the functionality we want by just using named lists as a StatCounter. This is mostly because we are doing the aggregation in R; if we were doing this in Java we could use jobj, but I think that might be overkill for this use case.
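
A minimal sketch of that idea in plain R, assuming the aggregation happens on the R side; the helper names (emptyStats, mergeValue, mergeStats) are hypothetical, not an existing SparkR API:

    # Running statistics kept as a named list, playing the role of
    # Scala/Python's StatCounter.
    emptyStats <- function() {
      list(count = 0, sum = 0, sumSq = 0, min = Inf, max = -Inf)
    }

    # Fold one numeric value into the running statistics.
    mergeValue <- function(s, x) {
      list(count = s$count + 1,
           sum   = s$sum + x,
           sumSq = s$sumSq + x * x,
           min   = min(s$min, x),
           max   = max(s$max, x))
    }

    # Combine two partial results, e.g. one per partition.
    mergeStats <- function(a, b) {
      list(count = a$count + b$count,
           sum   = a$sum + b$sum,
           sumSq = a$sumSq + b$sumSq,
           min   = min(a$min, b$min),
           max   = max(a$max, b$max))
    }

    # Usage on a plain numeric vector:
    s <- Reduce(mergeValue, c(1, 2, 3, 4), emptyStats())

Because mergeStats is associative and commutative, the same named list can serve as the combiner in a distributed aggregate.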
  
[~davies] also said: 
These APIs were introduced only for RDDs of numeric values (when there was no DataFrame API), and now I'm in favor of using the DataFrame API for special types in an RDD (a DataFrame already knows the type). Instead of putting these into RDD, could we implement these features as a DataFrame API?
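
For comparison, a hypothetical sketch of the DataFrame-flavored version; createDataFrame, agg, and avg are part of the SparkR DataFrame API that was landing around this time, while stddev assumes the Spark SQL aggregate of the same name gets exposed to R:

    # Aggregate over the whole DataFrame; the engine already knows the
    # column type, which is Davies' point above.
    df <- createDataFrame(sqlContext, data.frame(x = rnorm(1000)))
    stats <- collect(agg(df, mean = avg(df$x), stdev = stddev(df$x)))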
 

> Similar to `stats.py` in Python, add support for mean, median, stdev etc.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-6841
>                 URL: https://issues.apache.org/jira/browse/SPARK-6841
>             Project: Spark
>          Issue Type: New Feature
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> Similar to `stats.py` in Python, we should add support for mean, median, stdev etc. More specifically, the functions we should support include the following (a sketch of how they could be computed follows the list):
> 1. sum(rdd)
> 2. histogram(rdd, buckets)
> 3. mean(rdd)
> 4. variance(rdd)
> 5. stdev(rdd) 
> 6. sampleStdev(rdd)
> 7. sampleVariance(rdd)
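
The non-histogram entries above all fall out of the named-list StatCounter sketched earlier; a minimal derivation (statsFromCounter is a hypothetical helper, and variance/stdev are the population flavors, matching `stats.py`):

    statsFromCounter <- function(s) {
      m <- s$sum / s$count
      # Single-pass sum-of-squares form; a Welford-style update would be
      # more numerically stable but is longer to write out.
      popVar  <- s$sumSq / s$count - m * m
      sampVar <- (s$sumSq - s$count * m * m) / (s$count - 1)
      list(sum = s$sum,
           mean = m,
           variance = popVar,
           stdev = sqrt(popVar),
           sampleVariance = sampVar,
           sampleStdev = sqrt(sampVar))
    }

histogram(rdd, buckets) additionally needs the global min/max to fix the bucket boundaries; a local sketch of the bucketing step (histogramLocal is a hypothetical name):

    histogramLocal <- function(values, buckets) {
      # Evenly spaced boundaries over [min, max]; the rightmost bucket is
      # closed so the maximum lands in bucket `buckets`.
      breaks <- seq(min(values), max(values), length.out = buckets + 1)
      counts <- tabulate(findInterval(values, breaks, rightmost.closed = TRUE),
                         nbins = buckets)
      list(breaks = breaks, counts = counts)
    }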



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org