Posted to user@spark.apache.org by Haopu Wang <HW...@qilinsoft.com> on 2015/03/26 03:28:16 UTC

[SparkSQL] How to calculate stddev on a DataFrame?

Hi,

I have a DataFrame object and I want to perform several kinds of aggregation, such as count, sum, variance, and stddev.

The DataFrame DSL supports simple aggregations like count and sum.

What about variance and stddev?

Thank you for any suggestions!


Re: [SparkSQL] How to calculate stddev on a DataFrame?

Posted by Denny Lee <de...@gmail.com>.
Perhaps this thread may help from a DataFrame perspective:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201503.mbox/%3CCALte62ztepahF=5hk9rcfBnyK4Z43wkcq4fKdCBWMgf_3_O36w@mail.gmail.com%3E


Re: [SparkSQL] How to calculate stddev on a DataFrame?

Posted by Corey Nolet <cj...@gmail.com>.
I would compute the sum of squares. Keeping a running count, sum, and sum of squares is an associative operation, so it fits in an aggregator, and you can then calculate the variance and standard deviation after the fact.
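To illustrate the sum-of-squares idea, here is a minimal plain-Python sketch (no Spark involved). `Moments`, `moments_of`, and the sample values are hypothetical names chosen for illustration. Each partition folds its values into a (count, sum, sum of squares) triple; merging triples is element-wise addition, which is associative; variance is derived only at the end. It assumes population variance by default (divide by n), with an option for sample variance (divide by n - 1).

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    """Hypothetical helper: running count, sum, and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x: float) -> "Moments":
        # Fold a single value into the running statistics.
        return Moments(self.n + 1, self.total + x, self.total_sq + x * x)

    def merge(self, other: "Moments") -> "Moments":
        # Combining two partial results is element-wise addition,
        # which is associative, so partitions can be merged in any grouping.
        return Moments(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    def variance(self, sample: bool = False) -> float:
        if self.n == 0 or (sample and self.n < 2):
            raise ValueError("not enough values")
        # Sum of squared deviations = sum(x^2) - sum(x)^2 / n
        ss = self.total_sq - self.total * self.total / self.n
        return ss / (self.n - 1 if sample else self.n)

    def stddev(self, sample: bool = False) -> float:
        return math.sqrt(self.variance(sample))


def moments_of(values) -> Moments:
    # Aggregate one "partition" of values.
    return reduce(lambda acc, x: acc.add(x), values, Moments())


# Two "partitions" aggregated independently, then merged.
left = moments_of([2.0, 4.0, 4.0, 4.0])
right = moments_of([5.0, 5.0, 7.0, 9.0])
combined = left.merge(right)
print(combined.variance(), combined.stddev())  # prints: 4.0 2.0
```

On a DataFrame, the same triple could in principle come from the count and sum aggregations the question mentions, with the sum of squares taken over a squared column. Whether your Spark version lets you sum such an expression directly is an assumption to check. One caveat: the sum(x^2) - sum(x)^2 / n formula can lose precision when the mean is large relative to the spread, so a shifted or Welford-style update is safer for such data.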
