You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Matt Pestritto <ma...@pestritto.com> on 2009/05/26 22:02:11 UTC

Built - In Aggregate Function - Standard Deviation

Hi.

Are there plans to write a standard deviation aggregate function ?  I had to
build my own which translated into multiple hive queries.  While it works, a
build-in function would have been much easier.

Thanks
-Matt

Re: Built - In Aggregate Function - Standard Deviation

Posted by Zheng Shao <zs...@gmail.com>.
Good catch. I just added that to wiki.

Thanks Amr.

Zheng

On Sun, May 31, 2009 at 12:04 AM, Amr Awadallah <aa...@cloudera.com> wrote:

>  Zheng,
>
>   is SQRT an undocumented builtin UDF? I couldn't see it in the lang manual
> at:
>
> http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF
>
> hence why I did pow(x,0.5) instead
>
> -- amr
>
> Hive does not support that right now, but Hive supports subquery so we can
> do:
>
> SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) )
> FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS totalsqr
> FROM t) t2;
>
> Zheng
>
> On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <ma...@pestritto.com>wrote:
>
>> Ah - Thanks so much. I didn't know you could reference a column that has
>> an aggregate function call again in the same select statement.  This is much
>> cleaner that the approach that I took.  I'll give it a shot.
>>
>> Thanks again.
>>
>> On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <aa...@cloudera.com> wrote:
>>
>>> I agree that a builtin for std dev is a good idea.
>>>
>>> that said, you can achieve this easy in one pass, just use:
>>>
>>> select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n,
>>> pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev
>>> from ....
>>>
>>> Matt Pestritto wrote:
>>>
>>>> Hi.
>>>>
>>>> Are there plans to write a standard deviation aggregate function ?  I
>>>> had to build my own which translated into multiple hive queries.  While it
>>>> works, a build-in function would have been much easier.
>>>>
>>>> Thanks
>>>> -Matt
>>>>
>>>
>>
>
>
> --
> Yours,
> Zheng
>
>
>


-- 
Yours,
Zheng

Re: Built - In Aggregate Function - Standard Deviation

Posted by Amr Awadallah <aa...@cloudera.com>.
Zheng,

  is SQRT an undocumented builtin UDF? I couldn't see it in the lang 
manual at:

http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF

hence why I did pow(x,0.5) instead

-- amr
> Hive does not support that right now, but Hive supports subquery so we 
> can do:
>
> SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) )
> FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS 
> totalsqr FROM t) t2;
>
> Zheng
>
> On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <matt@pestritto.com 
> <ma...@pestritto.com>> wrote:
>
>     Ah - Thanks so much. I didn't know you could reference a column
>     that has an aggregate function call again in the same select
>     statement.  This is much cleaner that the approach that I took. 
>     I'll give it a shot.
>
>     Thanks again.
>
>
>     On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <aaa@cloudera.com
>     <ma...@cloudera.com>> wrote:
>
>         I agree that a builtin for std dev is a good idea.
>
>         that said, you can achieve this easy in one pass, just use:
>
>         select sum( pow(col,2) ) as totsqr, sum( col ) as tot,
>         count(1) as n, pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5)
>         as stddev
>         from ....
>
>
>         Matt Pestritto wrote:
>
>             Hi.
>
>             Are there plans to write a standard deviation aggregate
>             function ?  I had to build my own which translated into
>             multiple hive queries.  While it works, a build-in
>             function would have been much easier.
>
>             Thanks
>             -Matt
>
>
>
>
>
> -- 
> Yours,
> Zheng


Re: Built - In Aggregate Function - Standard Deviation

Posted by Zheng Shao <zs...@gmail.com>.
Hive does not support that right now, but Hive supports subquery so we can
do:

SELECT SQRT( (n*totalsqr - tot * tot) / (n*(n-1)) )
FROM (SELECT COUNT(1) as n, SUM(col) as total, SUM( col*col ) AS totalsqr
FROM t) t2;

Zheng

On Sat, May 30, 2009 at 4:08 PM, Matt Pestritto <ma...@pestritto.com> wrote:

> Ah - Thanks so much. I didn't know you could reference a column that has an
> aggregate function call again in the same select statement.  This is much
> cleaner that the approach that I took.  I'll give it a shot.
>
> Thanks again.
>
>
> On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <aa...@cloudera.com> wrote:
>
>> I agree that a builtin for std dev is a good idea.
>>
>> that said, you can achieve this easy in one pass, just use:
>>
>> select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, pow(
>> (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev
>> from ....
>>
>>
>> Matt Pestritto wrote:
>>
>>> Hi.
>>>
>>> Are there plans to write a standard deviation aggregate function ?  I had
>>> to build my own which translated into multiple hive queries.  While it
>>> works, a build-in function would have been much easier.
>>>
>>> Thanks
>>> -Matt
>>>
>>
>


-- 
Yours,
Zheng

Re: Built - In Aggregate Function - Standard Deviation

Posted by Matt Pestritto <ma...@pestritto.com>.
Ah - Thanks so much. I didn't know you could reference a column that has an
aggregate function call again in the same select statement.  This is much
cleaner that the approach that I took.  I'll give it a shot.

Thanks again.

On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah <aa...@cloudera.com> wrote:

> I agree that a builtin for std dev is a good idea.
>
> that said, you can achieve this easy in one pass, just use:
>
> select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, pow(
> (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev
> from ....
>
>
> Matt Pestritto wrote:
>
>> Hi.
>>
>> Are there plans to write a standard deviation aggregate function ?  I had
>> to build my own which translated into multiple hive queries.  While it
>> works, a build-in function would have been much easier.
>>
>> Thanks
>> -Matt
>>
>

Re: Built - In Aggregate Function - Standard Deviation

Posted by Amr Awadallah <aa...@cloudera.com>.
I agree that a builtin for std dev is a good idea.

that said, you can achieve this easy in one pass, just use:

select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, 
pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev
from ....

Matt Pestritto wrote:
> Hi.
>
> Are there plans to write a standard deviation aggregate function ?  I 
> had to build my own which translated into multiple hive queries.  
> While it works, a build-in function would have been much easier.
>
> Thanks
> -Matt