Posted to user@spark.apache.org by Arunkumar Pillai <ar...@gmail.com> on 2016/02/23 18:13:57 UTC

Calculation of histogram bins and frequency in Apache Spark 1.6

Hi,
Is there a predefined method to calculate histogram bins and frequencies in
Spark? Currently I take the range of the data, work out the bins, and then
count the frequency per bin using a SQL query.

Is there a better way?

Re: Calculation of histogram bins and frequency in Apache Spark 1.6

Posted by Yanbo Liang <yb...@gmail.com>.
Actually, Spark SQL `groupBy` followed by `count` gives you the frequency in
each bin. You can also try DataFrameStatFunctions.freqItems() to find the
frequent items in the given columns.

Thanks
Yanbo
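
A minimal sketch of the `groupBy`/`count` suggestion above, written for PySpark. It assumes equal-width bins over a known range; the names `bin_width`, `histogram_counts`, `lo`, `hi`, and `num_bins` are hypothetical and not part of Spark. It is one possible way to label rows with a bin index, not necessarily what the poster had in mind.

```python
def bin_width(lo, hi, num_bins):
    """Width of each of num_bins equal-width bins covering [lo, hi]."""
    return (hi - lo) / float(num_bins)


def histogram_counts(df, col, lo, hi, num_bins):
    """Count rows per equal-width bin using groupBy/count.

    Assumes every value in `col` lies in [lo, hi]; values below lo
    would end up with a negative bin index.
    """
    # Imported here so bin_width() can be used without Spark installed.
    from pyspark.sql import functions as F

    width = bin_width(lo, hi, num_bins)
    # floor((x - lo) / width) is the zero-based bin index of x.
    idx = F.floor((F.col(col) - lo) / width)
    # A value equal to hi would fall into bin num_bins; fold it into the last bin.
    idx = F.when(idx >= num_bins, num_bins - 1).otherwise(idx)
    return df.withColumn("bin", idx).groupBy("bin").count().orderBy("bin")
```

Calling `histogram_counts(df, "value", 0.0, 100.0, 10)` on a DataFrame with a numeric `value` column would return one row per non-empty bin. Bins with no rows do not appear in the result, so they have to be filled in with zero afterwards if a complete histogram is needed.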

2016-02-24 1:21 GMT+08:00 Burak Yavuz <br...@gmail.com>:

> You could use the Bucketizer transformer in Spark ML.
>
> Best,
> Burak

Re: Calculation of histogram bins and frequency in Apache Spark 1.6

Posted by Burak Yavuz <br...@gmail.com>.
You could use the Bucketizer transformer in Spark ML.

Best,
Burak
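
A minimal sketch of the Bucketizer suggestion above, written for PySpark. `Bucketizer` from `pyspark.ml.feature` maps a numeric column to bucket indices defined by a list of split points; the helper names `equal_width_splits` and `bucketize_and_count` and the output column name `bucket` are hypothetical. The sketch assumes the input column is of double type and that all values fall within the splits.

```python
def equal_width_splits(lo, hi, num_bins):
    """Return num_bins + 1 strictly increasing split points from lo to hi."""
    width = (hi - lo) / float(num_bins)
    # Use hi itself as the last edge to avoid floating-point drift.
    return [lo + i * width for i in range(num_bins)] + [hi]


def bucketize_and_count(df, input_col, splits):
    """Assign each row to a bucket with Bucketizer, then count per bucket."""
    # Imported here so equal_width_splits() can be used without Spark installed.
    from pyspark.ml.feature import Bucketizer

    bucketizer = Bucketizer(splits=splits, inputCol=input_col, outputCol="bucket")
    # Each bucket covers [splits[i], splits[i + 1]); the last bucket also
    # includes its upper edge.
    bucketed = bucketizer.transform(df)
    return bucketed.groupBy("bucket").count().orderBy("bucket")
```

For example, `bucketize_and_count(df, "value", equal_width_splits(0.0, 100.0, 10))` would give the frequency per bucket. If the data can fall outside the chosen range, `float("-inf")` and `float("inf")` can be used as the first and last split points so that every value lands in some bucket.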
