Posted to dev@spark.apache.org by 陶 加涛 <ta...@gmail.com> on 2018/10/23 02:39:33 UTC

About introduce function sum0 to Spark

Hi, Apache Calcite has the concept of sum0. Here I quote its definition:

Sum0 is an aggregator which returns the sum of the values which
go into it like Sum. It differs in that when no non null values
are applied zero is returned instead of null.
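
To make the quoted definition concrete, here is a minimal plain-Python model of the two behaviours. It is only a sketch, not Spark or Calcite code: `None` stands in for SQL null, and the names `sql_sum` and `sum0` are illustrative.

```python
from typing import Iterable, Optional


def sql_sum(values: Iterable[Optional[int]]) -> Optional[int]:
    """Model of Sum: skip nulls, return None if no non-null value was seen."""
    total = None
    for v in values:
        if v is not None:
            total = v if total is None else total + v
    return total


def sum0(values: Iterable[Optional[int]]) -> int:
    """Model of Sum0: like sql_sum, but 0 when no non-null value was seen."""
    result = sql_sum(values)
    return 0 if result is None else result


print(sql_sum([1, None, 2]), sum0([1, None, 2]))
print(sql_sum([None]), sum0([None]))
```

The two functions agree whenever at least one non-null value is present and differ only when the input is empty or all null.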

One scenario is that sum0 can be used to implement count over pre-calculated results (in a pre-calculation system like Apache Kylin).
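
One way to read this scenario, sketched in plain Python: if per-group row counts are stored ahead of time, a later count query becomes a sum over the stored counts, and a query that matches no stored row should return 0 rather than null. The `precomputed` table and the `count_rows` helper are hypothetical and do not describe how Apache Kylin is actually implemented.

```python
def sum0(values):
    """Sum that skips None and returns 0 when nothing is summed."""
    total = 0
    for v in values:
        if v is not None:
            total += v
    return total


# Hypothetical pre-calculated rows: (city, number of source rows for that city).
precomputed = [("Beijing", 3), ("Shanghai", 5), ("Shenzhen", 2)]


def count_rows(cities):
    """Answer a count query for the given cities from the stored partial counts."""
    return sum0(n for city, n in precomputed if city in cities)


print(count_rows({"Beijing", "Shanghai"}))
print(count_rows({"Paris"}))
```

With a null-returning sum, the second query would report null instead of a count of 0.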

It is very easy to implement sum0 in Spark. If the community considers this necessary, I would like to open a JIRA and implement it.

---
Regards!
Aron Tao


Re: About introduce function sum0 to Spark

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Yes, as long as you are only talking about summing numeric values. Part of
my point, though, is that this is just a special case of folding or
aggregating with an initial or 'zero' value. It doesn't need to be limited
to just numeric sums with zero = 0.
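
The general idea can be sketched in plain Python as a fold that starts from a caller-supplied zero value. This is an illustration of the concept only, not a Spark API. The helper name `fold` and the choice to skip `None` inputs are assumptions made here to mirror SQL null handling.

```python
from functools import reduce
import operator


def fold(values, zero, combine):
    """Combine the non-null values, starting from `zero`.

    When there are no non-null values, `zero` itself is returned.
    """
    non_null = (v for v in values if v is not None)
    return reduce(combine, non_null, zero)


print(fold([1, None, 2], 0, operator.add))       # numeric sum with zero = 0
print(fold([], 0, operator.add))                 # empty input falls back to zero
print(fold([None], 1, operator.mul))             # product with zero = 1
print(fold(["a", None, "b"], "", operator.add))  # concatenation with zero = ""
```

Summing with zero = 0 is then just one instance. Products and string concatenation use the same helper with a different zero value.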

On Tue, Oct 23, 2018 at 12:23 AM Wenchen Fan <cl...@gmail.com> wrote:

> This is logically `sum( if(isnull(col), 0, col) )` right?

Re: About introduce function sum0 to Spark

Posted by Wenchen Fan <cl...@gmail.com>.
This is logically `sum( if(isnull(col), 0, col) )` right?
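
A plain-Python model of that expression, as a sketch only: `None` stands in for SQL null, and `sql_sum` assumes SQL semantics in which SUM over zero rows yields null. Both function names are illustrative.

```python
def sql_sum(values):
    """Model of SQL SUM: skip nulls, return None if no non-null value was seen."""
    total = None
    for v in values:
        if v is not None:
            total = v if total is None else total + v
    return total


def sum_with_null_as_zero(values):
    """Model of sum(if(isnull(col), 0, col)): replace nulls by 0, then sum."""
    return sql_sum(0 if v is None else v for v in values)


print(sum_with_null_as_zero([1, None, 2]))
print(sum_with_null_as_zero([None, None]))
print(sum_with_null_as_zero([]))
```

Under that assumption, the rewrite returns 0 for an all-null column, as sum0 would. An input with no rows at all still yields null, which matters for an ungrouped aggregate over an empty table.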

On Tue, Oct 23, 2018 at 2:58 PM 陶 加涛 <ta...@gmail.com> wrote:

> The name comes from Apache Calcite, and it doesn’t matter; we can introduce
> our own.

Re: About introduce function sum0 to Spark

Posted by 陶 加涛 <ta...@gmail.com>.
The name comes from Apache Calcite, and it doesn’t matter; we can introduce our own.


---
Regards!
Aron Tao

From: Mark Hamstra <ma...@clearstorydata.com>
Date: Tuesday, October 23, 2018 12:28
To: "taojiatao@gmail.com" <ta...@gmail.com>
Cc: dev <de...@spark.apache.org>
Subject: Re: About introduce function sum0 to Spark

That's a horrible name. This is just a fold.



Re: About introduce function sum0 to Spark

Posted by Mark Hamstra <ma...@clearstorydata.com>.
That's a horrible name. This is just a fold.

On Mon, Oct 22, 2018 at 7:39 PM 陶 加涛 <ta...@gmail.com> wrote:

> Hi, Apache Calcite has the concept of sum0. Here I quote its definition:
>
> Sum0 is an aggregator which returns the sum of the values which
> go into it like Sum. It differs in that when no non null values
> are applied zero is returned instead of null.
>
> One scenario is that sum0 can be used to implement count over
> pre-calculated results (in a pre-calculation system like Apache Kylin).
>
> It is very easy to implement sum0 in Spark. If the community considers this
> necessary, I would like to open a JIRA and implement it.
>
> ---
> Regards!
> Aron Tao