Posted to dev@calcite.apache.org by Li Yang <li...@apache.org> on 2015/08/20 11:35:19 UTC

Enumerable groupBy() take advantage of input collation?

I encountered an out-of-memory exception when a huge result set was passed
into EnumerableAggregate and aggregated in memory. I'm thinking that if the
input is sorted by the group-by key, then groupBy() doesn't have to hold all
the data in memory any more.

Does Enumerable groupBy() currently take advantage of input collation?
Should I open a JIRA for it?
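
To illustrate the idea, here is a minimal sketch of aggregating over input
that is already sorted by the group-by key. It is not Calcite code: the class
SortedGroupCount and its method countSorted are hypothetical names, and the
aggregate is a plain row count. Only the current key and its running count
are held in memory at any time.

```java
import java.util.Iterator;
import java.util.Objects;
import java.util.function.BiConsumer;

/** Hypothetical helper, not part of Calcite. */
class SortedGroupCount {
  /**
   * Counts rows per key, assuming the keys arrive sorted (equal keys are
   * adjacent). Each finished group is pushed to the sink as soon as the
   * key changes, so no hash table of all groups is needed.
   */
  static <K> void countSorted(Iterator<K> sortedKeys, BiConsumer<K, Long> sink) {
    if (!sortedKeys.hasNext()) {
      return; // empty input produces no groups
    }
    K current = sortedKeys.next();
    long count = 1;
    while (sortedKeys.hasNext()) {
      K next = sortedKeys.next();
      if (Objects.equals(next, current)) {
        count++;
      } else {
        sink.accept(current, count); // key changed: emit the finished group
        current = next;
        count = 1;
      }
    }
    sink.accept(current, count); // emit the last group
  }
}
```

If the input is not actually sorted, this sketch emits the same key more than
once, which is why the operator would need to know the input's collation
before choosing this strategy.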


Cheers
Yang

Re: Enumerable groupBy() take advantage of input collation?

Posted by Julian Hyde <jh...@apache.org>.

Thanks!

> On Aug 21, 2015, at 4:01 PM, Li Yang <li...@apache.org> wrote:
> 
> https://issues.apache.org/jira/browse/CALCITE-853

Re: Enumerable groupBy() take advantage of input collation?

Posted by Li Yang <li...@apache.org>.

https://issues.apache.org/jira/browse/CALCITE-853

On Fri, Aug 21, 2015 at 2:20 PM, Julian Hyde <jh...@apache.org> wrote:

> Yes, that would be useful. Please log a jira.
>
> Enumerable.groupBy doesn't know its input's collation so can't make that
> decision, but EnumerableAggregate does. I think that EnumerableAggregate
> should have a "trigger key", a subset of its group key, and if the trigger
> key changes it will emit and flush its hash table.
>
> As well as for your use case, it will be useful for streaming queries.
>
> Julian

Re: Enumerable groupBy() take advantage of input collation?

Posted by Julian Hyde <jh...@apache.org>.

Yes, that would be useful. Please log a jira. 

Enumerable.groupBy doesn't know its input's collation so can't make that decision, but EnumerableAggregate does. I think that EnumerableAggregate should have a "trigger key", a subset of its group key, and if the trigger key changes it will emit and flush its hash table. 

As well as for your use case, it will be useful for streaming queries. 
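
The "trigger key" idea could be sketched roughly as follows. This is an
illustration only, not the EnumerableAggregate implementation: the class
TriggerKeyAggregate, the Row type, and the choice of SUM as the aggregate are
all assumptions. The group key is (region, product), the trigger key is
region alone, and the input is assumed to be sorted by region but not by
product.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;

/** Hypothetical sketch, not part of Calcite. */
class TriggerKeyAggregate {
  /** Input row: a two-column group key plus a value to sum. */
  static final class Row {
    final String region;
    final String product;
    final long amount;

    Row(String region, String product, long amount) {
      this.region = region;
      this.product = product;
      this.amount = amount;
    }
  }

  /**
   * Sums amount per (region, product). Rows must arrive sorted by region,
   * the trigger key. The hash table only ever holds the groups of the
   * current region; it is emitted and flushed when the region changes.
   */
  static void sumByRegionAndProduct(Iterable<Row> rows,
      BiConsumer<String, Long> sink) {
    Map<String, Long> table = new LinkedHashMap<>();
    String currentTrigger = null;
    boolean first = true;
    for (Row row : rows) {
      if (!first && !Objects.equals(row.region, currentTrigger)) {
        table.forEach(sink); // trigger key changed: emit all finished groups
        table.clear();       // and flush the hash table
      }
      first = false;
      currentTrigger = row.region;
      table.merge(row.region + "/" + row.product, row.amount, Long::sum);
    }
    table.forEach(sink); // emit the groups of the final trigger value
  }
}
```

Peak memory in this sketch is bounded by the number of distinct products
within one region rather than by the number of groups overall. When the
trigger key equals the full group key, the table never holds more than one
entry.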

Julian
