Posted to dev@phoenix.apache.org by "Maryann Xue (JIRA)" <ji...@apache.org> on 2015/10/22 19:34:27 UTC

[jira] [Created] (PHOENIX-2344) Implement partial stream aggregate

Maryann Xue created PHOENIX-2344:
------------------------------------

             Summary: Implement partial stream aggregate
                 Key: PHOENIX-2344
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2344
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Maryann Xue
            Assignee: Maryann Xue


We now have ordered group-by (stream aggregate) and unordered group-by (hash aggregate) in Phoenix. Stream aggregate is usually preferable to hash aggregate in terms of memory usage and pipelining, but it requires that the aggregate's input be ordered on the group-by expressions, i.e. the group-by expressions must form a prefix of the input's collation (ordering).
However, we could have something in between: a stream/hash hybrid aggregate for the case where the group-by expressions and the input collation share a common prefix. For example, if we group table T1 by columns A, B and T1 is sorted on columns A, C, then the ordered part is A and the hash part is B. Within a run of rows that share the same value of A, a hash table collects all the distinct Bs; at the point where A changes, we can purge the intermediate hash table and feed the results for the previous A to the next operator.
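
The T1 example can be sketched in Java as follows. This is only an illustration of the idea, not Phoenix code: the class name PartialStreamAggregate, the representation of a row as a String[] holding {A, B}, the COUNT(*) aggregate and the "A/B=count" output format are all assumptions made for the sketch. The input is assumed to arrive already sorted on A.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical sketch of a stream/hash hybrid aggregate (not Phoenix's API).
// Computes COUNT(*) grouped by (A, B) over rows that are sorted on A only.
class PartialStreamAggregate {

    // rowsSortedByA: each row is {A, B}; rows with equal A must be adjacent.
    // downstream: stands in for the next operator in the pipeline.
    static void aggregate(Iterable<String[]> rowsSortedByA, Consumer<String> downstream) {
        // Hash part: holds the distinct Bs for the current A only.
        Map<String, Long> countsByB = new LinkedHashMap<>();
        String currentA = null;
        boolean started = false;

        for (String[] row : rowsSortedByA) {
            String a = row[0];
            String b = row[1];
            // Ordered part: a change in A closes every group of the previous A.
            if (started && !Objects.equals(a, currentA)) {
                flush(currentA, countsByB, downstream);
            }
            currentA = a;
            started = true;
            countsByB.merge(b, 1L, Long::sum);
        }
        // Emit the groups of the last A, if there was any input at all.
        if (started) {
            flush(currentA, countsByB, downstream);
        }
    }

    // Feeds the finished groups to the next operator and purges the hash table.
    private static void flush(String a, Map<String, Long> countsByB, Consumer<String> downstream) {
        for (Map.Entry<String, Long> e : countsByB.entrySet()) {
            downstream.accept(a + "/" + e.getKey() + "=" + e.getValue());
        }
        countsByB.clear();
    }
}
```

In this sketch the hash table never holds more than the distinct Bs of a single A, and results for an A are handed downstream as soon as the next A begins, which is where the memory and pipelining benefit over a full hash aggregate would come from.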



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)