Posted to dev@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2014/05/12 03:11:16 UTC

[jira] [Created] (DRILL-690) Create 2-Phase aggregate plans for SUM, MIN, MAX

Aman Sinha created DRILL-690:
--------------------------------

             Summary: Create 2-Phase aggregate plans for SUM, MIN, MAX
                 Key: DRILL-690
                 URL: https://issues.apache.org/jira/browse/DRILL-690
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Aman Sinha


Currently, Drill generates 1-phase plans for aggregations with GROUP BY: an initial distribution (if necessary), followed by either a sort plus streaming aggregate or a hash aggregate.  In many cases, we should be able to do a 2-phase aggregation instead:
Phase 1: local grouped aggregation, potentially collapsing the input to a small number of groups
Intermediate step: hash distribution (on the grouping keys)
Phase 2: final aggregation

Because phase 1 sends aggregated groups rather than every input row, the amount of data transferred over the network can be much smaller than with the 1-phase approach.
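
To make the difference concrete, here is a minimal Python sketch that simulates both plan shapes over in-memory lists of (key, value) rows, one list per node. It is not Drill code: the function names (local_aggregate, hash_distribute, aggregate) and the sample data are hypothetical, and the "exchange" is just a count of rows handed to the hash distribution.

```python
import operator


def local_aggregate(rows, combine):
    """Aggregate (key, value) rows held by a single node."""
    groups = {}
    for key, value in rows:
        groups[key] = combine(groups[key], value) if key in groups else value
    return list(groups.items())


def hash_distribute(rows, num_nodes):
    """Route each row to a node chosen by hashing its grouping key."""
    partitions = [[] for _ in range(num_nodes)]
    for key, value in rows:
        partitions[hash(key) % num_nodes].append((key, value))
    return partitions


def aggregate(node_inputs, combine, two_phase):
    """Return (result, rows_exchanged) for a 1-phase or 2-phase plan."""
    num_nodes = len(node_inputs)
    if two_phase:
        # Phase 1: each node collapses its own rows before the exchange.
        outgoing = [local_aggregate(rows, combine) for rows in node_inputs]
    else:
        # 1-phase: every input row goes through the exchange.
        outgoing = [list(rows) for rows in node_inputs]
    rows_exchanged = sum(len(rows) for rows in outgoing)

    # Intermediate step: hash distribution on the grouping key.
    received = [[] for _ in range(num_nodes)]
    for rows in outgoing:
        for node, part in enumerate(hash_distribute(rows, num_nodes)):
            received[node].extend(part)

    # Final aggregation: each key now lives on exactly one node.
    result = {}
    for rows in received:
        result.update(local_aggregate(rows, combine))
    return result, rows_exchanged


# Hypothetical sample data: two nodes, three rows each.
node_inputs = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

for name, combine in [("SUM", operator.add), ("MIN", min), ("MAX", max)]:
    one = aggregate(node_inputs, combine, two_phase=False)
    two = aggregate(node_inputs, combine, two_phase=True)
    print(name, "1-phase:", one, "2-phase:", two)
```

On this sample data both plans produce the same groups, but the 1-phase plan passes 6 rows through the exchange and the 2-phase plan passes 4. The savings grow with the number of input rows per group per node.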

For aggregates such as SUM, MIN and MAX, phase 1 and phase 2 apply exactly the same aggregate function.  For other aggregate functions such as COUNT, however, the first phase has to do a COUNT and the second phase must SUM the counts.  This enhancement addresses only SUM, MIN and MAX.
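
The COUNT case can be illustrated with a small Python sketch (again not Drill code; partial_count, final_count_by_sum, final_count_by_count and the sample data are hypothetical). It shows why re-applying COUNT in the second phase gives the wrong answer, while summing the partial counts gives the right one.

```python
def partial_count(rows):
    """Phase 1: count the rows per key on a single node."""
    counts = {}
    for key, _ in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts


def final_count_by_sum(partials):
    """Phase 2 (correct): SUM the partial counts per key."""
    totals = {}
    for counts in partials:
        for key, n in counts.items():
            totals[key] = totals.get(key, 0) + n
    return totals


def final_count_by_count(partials):
    """Phase 2 (wrong): COUNT the partial rows per key.

    This only tells us how many nodes saw each key.
    """
    totals = {}
    for counts in partials:
        for key in counts:
            totals[key] = totals.get(key, 0) + 1
    return totals


# Hypothetical sample data: two nodes, three rows each.
node_inputs = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

partials = [partial_count(rows) for rows in node_inputs]
print("sum of counts:  ", final_count_by_sum(partials))
print("count of counts:", final_count_by_count(partials))
```

Each key appears three times in the sample input. Summing the partial counts returns 3 for both keys, while counting the partial rows returns 2, the number of nodes that saw each key.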



--
This message was sent by Atlassian JIRA
(v6.2#6252)