Posted to issues@quickstep.apache.org by "Harshad Deshmukh (JIRA)" <ji...@apache.org> on 2016/09/21 19:21:21 UTC

[jira] [Created] (QUICKSTEP-57) FinalizeAggregation Performance Improvement

Harshad Deshmukh created QUICKSTEP-57:
-----------------------------------------

             Summary: FinalizeAggregation Performance Improvement
                 Key: QUICKSTEP-57
                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-57
             Project: Apache Quickstep
          Issue Type: Improvement
          Components: Relational Operators, Storage
            Reporter: Harshad Deshmukh
            Assignee: Harshad Deshmukh


GROUP BY aggregation currently runs in two steps:
1. Aggregate the tuples from StorageBlocks into separate hash tables (performed by the Aggregation operator). There are as many hash tables as worker threads, and each thread uses only one hash table at a time.
2. Merge the various aggregation hash tables into one (performed by the Finalize Aggregation operator).

Step 2 is needed because the same GROUP BY key can be present in multiple hash tables, and the payloads for that key must be merged.
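
To make the cost of the merge concrete, here is a minimal sketch of the two steps for a SUM aggregate. It is illustrative only: Tuple, AggTable, AggregateBlock and MergePartialTables are hypothetical names, not Quickstep classes or functions, and std::unordered_map stands in for the real aggregation hash table.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not Quickstep types.
using Tuple = std::pair<std::int64_t, std::int64_t>;              // (GROUP BY key, value)
using AggTable = std::unordered_map<std::int64_t, std::int64_t>;  // key -> running SUM

// Step 1: a worker folds one block of tuples into the hash table it holds.
void AggregateBlock(const std::vector<Tuple> &block, AggTable *table) {
  assert(table != nullptr);
  for (const Tuple &tuple : block) {
    (*table)[tuple.first] += tuple.second;
  }
}

// Step 2: fold every partial table into a single result table. A key that
// appears in several partial tables has its partial sums added together.
AggTable MergePartialTables(const std::vector<AggTable> &partials) {
  AggTable merged;
  for (const AggTable &partial : partials) {
    for (const auto &entry : partial) {
      merged[entry.first] += entry.second;
    }
  }
  return merged;
}
```

In this sketch the merge revisits every entry of every partial table, so its cost grows with the total number of (table, key) pairs rather than with the number of distinct keys.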

We can avoid step 2 if the hash tables built in step 1 do not overlap in their GROUP BY keys. One way to achieve this is to partition the tuples being aggregated on their GROUP BY keys, so that each key is routed to exactly one hash table.
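
A minimal sketch of the partitioning idea follows, assuming hash partitioning on the GROUP BY key and a SUM aggregate. The names PartitionOf and AggregatePartitioned are hypothetical, and the sketch is single-threaded: how partitions are assigned to worker threads, and how concurrent access is controlled, is left open here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not Quickstep types.
using Tuple = std::pair<std::int64_t, std::int64_t>;              // (GROUP BY key, value)
using AggTable = std::unordered_map<std::int64_t, std::int64_t>;  // key -> running SUM

// Maps a GROUP BY key to exactly one partition, so equal keys always
// land in the same hash table.
std::size_t PartitionOf(std::int64_t key, std::size_t num_partitions) {
  assert(num_partitions > 0);
  return std::hash<std::int64_t>{}(key) % num_partitions;
}

// Aggregates tuples into one hash table per partition. Because the tables
// are disjoint in their keys, no cross-table merge is needed afterwards.
std::vector<AggTable> AggregatePartitioned(const std::vector<Tuple> &tuples,
                                           std::size_t num_partitions) {
  std::vector<AggTable> tables(num_partitions);
  for (const Tuple &tuple : tuples) {
    tables[PartitionOf(tuple.first, num_partitions)][tuple.first] += tuple.second;
  }
  return tables;
}
```

With disjoint tables, the final result is simply the union of the per-partition tables, and each table can be finalized independently.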




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)