Posted to issues@drill.apache.org by "Pritesh Maker (JIRA)" <ji...@apache.org> on 2018/05/22 18:20:01 UTC

[jira] [Updated] (DRILL-6400) Hash-Aggr: Avoid recreating common Hash-Table setups for every partition

     [ https://issues.apache.org/jira/browse/DRILL-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6400:
---------------------------------
    Fix Version/s:     (was: 1.14.0)

> Hash-Aggr: Avoid recreating common Hash-Table setups for every partition
> ------------------------------------------------------------------------
>
>                 Key: DRILL-6400
>                 URL: https://issues.apache.org/jira/browse/DRILL-6400
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.13.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Minor
>
>  The current Hash-Aggr code (and soon the Hash-Join code) creates multiple partitions to hold the incoming data, each partition with its own HashTable.
>  The current code invokes the HashTable method _createAndSetupHashTable()_ for *each* partition, yet most of the setup done by this method (e.g., code generation) is identical for all partitions. Each call has a performance cost: some local tests measured between 3 and 30 milliseconds per call, depending on the key columns.
>  Suggested performance improvement: extract the common setup so that it runs *once*, and let all the partitions reuse its results. With the default 32 partitions this can give a measurable improvement (and when spilling, the method is invoked again, so the savings repeat).
>  
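A minimal Java sketch of the suggested refactoring, for illustration only. The classes `CommonSetup` and `PartitionTable` and the methods `createPerPartition` / `createShared` are hypothetical stand-ins, not Drill's actual HashTable API; `CommonSetup` represents the partition-independent work done by _createAndSetupHashTable()_ (such as code generation), and a counter records how often that expensive step runs. With the default 32 partitions, the shared variant runs it once instead of 32 times. If the whole 3 - 30 ms per-call cost were shareable (the issue says *most* of it is), that would be an upper bound of roughly 31 x 3 = 93 ms to 31 x 30 = 930 ms saved per setup round.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedSetupSketch {

    // Hypothetical stand-in for the expensive, partition-independent setup
    // (e.g. code generation for the key columns).
    static final class CommonSetup {
        static int creations = 0; // counts how often the expensive step runs
        final String keySignature;

        CommonSetup(String keySignature) {
            creations++;
            this.keySignature = keySignature;
        }
    }

    // Hypothetical per-partition hash table: holds only cheap, partition-specific state
    // plus a reference to the (possibly shared) common setup.
    static final class PartitionTable {
        final CommonSetup setup;
        final Map<Object, Long> groups = new HashMap<>();

        PartitionTable(CommonSetup setup) {
            this.setup = setup;
        }
    }

    // Current approach: the common setup is redone for every partition.
    static List<PartitionTable> createPerPartition(int numPartitions, String keys) {
        List<PartitionTable> tables = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            tables.add(new PartitionTable(new CommonSetup(keys)));
        }
        return tables;
    }

    // Suggested approach: perform the common setup once and share it across partitions.
    static List<PartitionTable> createShared(int numPartitions, String keys) {
        CommonSetup shared = new CommonSetup(keys);
        List<PartitionTable> tables = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            tables.add(new PartitionTable(shared));
        }
        return tables;
    }

    public static void main(String[] args) {
        int partitions = 32; // the default partition count mentioned in the issue

        CommonSetup.creations = 0;
        createPerPartition(partitions, "a,b");
        System.out.println("per-partition setups: " + CommonSetup.creations);

        CommonSetup.creations = 0;
        createShared(partitions, "a,b");
        System.out.println("shared setups: " + CommonSetup.creations);
    }
}
```

Running `main` prints `per-partition setups: 32` followed by `shared setups: 1`. Keeping the shared object around would also let a spill-and-rebuild cycle create fresh per-partition tables without repeating the common setup.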



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)