Posted to issues@drill.apache.org by "Pritesh Maker (JIRA)" <ji...@apache.org> on 2018/05/22 18:20:01 UTC
[jira] [Updated] (DRILL-6400) Hash-Aggr: Avoid recreating common Hash-Table setups for every partition
[ https://issues.apache.org/jira/browse/DRILL-6400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pritesh Maker updated DRILL-6400:
---------------------------------
Fix Version/s: (was: 1.14.0)
> Hash-Aggr: Avoid recreating common Hash-Table setups for every partition
> ------------------------------------------------------------------------
>
> Key: DRILL-6400
> URL: https://issues.apache.org/jira/browse/DRILL-6400
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.13.0
> Reporter: Boaz Ben-Zvi
> Assignee: Boaz Ben-Zvi
> Priority: Minor
>
> The current Hash-Aggr code (and soon the Hash-Join code) creates multiple partitions to hold the incoming data, and each partition has its own HashTable.
> The current code invokes the HashTable method _createAndSetupHashTable()_ for *each* partition. However, most of the setup performed by this method is identical for all partitions (e.g., code generation). Each call has a performance cost: some local tests measured between 3 and 30 milliseconds, depending on the key columns.
> Suggested performance improvement: extract the common setup so that it runs *once*, and have all partitions reuse its results. With the default 32 partitions, this can yield a measurable improvement, and more so when spilling, because this method is invoked again in that case.
>
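The suggested change is essentially hoisting a partition-independent setup step out of a per-partition loop. Rough arithmetic: at 3 to 30 ms per call, 32 per-partition calls would cost about 96 to 960 ms per setup round. Sharing the common work would pay that part only once, assuming most of the measured cost is in the shared work.

Below is a minimal Java sketch of that pattern. All names are hypothetical and are not Drill's actual HashTable API: `CommonSetup`, `PartitionTable`, `createPerPartition`, and `createShared`. The expensive shared work (code generation in Drill) is modeled only by a counter, so the sketch shows how often setup runs, not real timings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashAggrSetupSketch {

    // Hypothetical stand-in for the expensive, partition-independent setup
    // (e.g. code generation for the key columns). Not Drill's real API.
    static final class CommonSetup {
        static int creations = 0;          // counts how often the expensive setup runs
        final List<String> keyColumns;

        CommonSetup(List<String> keyColumns) {
            creations++;
            this.keyColumns = List.copyOf(keyColumns);
        }
    }

    // Hypothetical per-partition hash table: holds only cheap, partition-specific state.
    static final class PartitionTable {
        final int partitionId;
        final CommonSetup setup;           // shared and read-only when created via createShared
        final Map<Object, Integer> groups = new HashMap<>();  // per-partition data

        PartitionTable(int partitionId, CommonSetup setup) {
            this.partitionId = partitionId;
            this.setup = setup;
        }
    }

    // Current behavior: the full setup is repeated for every partition.
    static List<PartitionTable> createPerPartition(int numPartitions, List<String> keys) {
        List<PartitionTable> tables = new ArrayList<>(numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            tables.add(new PartitionTable(i, new CommonSetup(keys)));
        }
        return tables;
    }

    // Suggested behavior: the common setup runs once and all partitions share it.
    static List<PartitionTable> createShared(int numPartitions, List<String> keys) {
        CommonSetup setup = new CommonSetup(keys);
        List<PartitionTable> tables = new ArrayList<>(numPartitions);
        for (int i = 0; i < numPartitions; i++) {
            tables.add(new PartitionTable(i, setup));
        }
        return tables;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k1", "k2");
        final int numPartitions = 32;      // default partition count mentioned in the issue

        CommonSetup.creations = 0;
        createPerPartition(numPartitions, keys);
        System.out.println("per-partition setups: " + CommonSetup.creations);  // prints 32

        CommonSetup.creations = 0;
        createShared(numPartitions, keys);
        System.out.println("shared setups: " + CommonSetup.creations);         // prints 1
    }
}
```

In this sketch the shared object is immutable after construction, so every partition can safely reference it. Under the same assumption, the object could also be kept and reused when partitions are rebuilt after spilling, rather than being recreated.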
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)