You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Timothy Farkas (JIRA)" <ji...@apache.org> on 2018/05/24 18:03:00 UTC

[jira] [Commented] (DRILL-6444) Hash Join: Avoid partitioning when memory is sufficient

    [ https://issues.apache.org/jira/browse/DRILL-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489495#comment-16489495 ] 

Timothy Farkas commented on DRILL-6444:
---------------------------------------

This would be a good feature. It would be good to see a design doc of how you plan on introducing this optimization and testing it. Specifically I'm concerned that we'll follow the convention of adding more and more code to the same file, and not adding proper abstractions and unit testing. Could you post a google doc with some of your thoughts on how you plan to implement this?

> Hash Join: Avoid partitioning when memory is sufficient 
> --------------------------------------------------------
>
>                 Key: DRILL-6444
>                 URL: https://issues.apache.org/jira/browse/DRILL-6444
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>            Priority: Minor
>
> The Hash Join Spilling feature introduced partitioning (of the incoming build side) which adds some overhead (copying the incoming data, row by row). That happens even when no spilling is needed.
> Suggested optimization: Try reading the incoming build data without partitioning, while checking that enough memory is available. In case the whole build side (plus hash table) fits in memory - then continue like a "single partition". In case not, then need to partition the data read so far and continue as usual (with partitions).
> (See optimization 8.1 in the Hash Join Spill design document: [https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/edit] )
> This is currently implemented only for the case of num_partitions = 1 (i.e, no spilling, and no memory checking).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)