Posted to issues@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2018/05/24 17:05:00 UTC

[jira] [Created] (DRILL-6444) Hash Join: Avoid partitioning when memory is sufficient

Boaz Ben-Zvi created DRILL-6444:
-----------------------------------

             Summary: Hash Join: Avoid partitioning when memory is sufficient 
                 Key: DRILL-6444
                 URL: https://issues.apache.org/jira/browse/DRILL-6444
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators
            Reporter: Boaz Ben-Zvi
            Assignee: Boaz Ben-Zvi


The Hash Join Spilling feature introduced partitioning of the incoming build side, which adds overhead because the incoming data is copied row by row. This overhead is incurred even when no spilling is needed.

Suggested optimization: Try reading the incoming build data without partitioning, while checking that enough memory is available. If the whole build side (plus its hash table) fits in memory, continue as a "single partition". If it does not fit, partition the data read so far and continue as usual (with partitions).
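
The control flow suggested above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Drill code: the function name build_side, the row_size callback, and the simple additive memory accounting are all assumptions made for illustration. It also ignores the hash table's own memory and does not model spilling itself; it only shows the "read unpartitioned first, fall back to partitioning on memory pressure" idea.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, object]  # (join key, payload); hypothetical row shape


def build_side(
    batches: Iterable[List[Row]],
    memory_limit: int,
    num_partitions: int,
    row_size: Callable[[Row], int] = lambda row: 1,  # assumed size estimator
) -> List[List[Row]]:
    """Read the build side, partitioning only if memory runs short.

    Returns a list of partitions. A one-element list means the whole build
    side fit within memory_limit and no row was redistributed.
    """
    unpartitioned: List[Row] = []      # rows kept exactly as they arrived
    partitions: List[List[Row]] = []   # created only on fallback
    partitioned = False
    used = 0

    for batch in batches:
        for row in batch:
            used += row_size(row)
            if not partitioned and used > memory_limit:
                # Memory is insufficient: partition the data read so far,
                # then continue as usual (with partitions).
                partitions = [[] for _ in range(num_partitions)]
                for old in unpartitioned:
                    partitions[hash(old[0]) % num_partitions].append(old)
                unpartitioned = []
                partitioned = True
            if partitioned:
                partitions[hash(row[0]) % num_partitions].append(row)
            else:
                unpartitioned.append(row)

    return partitions if partitioned else [unpartitioned]
```

In this sketch the per-row copy into a partition happens only after the memory check fails, so a build side that fits in memory is never copied. The one-time cost of redistributing the rows already read is paid only on the fallback path.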

(See optimization 8.1 in the Hash Join Spill design document: [https://docs.google.com/document/d/1-c_oGQY4E5d58qJYv_zc7ka834hSaB3wDQwqKcMoSAI/edit] )

This is currently implemented only for the case of num_partitions = 1 (i.e., no spilling and no memory checking).

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)