Posted to issues@drill.apache.org by "Boaz Ben-Zvi (Jira)" <ji...@apache.org> on 2019/11/05 03:50:00 UTC

[jira] [Commented] (DRILL-4667) Improve memory footprint of broadcast joins

    [ https://issues.apache.org/jira/browse/DRILL-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967215#comment-16967215 ] 

Boaz Ben-Zvi commented on DRILL-4667:
-------------------------------------

This needs some kind of *shared memory* that multiple instances of the operator (in multiple minor fragments) can use, along with coordination among those instances: which one builds it, when it is ready, when it is no longer needed, and which one may deallocate it.
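
To make the coordination questions concrete, here is a minimal sketch in plain Java. It is not Drill code: SharedBuildTable, acquire, and release are hypothetical names, the table is an ordinary on-heap Map rather than Drill's off-heap buffers, and the number of minor fragments on the node is assumed to be known up front. The first fragment to arrive builds the table, the others wait until it is ready, and the last one to release it runs the deallocation callback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical per-node holder for one shared build-side table (not a Drill class). */
final class SharedBuildTable<T> {
  private final AtomicBoolean builderClaimed = new AtomicBoolean(false);
  private final CompletableFuture<T> ready = new CompletableFuture<>();
  private final AtomicInteger remainingUsers;
  private final Runnable deallocate;

  /** Assumes the number of minor fragments sharing the table is known in advance. */
  SharedBuildTable(int minorFragmentsOnNode, Runnable deallocate) {
    this.remainingUsers = new AtomicInteger(minorFragmentsOnNode);
    this.deallocate = deallocate;
  }

  /** The first caller runs the build; every other caller blocks until the table is ready. */
  T acquire(Supplier<T> build) {
    if (builderClaimed.compareAndSet(false, true)) {
      try {
        ready.complete(build.get());
      } catch (RuntimeException e) {
        ready.completeExceptionally(e); // waiting fragments see the failure too
      }
    }
    return ready.join();
  }

  /** Each fragment calls this once when done; returns true if this call freed the table. */
  boolean release() {
    if (remainingUsers.decrementAndGet() == 0) {
      deallocate.run();
      return true;
    }
    return false;
  }
}

final class SharedBuildTableDemo {
  public static void main(String[] args) throws InterruptedException {
    int fragments = 4; // assumed number of minor fragments on this node
    SharedBuildTable<Map<Integer, String>> shared =
        new SharedBuildTable<>(fragments, () -> System.out.println("shared table freed"));
    Thread[] threads = new Thread[fragments];
    for (int i = 0; i < fragments; i++) {
      final int id = i;
      threads[i] = new Thread(() -> {
        Map<Integer, String> table = shared.acquire(() -> {
          Map<Integer, String> m = new HashMap<>();
          m.put(1, "built by fragment " + id);
          return m;
        });
        table.get(1); // the probe phase would read the shared table here
        shared.release();
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
  }
}
```

The sketch leaves out the hard parts a real operator would face, such as a fragment that fails or is cancelled before calling release, memory accounting across fragment allocators, and spilling when the build side does not fit in memory.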

> Improve memory footprint of broadcast joins
> -------------------------------------------
>
>                 Key: DRILL-4667
>                 URL: https://issues.apache.org/jira/browse/DRILL-4667
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Boaz Ben-Zvi
>            Priority: Major
>             Fix For: 1.18.0
>
>
> For broadcast joins, Drill currently optimizes the network transfer of the broadcast table by sending a single copy to each receiving node, which then distributes it to all minor fragments running on that node. However, each minor fragment builds its own hash table (for a hash join) from this broadcast table. We can substantially reduce the memory footprint by sharing one copy of the hash table among the minor fragments on a node.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)