Posted to dev@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2017/10/19 18:05:00 UTC

[jira] [Created] (HIVE-17848) Bucket Map Join : Implement an efficient way to minimize loading hash table

Deepak Jaiswal created HIVE-17848:
-------------------------------------

             Summary: Bucket Map Join : Implement an efficient way to minimize loading hash table
                 Key: HIVE-17848
                 URL: https://issues.apache.org/jira/browse/HIVE-17848
             Project: Hive
          Issue Type: Bug
            Reporter: Deepak Jaiswal
            Assignee: Deepak Jaiswal


In a bucket map join, each task loads its own copy of the hash table. This is inefficient: the load is I/O-heavy, and because multiple copies of the same hash table are held in memory, the tables may get garbage-collected on a busy system.
Implement a sub-cache that holds a soft reference to each hash table, keyed by its bucket ID, so that a task can reuse an already loaded table instead of loading it again.
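
A minimal sketch of the sub-cache idea described above, assuming a single JVM shared by the tasks. The class name BucketHashTableCache, the method getOrLoad, and the loader callback are hypothetical names chosen for illustration; they are not existing Hive or Tez APIs, and the actual patch may be structured differently.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.IntFunction;

/** Hypothetical sub-cache: one softly referenced hash table per bucket ID. */
final class BucketHashTableCache<T> {
  private final ConcurrentMap<Integer, SoftReference<T>> tables =
      new ConcurrentHashMap<>();

  /**
   * Returns the table cached for bucketId. The loader runs only when no table
   * is cached for that bucket or the GC has already cleared the soft reference.
   */
  T getOrLoad(int bucketId, IntFunction<T> loader) {
    // The holder carries a strong reference out of the lambda, so the table
    // cannot be collected between the cache lookup and the return.
    Object[] holder = new Object[1];
    tables.compute(bucketId, (id, ref) -> {
      T cached = (ref == null) ? null : ref.get();
      if (cached != null) {
        holder[0] = cached;   // cache hit: keep the existing entry
        return ref;
      }
      T loaded = loader.apply(id);   // cache miss: do the expensive load once
      holder[0] = loaded;
      return new SoftReference<>(loaded);
    });
    @SuppressWarnings("unchecked")
    T result = (T) holder[0];
    return result;
  }
}
```

Because the entries are soft references, the JVM can still reclaim a table under memory pressure; the next task asking for that bucket then reloads it. ConcurrentHashMap.compute runs the load atomically per key, so two tasks requesting the same bucket at the same time trigger one load rather than two, at the cost of the second task waiting for the first.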

This needs changes on the Tez side to push the bucket ID to TezProcessor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)