Posted to dev@hive.apache.org by "Liyin Tang (JIRA)" <ji...@apache.org> on 2010/11/02 03:29:26 UTC

[jira] Updated: (HIVE-1754) Remove JDBM component from Map Join

     [ https://issues.apache.org/jira/browse/HIVE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HIVE-1754:
-----------------------------

    Status: Patch Available  (was: Open)

This patch makes the following changes:
1) Removes JDBM from Hive.
2) Stores all the data of the small table in an in-memory hashtable.
3) Creates a lightweight RowContainer: MapJoinRowContainer.
4) Optimizes MapJoinObjectKey. If there are only one or two join keys, it uses MapJoinSingleKey or MapJoinDoubleKeys instead of MapJoinObjectKey.
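
To illustrate points 2) and 4), here is a minimal sketch of an in-memory hashtable for the small table with specialized keys for the one-column and two-column cases. It is not code from the patch: SingleKeySketch, DoubleKeysSketch, MapJoinSketch and makeKey are hypothetical names chosen for this example, and the real MapJoinSingleKey / MapJoinDoubleKeys classes may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical one-column join key: holds the single value directly.
final class SingleKeySketch {
    private final Object k;
    SingleKeySketch(Object k) { this.k = k; }
    @Override public boolean equals(Object o) {
        return o instanceof SingleKeySketch
            && Objects.equals(k, ((SingleKeySketch) o).k);
    }
    @Override public int hashCode() { return Objects.hashCode(k); }
}

// Hypothetical two-column join key: two fields instead of an array.
final class DoubleKeysSketch {
    private final Object k1;
    private final Object k2;
    DoubleKeysSketch(Object k1, Object k2) { this.k1 = k1; this.k2 = k2; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof DoubleKeysSketch)) return false;
        DoubleKeysSketch other = (DoubleKeysSketch) o;
        return Objects.equals(k1, other.k1) && Objects.equals(k2, other.k2);
    }
    @Override public int hashCode() { return Objects.hash(k1, k2); }
}

// Hypothetical in-memory table: join key -> all small-table rows with that key.
final class MapJoinSketch {
    private final Map<Object, List<Object[]>> table = new HashMap<>();

    // Pick the cheapest key representation for the number of join columns.
    static Object makeKey(Object... cols) {
        if (cols.length == 1) return new SingleKeySketch(cols[0]);
        if (cols.length == 2) return new DoubleKeysSketch(cols[0], cols[1]);
        return Arrays.asList(cols.clone()); // general case: element-wise equals/hashCode
    }

    // Called while loading the small table.
    void put(Object key, Object[] row) {
        table.computeIfAbsent(key, x -> new ArrayList<>()).add(row);
    }

    // Called once per row of the big table; empty list means no match.
    List<Object[]> get(Object key) {
        return table.getOrDefault(key, Collections.emptyList());
    }
}
```

In this sketch both PUT and GET are plain HashMap operations with no serialization or disk access, which is the motivation for keeping the whole small table in memory.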

> Remove JDBM component from Map Join
> -----------------------------------
>
>                 Key: HIVE-1754
>                 URL: https://issues.apache.org/jira/browse/HIVE-1754
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.7.0
>
>         Attachments: Hive-1754.patch
>
>
> Right now, JDBM is the major performance bottleneck.
> As the small table grows, the PUT and GET operations take most of the execution time.
> Map Join is designed to load the data of the small table into memory.
> If the data is too large to hold in memory, then there is no need to use the map join strategy.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.