Posted to issues@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2017/12/01 21:36:00 UTC

[jira] [Assigned] (HIVE-18200) Bucket Map Join : Use correct algorithm to pick the big table

     [ https://issues.apache.org/jira/browse/HIVE-18200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deepak Jaiswal reassigned HIVE-18200:
-------------------------------------


> Bucket Map Join : Use correct algorithm to pick the big table
> -------------------------------------------------------------
>
>                 Key: HIVE-18200
>                 URL: https://issues.apache.org/jira/browse/HIVE-18200
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>
> Currently, the algorithm used to pick the big table is flawed due to the complexity associated with n-way joins.
> It can result in an OOM. Consider the following scenario:
> CREATE TABLE tab_part (key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
> CREATE TABLE tab(key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
> Let's say tab has a size of 2 GB, tab_part has a size of 500 MB, and noconditionaltasksize (hive.auto.convert.join.noconditionaltask.size) is 200 MB. In that case a bucket map join should not happen, as at least one hash table will be more than 250 MB, which may cause an OOM.
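The size check described above can be sketched as follows. This is an illustrative model only, not Hive's actual implementation; the function name and the assumption that one big-table bucket maps to small_buckets / big_buckets small-table buckets are mine:

```python
# Illustrative sketch (NOT Hive's actual code): decide whether a bucket map
# join is safe for a 2-way join, per the scenario in the description above.
def bucket_map_join_safe(small_size, small_buckets, big_buckets, limit):
    """Return True if the per-task hash table fits under the size limit.

    Each task processes one bucket of the big table and must load the
    matching small-table buckets into a hash table. Assumption: when the
    small table has more buckets, one big bucket maps to
    small_buckets // big_buckets small buckets.
    """
    buckets_per_task = max(1, small_buckets // big_buckets)
    hash_table_size = (small_size / small_buckets) * buckets_per_task
    return hash_table_size <= limit

MB = 1024 * 1024

# Scenario from the description: tab (2 GB, 2 buckets) as the big table,
# tab_part (500 MB, 4 buckets) as the small/hash side, limit 200 MB.
# Each task loads 2 of the 4 small buckets, i.e. ~250 MB of hash table,
# which exceeds the 200 MB limit.
print(bucket_map_join_safe(500 * MB, 4, 2, 200 * MB))   # False: ~250 MB > 200 MB

# The reverse pick is even worse: tab (2 GB, 2 buckets) on the hash side
# would mean ~1 GB per hash table.
print(bucket_map_join_safe(2048 * MB, 2, 4, 200 * MB))  # False: ~1 GB > 200 MB
```

With this model, neither choice of big table fits under the 200 MB limit, so the conversion to a bucket map join should be rejected entirely, which is the point of the bug report.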



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)