Posted to dev@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2017/12/01 21:36:00 UTC

[jira] [Created] (HIVE-18200) Bucket Map Join : Use correct algorithm to pick the big table

Deepak Jaiswal created HIVE-18200:
-------------------------------------

             Summary: Bucket Map Join : Use correct algorithm to pick the big table
                 Key: HIVE-18200
                 URL: https://issues.apache.org/jira/browse/HIVE-18200
             Project: Hive
          Issue Type: Bug
            Reporter: Deepak Jaiswal
            Assignee: Deepak Jaiswal


Currently, the algorithm to pick the big table is flawed due to the complexity associated with n-way joins.
It could result in an OOM. Consider the following scenario:

CREATE TABLE tab_part (key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
CREATE TABLE tab(key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;

Let's say tab has a size of 2 GB, tab_part has a size of 500 MB, and noconditionaltasksize is 200 MB. Then a bucket map join should not happen, as at least one hash table will be more than 250 MB, which may cause an OOM.
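The sizing check above can be sketched as follows. This is an illustrative sketch only, not Hive's actual implementation; the class and method names are hypothetical, and the model is simply that each task must hold one bucket of every small table in memory, so the summed per-bucket sizes must stay under the noconditionaltasksize threshold.

```java
// Hypothetical sketch of the big-table selection check for a bucket map join.
// Not Hive's real code; names and structure are illustrative.
public class BucketMapJoinCheck {

    // Returns true if the per-bucket hash tables of all candidate small
    // tables fit under the noconditionaltasksize threshold (bytes).
    static boolean canBucketMapJoin(long[] smallTableSizes,
                                    int[] smallTableBuckets,
                                    long noConditionalTaskSize) {
        long total = 0;
        for (int i = 0; i < smallTableSizes.length; i++) {
            // Each task loads one bucket of each small table into memory.
            total += smallTableSizes[i] / smallTableBuckets[i];
        }
        return total <= noConditionalTaskSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Scenario from the description: tab = 2 GB in 2 buckets,
        // tab_part = 500 MB in 4 buckets, threshold = 200 MB.

        // If tab_part were (wrongly) picked as the big table, each task
        // would need a 1 GB hash table of tab -- far over the threshold.
        System.out.println(canBucketMapJoin(
            new long[]{2048 * mb}, new int[]{2}, 200 * mb)); // false

        // Picking tab as the big table leaves tab_part's 125 MB buckets,
        // which fit under 200 MB.
        System.out.println(canBucketMapJoin(
            new long[]{500 * mb}, new int[]{4}, 200 * mb)); // true
    }
}
```

With a correct pick (tab as the big table) the join is safe; with the flawed pick the hash table blows past the limit, which is the OOM risk the issue describes.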



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)