Posted to dev@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2017/12/01 21:36:00 UTC
[jira] [Created] (HIVE-18200) Bucket Map Join : Use correct algorithm to pick the big table
Deepak Jaiswal created HIVE-18200:
-------------------------------------
Summary: Bucket Map Join : Use correct algorithm to pick the big table
Key: HIVE-18200
URL: https://issues.apache.org/jira/browse/HIVE-18200
Project: Hive
Issue Type: Bug
Reporter: Deepak Jaiswal
Assignee: Deepak Jaiswal
Currently, the algorithm to pick the big table is flawed due to the complexity associated with n-way joins. It could result in an OOM. Consider the following scenario:
CREATE TABLE tab_part (key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
CREATE TABLE tab(key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
Let's say tab has a size of 2 GB, tab_part has a size of 500 MB, and noconditionaltasksize is 200 MB. Then a bucket map join should not happen, as at least one hash table will be 250 MB or more (tab_part's 500 MB split across tab's 2 buckets), which may cause an OOM.
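The sizing argument above can be sketched as a quick check. This is an illustrative sketch only, not Hive's actual planner code; the function name and the simplifying assumption (each big-table task loads roughly small_table_bytes / big_table_buckets of the small table into its hash table) are mine:

```python
MB = 1024 * 1024

def can_bucket_map_join(small_table_bytes, big_table_buckets, threshold_bytes):
    # Rough model: the small table's buckets are spread over the big
    # table's tasks, so each task's in-memory hash table is about
    # small_table_bytes / big_table_buckets. It must fit under the
    # noconditionaltasksize threshold.
    per_task_hash_bytes = small_table_bytes / big_table_buckets
    return per_task_hash_bytes <= threshold_bytes

# Scenario from the report: tab_part (500 MB) as the small table,
# tab as the big table with 2 buckets, threshold 200 MB.
# 500 MB / 2 = 250 MB per hash table > 200 MB -> join should be rejected.
print(can_bucket_map_join(500 * MB, 2, 200 * MB))

# The other direction is no better: tab (2 GB) as the small table
# against tab_part's 4 buckets gives 512 MB per hash table.
print(can_bucket_map_join(2048 * MB, 4, 200 * MB))
```

Under this model, neither choice of big table keeps the hash tables under the threshold, which is why the bucket map join should not be chosen at all in this scenario.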
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)