Posted to issues@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2023/03/20 10:38:00 UTC

[jira] [Commented] (HIVE-27142) Map Join not working as expected when joining non-native tables with native tables

    [ https://issues.apache.org/jira/browse/HIVE-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702604#comment-17702604 ] 

Krisztian Kasa commented on HIVE-27142:
---------------------------------------

[~srahman]
> The map join is happening in the wrong side i.e on the map task which process the small native hive table and it can lead to OOM 
It seems that this was one of the reasons why the runtime statistics feature was implemented (HIVE-17626).
Please see the config settings:
https://github.com/apache/hive/blob/7d69a8ce8cebf9a6d255d5aa998584e4e183085c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L5552-L5577
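For orientation, here is a minimal sketch of session-level settings related to query re-execution and runtime statistics. The property names are HiveConf properties from the re-execution feature, but whether they are exactly the ones covered by the linked line range is an assumption, and the values are illustrative only. Please verify both against the HiveConf of your Hive version.

```sql
-- Sketch only: property values are illustrative, not recommendations.
-- Enable re-execution of failed queries.
SET hive.query.reexecution.enabled=true;
-- 'reoptimize' re-plans a failed query using statistics observed at runtime.
SET hive.query.reexecution.strategies=overlay,reoptimize;
-- Scope in which runtime statistics are kept (query, hiveserver, or metastore).
SET hive.query.reexecution.stats.persist.scope=query;
```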


>  Map Join not working as expected when joining non-native tables with native tables
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-27142
>                 URL: https://issues.apache.org/jira/browse/HIVE-27142
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: All Versions
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> *1. Issue:*
> When *_hive.auto.convert.join=true_* and the query joins a large non-native Hive table with a small native Hive table, the map join happens on the wrong side, i.e. in the map tasks that process the small native Hive table. This can lead to OOM when the non-native table is really large and only a few map tasks are spawned to scan the small native Hive table.
>  
> *2. Why is this happening?*
> This happens due to improper stats collection/computation for non-native Hive tables. The data of a non-native table is actually stored in a different location that Hive does not know of; the only path visible to Hive is a temporary one, assigned when the non-native table is created, which does not hold the actual data. The stats collection logic therefore tends to underestimate the data size/row count, which causes the map join to happen on the wrong side.
>  
> *3. Potential Solutions*
>  3.1 Turn off automatic map join conversion by setting *_hive.auto.convert.join=false_*. This can hurt query performance if the same query does multiple joins, e.g. one join involving non-native tables and another join where both tables are native.
>  3.2 Compute stats for the non-native table by running the ANALYZE TABLE <> command before joining native and non-native tables. The user may or may not choose to do it.
>  3.3 Do not collect/estimate stats for non-native Hive tables by default (preferred solution).
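As a rough illustration of workarounds 3.1 and 3.2 above, the statements below sketch what a user might run in a session. The table and column names (`orders_external` for the non-native table, `customers` for the native one, `customer_id`) are hypothetical and do not come from the issue. Whether ANALYZE yields accurate stats for a particular storage handler is an assumption to check.

```sql
-- Workaround 3.1: disable automatic map join conversion for this session.
-- Note this also affects joins between native tables in the same query.
SET hive.auto.convert.join=false;

-- Workaround 3.2: keep auto conversion on, but compute stats for the
-- non-native table first so the planner sees a realistic size.
-- 'orders_external' is a hypothetical non-native table.
SET hive.auto.convert.join=true;
ANALYZE TABLE orders_external COMPUTE STATISTICS;

-- Inspect the plan to see which side of the join is used as the
-- small (hash) table. 'customers' is a hypothetical native table.
EXPLAIN
SELECT c.customer_id, COUNT(*)
FROM orders_external o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
```

Checking the EXPLAIN output before and after running ANALYZE is one way to confirm whether the join side changed.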



--
This message was sent by Atlassian Jira
(v8.20.10#820010)