Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/03/15 09:57:00 UTC

[jira] [Work logged] (HIVE-27142) Map Join not working as expected when joining non-native tables with native tables

     [ https://issues.apache.org/jira/browse/HIVE-27142?focusedWorklogId=851100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-851100 ]

ASF GitHub Bot logged work on HIVE-27142:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Mar/23 09:56
            Start Date: 15/Mar/23 09:56
    Worklog Time Spent: 10m 
      Work Description: shameersss1 opened a new pull request, #4120:
URL: https://github.com/apache/hive/pull/4120

   
   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   When hive.auto.convert.join=true and a query joins a large non-native Hive table with a small native Hive table, the map join happens on the wrong side, i.e. in the map tasks that process the small native Hive table. This can lead to OOM when the non-native table is really large and only a few map tasks are spawned to scan the small native Hive table.
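
   A minimal sketch of how the plan for such a join could be inspected. The table names `big_non_native` (a large table backed by a storage handler) and `small_native` (a small native table), and the join column `id`, are hypothetical and are not taken from this patch or its qtest.

   ```sql
   -- Hypothetical table and column names, for illustration only.
   SET hive.auto.convert.join=true;

   -- Lists the table parameters, including statistics such as
   -- numRows and totalSize when they are present.
   DESCRIBE FORMATTED big_non_native;

   -- If the join is converted, the plan contains a Map Join Operator,
   -- and the plan shows which input is loaded as the in-memory side.
   EXPLAIN
   SELECT s.id
   FROM big_non_native b
   JOIN small_native s ON b.id = s.id;
   ```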
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a qtest that validates the scenario and confirms that the fix works.
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 851100)
    Remaining Estimate: 0h
            Time Spent: 10m

>  Map Join not working as expected when joining non-native tables with native tables
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-27142
>                 URL: https://issues.apache.org/jira/browse/HIVE-27142
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: All Versions
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>             Fix For: 4.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *1. Issue :*
> When *_hive.auto.convert.join=true_* and a query joins a large non-native Hive table with a small native Hive table, the map join happens on the wrong side, i.e. in the map tasks that process the small native Hive table. This can lead to OOM when the non-native table is really large and only a few map tasks are spawned to scan the small native Hive table.
>  
> *2. Why is this happening?*
> This happens because statistics for non-native Hive tables are collected/computed improperly. The data of a non-native table is stored in a separate location that Hive does not know about; the only path visible to Hive is a temporary one, set when the non-native table is created, which does not hold the actual data. As a result, the stats collection logic tends to underestimate the data size/row count, which causes the map join to happen on the wrong side.
>  
> *3. Potential Solutions*
>  3.1 Turn off auto conversion by setting *_hive.auto.convert.join=false_*. This can hurt performance if the same query performs multiple joins, i.e. one join involving non-native tables and another join where both tables are native.
>  3.2 Compute stats for the non-native table by running the ANALYZE TABLE <> command before joining native and non-native tables. The user may or may not choose to do this.
>  3.3 Don't collect/estimate stats for non-native Hive tables by default (preferred solution).
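
For reference, options 3.1 and 3.2 above could look roughly like the following in a Hive session. This is only a sketch: `big_non_native` is a hypothetical table name, and whether ANALYZE TABLE yields accurate statistics for a given non-native table depends on its storage handler.

```sql
-- Option 3.1: disable automatic map-join conversion for the session.
-- This affects every join in the query, including native-only joins.
SET hive.auto.convert.join=false;

-- Option 3.2: compute statistics before running the join.
-- `big_non_native` is a hypothetical table name.
ANALYZE TABLE big_non_native COMPUTE STATISTICS;
```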



--
This message was sent by Atlassian Jira
(v8.20.10#820010)