Posted to issues@hive.apache.org by "mengxianwen (Jira)" <ji...@apache.org> on 2023/01/12 11:02:00 UTC

[jira] [Reopened] (HIVE-11576) Data loss in MapJoin

     [ https://issues.apache.org/jira/browse/HIVE-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mengxianwen reopened HIVE-11576:
--------------------------------

> Data loss in MapJoin
> --------------------
>
>                 Key: HIVE-11576
>                 URL: https://issues.apache.org/jira/browse/HIVE-11576
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Ted Xu
>            Assignee: Matt McCline
>            Priority: Major
>
> Consider the following query (TPC-H Query 4):
> {code:title=query4.sql|borderStyle=solid}
> create table q4_result as
> select
>   o_orderpriority,
>   count(*) as order_count
> from
>   orders o
>   join (
>     select
>       distinct l_orderkey
>     from (
>       select
>         *
>       from
>         lineitem
>       where
>         l_commitdate < l_receiptdate
>     ) tab1
>   ) tab2
>   on tab2.l_orderkey = o.o_orderkey
> where
>   o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01'
> group by
>   o_orderpriority
> order by
>   o_orderpriority;
> {code}
> This query loses data when MapJoin is enabled. Both sides of the join produce the expected rows on their own, but some matching rows are not joined together. After disabling automatic join conversion (auto convert join), the problem goes away.
> Context:
> * l_orderkey and o_orderkey are bigint.
> * Vectorized execution is enabled.
> * The execution engine is Tez.
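A minimal sketch of how the report could be reproduced and checked, assuming the standard Hive properties hive.execution.engine, hive.vectorized.execution.enabled and hive.auto.convert.join (the report does not quote the exact settings used) and a hypothetical comparison table named q4_result_nomapjoin. It reruns the same query with automatic MapJoin conversion disabled and lists every priority whose count differs from q4_result; if both runs agree, the final query returns no rows.

{code:sql}
-- Settings matching the reported context (assumed property names: standard Hive config).
set hive.execution.engine=tez;
set hive.vectorized.execution.enabled=true;

-- Run 1: MapJoin conversion enabled; run query4.sql (above) here to create q4_result.
set hive.auto.convert.join=true;

-- Run 2: the reported workaround, automatic MapJoin conversion disabled,
-- so the join is executed as a common (shuffle) join.
-- q4_result_nomapjoin is a hypothetical table name used only for comparison.
set hive.auto.convert.join=false;
create table q4_result_nomapjoin as
select o_orderpriority, count(*) as order_count
from orders o
join (
  select distinct l_orderkey
  from (select * from lineitem where l_commitdate < l_receiptdate) tab1
) tab2
on tab2.l_orderkey = o.o_orderkey
where o.o_orderdate >= '1993-07-01' and o.o_orderdate < '1993-10-01'
group by o_orderpriority;

-- List priorities that are missing from one run or whose counts differ.
select
  coalesce(a.o_orderpriority, b.o_orderpriority) as o_orderpriority,
  a.order_count as with_mapjoin,
  b.order_count as without_mapjoin
from q4_result a
full outer join q4_result_nomapjoin b
  on a.o_orderpriority = b.o_orderpriority
where a.order_count is null
   or b.order_count is null
   or a.order_count <> b.order_count;
{code}

The comparison runs with hive.auto.convert.join still set to false, so the check itself does not go through the MapJoin path under suspicion.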



--
This message was sent by Atlassian Jira
(v8.20.10#820010)