You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2020/10/07 17:17:00 UTC
[jira] [Resolved] (HIVE-24199) Incorrect result when subquey in
exists contains limit
[ https://issues.apache.org/jira/browse/HIVE-24199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krisztian Kasa resolved HIVE-24199.
-----------------------------------
Resolution: Fixed
Pushed to master, thanks [~vgarg] for review.
> Incorrect result when subquey in exists contains limit
> ------------------------------------------------------
>
> Key: HIVE-24199
> URL: https://issues.apache.org/jira/browse/HIVE-24199
> Project: Hive
> Issue Type: Bug
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> {code:java}
> create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc;
> insert into web_sales values
> (1, 1),
> (1, 2),
> (2, 1),
> (2, 2);
> select * from web_sales ws1
> where exists (select 1 from web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number limit 1);
> 1 1
> 1 2
> {code}
> {code:java}
> CBO PLAN:
> HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
> HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
> HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
> HiveProject(ws_order_number=[$0])
> HiveSortLimit(fetch=[1]) <-- This shouldn't be added
> HiveProject(ws_order_number=[$0])
> HiveFilter(condition=[IS NOT NULL($0)])
> HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
> {code}
> Limit n on the right side of the join reduces the result set coming from the right to only n record hence not all the ws_order_number values are included which leads to correctness issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)