You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2020/09/24 15:04:00 UTC
[jira] [Created] (HIVE-24199) Incorrect result when subquey in
exists contains limit
Krisztian Kasa created HIVE-24199:
-------------------------------------
Summary: Incorrect result when subquey in exists contains limit
Key: HIVE-24199
URL: https://issues.apache.org/jira/browse/HIVE-24199
Project: Hive
Issue Type: Bug
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa
{code:java}
create table web_sales (ws_order_number int, ws_warehouse_sk int) stored as orc;
insert into web_sales values
(1, 1),
(1, 2),
(2, 1),
(2, 2);
select * from web_sales ws1
where exists (select 1 from web_sales ws2 where ws1.ws_order_number = ws2.ws_order_number limit 1);
1 1
1 2
{code}
{code:java}
CBO PLAN:
HiveSemiJoin(condition=[=($0, $2)], joinType=[semi])
HiveProject(ws_order_number=[$0], ws_warehouse_sk=[$1])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, web_sales]], table:alias=[ws1])
HiveProject(ws_order_number=[$0])
HiveSortLimit(fetch=[1]) <-- This shouldn't be added
HiveProject(ws_order_number=[$0])
HiveFilter(condition=[IS NOT NULL($0)])
HiveTableScan(table=[[default, web_sales]], table:alias=[ws2])
{code}
Limit n on the right side of the join reduces the result set coming from the right to only n record hence not all the ws_order_number values are included which leads to correctness issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)