You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Harish Butani (JIRA)" <ji...@apache.org> on 2013/11/22 21:55:42 UTC
[jira] [Created] (HIVE-5873) SubQuery: In subquery Count Bug
Harish Butani created HIVE-5873:
-----------------------------------
Summary: SubQuery: In subquery Count Bug
Key: HIVE-5873
URL: https://issues.apache.org/jira/browse/HIVE-5873
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Harish Butani
This is from the Optimization of Nested SQl Queries Revisited paper: http://dl.acm.org/citation.cfm?id=38723
Consider Part table having:
{noformat}
PNum OrderOnHand
-------- ------------------
3 6
10 1
8 0
{noformat}
Supply table having:
{noformat}
PNum Qty
3 4
3 2
10 1
{noformat}
The query:
{noformat}
select pnum
from parts p
where orderOnHand
in (select count(*) from supply s
where s.pnum = p.pnum
)
{noformat}
should return the row with PNum=8.
But a transformation to a semi-join would eliminate this row, as there are no rows in supply table with PNum=8.
AS shown in the paper the soln is to transform to:
{noformat}
select pnum
from parts p semijoin
(select p1.pnum, count(*) as c
from (select distinct pnum from parts) p1 join supply s
where s.pnum = p1.pnum
) sq on p.pnum = sq.pnum and p.orderOnHand = sq.c
{noformat}
The additional distinct query within the SubQuery is to handle duplicates in the outer query on the joining columns.
--
This message was sent by Atlassian JIRA
(v6.1#6144)