You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2018/10/12 22:43:00 UTC
[jira] [Created] (HIVE-20739) CBO: Rewrite INTERSECTS queries as
EXISTS semi-joins
Gopal V created HIVE-20739:
------------------------------
Summary: CBO: Rewrite INTERSECTS queries as EXISTS semi-joins
Key: HIVE-20739
URL: https://issues.apache.org/jira/browse/HIVE-20739
Project: Hive
Issue Type: Improvement
Reporter: Gopal V
INTERSECTS clause currently materializes both sides before removing entries from each other.
This means that if the intersects has filters on one side which apply to the other side, it is not transitively pushed to the other side.
Here's the snippet from the TPC-DS Query8 where that is visible.
{code}
SELECT substr(ca_zip,1,5) ca_zip
FROM customer_address
WHERE substr(ca_zip,1,5) IN (
'89436', /* ... */ ','15492'))
intersect
(select ca_zip
from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt
FROM customer_address, customer
WHERE ca_address_sk = c_current_addr_sk and
c_preferred_cust_flag='Y'
group by ca_zip
having count(*) > 10)A1))A2
{code}
https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpcds/query8.sql#L92
where the entire join output of (customer_address x customer) is produced where the transitive filter inference could be applied to push the filter to the customer_address in the 2nd set in intersection.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)