You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2018/10/12 22:43:00 UTC
[jira] [Updated] (HIVE-20739) CBO: Rewrite INTERSECTS queries as
EXISTS semi-joins
[ https://issues.apache.org/jira/browse/HIVE-20739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-20739:
---------------------------
Component/s: CBO
> CBO: Rewrite INTERSECTS queries as EXISTS semi-joins
> ----------------------------------------------------
>
> Key: HIVE-20739
> URL: https://issues.apache.org/jira/browse/HIVE-20739
> Project: Hive
> Issue Type: Improvement
> Components: CBO
> Reporter: Gopal V
> Priority: Major
>
> INTERSECTS clause currently materializes both sides before removing entries from each other.
> This means that if the intersects has filters on one side which apply to the other side, it is not transitively pushed to the other side.
> Here's the snippet from the TPC-DS Query8 where that is visible.
> {code}
> SELECT substr(ca_zip,1,5) ca_zip
> FROM customer_address
> WHERE substr(ca_zip,1,5) IN (
> '89436', /* ... */ ','15492'))
> intersect
> (select ca_zip
> from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt
> FROM customer_address, customer
> WHERE ca_address_sk = c_current_addr_sk and
> c_preferred_cust_flag='Y'
> group by ca_zip
> having count(*) > 10)A1))A2
> {code}
> https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpcds/query8.sql#L92
> where the entire join output of (customer_address x customer) is produced where the transitive filter inference could be applied to push the filter to the customer_address in the 2nd set in intersection.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)