You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2018/10/12 22:43:00 UTC

[jira] [Updated] (HIVE-20739) CBO: Rewrite INTERSECTS queries as EXISTS semi-joins

     [ https://issues.apache.org/jira/browse/HIVE-20739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-20739:
---------------------------
    Component/s: CBO

> CBO: Rewrite INTERSECTS queries as EXISTS semi-joins
> ----------------------------------------------------
>
>                 Key: HIVE-20739
>                 URL: https://issues.apache.org/jira/browse/HIVE-20739
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>            Reporter: Gopal V
>            Priority: Major
>
> INTERSECTS clause currently materializes both sides before removing entries from each other.
> This means that if the intersects has filters on one side which apply to the other side, it is not transitively pushed to the other side.
> Here's the snippet from the TPC-DS Query8 where that is visible.
> {code}
>     SELECT substr(ca_zip,1,5) ca_zip
>       FROM customer_address
>       WHERE substr(ca_zip,1,5) IN (
>                           '89436', /* ... */ ','15492'))
>      intersect
>      (select ca_zip
>       from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt
>             FROM customer_address, customer
>             WHERE ca_address_sk = c_current_addr_sk and
>                   c_preferred_cust_flag='Y'
>             group by ca_zip
>             having count(*) > 10)A1))A2
> {code}
> https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpcds/query8.sql#L92
> where the entire join output of (customer_address x customer) is produced where the transitive filter inference could be applied to push the filter to the customer_address in the 2nd set in intersection.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)