You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2018/10/12 22:43:00 UTC

[jira] [Created] (HIVE-20739) CBO: Rewrite INTERSECTS queries as EXISTS semi-joins

Gopal V created HIVE-20739:
------------------------------

             Summary: CBO: Rewrite INTERSECTS queries as EXISTS semi-joins
                 Key: HIVE-20739
                 URL: https://issues.apache.org/jira/browse/HIVE-20739
             Project: Hive
          Issue Type: Improvement
            Reporter: Gopal V


INTERSECTS clause currently materializes both sides before removing entries from each other.

This means that if the intersects has filters on one side which apply to the other side, it is not transitively pushed to the other side.

Here's the snippet from the TPC-DS Query8 where that is visible.

{code}
    SELECT substr(ca_zip,1,5) ca_zip
      FROM customer_address
      WHERE substr(ca_zip,1,5) IN (
                          '89436', /* ... */ ','15492'))
     intersect
     (select ca_zip
      from (SELECT substr(ca_zip,1,5) ca_zip,count(*) cnt
            FROM customer_address, customer
            WHERE ca_address_sk = c_current_addr_sk and
                  c_preferred_cust_flag='Y'
            group by ca_zip
            having count(*) > 10)A1))A2
{code}

https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpcds/query8.sql#L92

where the entire join output of (customer_address x customer) is produced where the transitive filter inference could be applied to push the filter to the customer_address in the 2nd set in intersection.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)