You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Baike Xia (Jira)" <ji...@apache.org> on 2022/07/13 08:15:00 UTC

[jira] [Created] (IMPALA-11424) Support pushdown non-equi join predicate from OUTER/INNER JOIN to SCANNODE

Baike Xia created IMPALA-11424:
----------------------------------

             Summary: Support pushdown non-equi join predicate from OUTER/INNER JOIN  to SCANNODE
                 Key: IMPALA-11424
                 URL: https://issues.apache.org/jira/browse/IMPALA-11424
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Baike Xia
             Fix For: Impala 4.1.0


In order to reduce the amount of data read and transmitted, the non-equivalent condition of Join can be pushed to SCAN_NODE.
For example:
{code:java}
// code placeholder
select count(1) from ( select ss.ss_ticket_numberFROM store_sales  ss LEFT OUTER JOIN store_returns sr ON (sr.sr_item_sk = ss.ss_item_sk  AND sr.sr_ticket_number >= ss.ss_ticket_number) where ss.ss_sold_date_sk = 2450816) t where t.ss_ticket_number = 79577; {code}
Current plan:
{code:java}
// code placeholder
PLAN-ROOT SINK
|
07:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  row-size=8B cardinality=1
|
06:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE
|  output: count(*)
|  row-size=8B cardinality=1
|
02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
|  hash predicates: sr.sr_item_sk = ss.ss_item_sk
|  other join predicates: sr.sr_ticket_number >= ss.ss_ticket_number
|  runtime filters: RF000 <- ss.ss_item_sk
|  row-size=32B cardinality=16
|
|--05:EXCHANGE [HASH(ss.ss_item_sk)]
|  |
|  00:SCAN HDFS [tpcds_parquet.store_sales ss]
|     partition predicates: ss.ss_sold_date_sk = 2450816
|     partitions=1/1824 files=1 size=70.77KB
|     predicates: ss.ss_ticket_number = 79577
|     row-size=16B cardinality=1
|
04:EXCHANGE [HASH(sr.sr_item_sk)]
|
01:SCAN HDFS [tpcds_parquet.store_returns sr]
   partitions=1/1 files=1 size=15.42MB
   runtime filters: RF000 -> sr.sr_item_sk
   row-size=16B cardinality=287.51K{code}
After Pushdown:
{code:java}
// code placeholder
PLAN-ROOT SINK
|
07:AGGREGATE [FINALIZE]
|  output: count:merge(*)
|  row-size=8B cardinality=1
|
06:EXCHANGE [UNPARTITIONED]
|
03:AGGREGATE
|  output: count(*)
|  row-size=8B cardinality=1
|
02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
|  hash predicates: sr.sr_item_sk = ss.ss_item_sk
|  other join predicates: sr.sr_ticket_number >= ss.ss_ticket_number
|  runtime filters: RF000 <- ss.ss_item_sk
|  row-size=32B cardinality=16
|
|--05:EXCHANGE [HASH(ss.ss_item_sk)]
|  |
|  00:SCAN HDFS [tpcds_parquet.store_sales ss]
|     partition predicates: ss.ss_sold_date_sk = 2450816
|     partitions=1/1824 files=1 size=70.77KB
|     predicates: ss.ss_ticket_number = 79577
|     row-size=16B cardinality=1
|
04:EXCHANGE [HASH(sr.sr_item_sk)]
|
01:SCAN HDFS [tpcds_parquet.store_returns sr]
   partitions=1/1 files=1 size=15.42MB
   predicates: sr.sr_ticket_number >= 79577
   runtime filters: RF000 -> sr.sr_item_sk
   row-size=16B cardinality=28.75K {code}
 
For pushdown of Join non-equi conjuncts, the current qualifications:
1. Only support LEFT_OUTER_JOIN, RIGHT_OUTER_JOIN, INNER_JOIN;
2. Only valid for non-equi predicates containing literalExpr,
for example: slot >= Literal, slot in Literal list;



--
This message was sent by Atlassian Jira
(v8.20.10#820010)