Posted to issues@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2018/12/21 07:11:00 UTC
[jira] [Updated] (HIVE-16976) DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN
[ https://issues.apache.org/jira/browse/HIVE-16976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deepak Jaiswal updated HIVE-16976:
----------------------------------
Status: Patch Available (was: In Progress)
> DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN
> ------------------------------------------------------------
>
> Key: HIVE-16976
> URL: https://issues.apache.org/jira/browse/HIVE-16976
> Project: Hive
> Issue Type: Improvement
> Components: Tez
> Affects Versions: 3.0.0, 2.1.1
> Reporter: Gopal V
> Assignee: Deepak Jaiswal
> Priority: Major
> Attachments: HIVE-16976.1.patch
>
>
> Tez dynamic partition pruning (DPP) does not kick in when a query filters with a comparison clause (<, > or BETWEEN) against a scalar subquery instead of a JOIN/IN clause.
> {code}
> explain select count(1) from store_sales where ss_sold_date_sk > (select max(d_Date_sk) from date_dim where d_year = 2017);
> Warning: Map Join MAPJOIN[21][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
> Stage-0
> Fetch Operator
> limit:-1
> Stage-1
> Reducer 2 vectorized, llap
> File Output Operator [FS_36]
> Group By Operator [GBY_35] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
> <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_34]
> Group By Operator [GBY_33] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(1)"]
> Select Operator [SEL_32] (rows=9600142089 width=16)
> Filter Operator [FIL_31] (rows=9600142089 width=16)
> predicate:(_col0 > _col1)
> Map Join Operator [MAPJOIN_30] (rows=28800426268 width=16)
> Conds:(Inner),Output:["_col0","_col1"]
> <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_28]
> Group By Operator [GBY_27] (rows=1 width=8)
> Output:["_col0"],aggregations:["max(VALUE._col0)"]
> <-Map 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_26]
> Group By Operator [GBY_25] (rows=1 width=8)
> Output:["_col0"],aggregations:["max(d_date_sk)"]
> Select Operator [SEL_24] (rows=652 width=12)
> Output:["d_date_sk"]
> Filter Operator [FIL_23] (rows=652 width=12)
> predicate:(d_year = 2017)
> TableScan [TS_2] (rows=73049 width=12)
> tpcds_bin_partitioned_newschema_orc_10000@date_dim,date_dim,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_year"]
> <-Select Operator [SEL_29] (rows=28800426268 width=8)
> Output:["_col0"]
> TableScan [TS_0] (rows=28800426268 width=172)
> tpcds_bin_partitioned_newschema_orc_10000@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE
> {code}
> The SyntheticJoinPredicate is injected only for equi-joins, not for scalar subqueries compared with < or >.
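
To make the requested behaviour concrete, here is a rough sketch of what range-based pruning could look like for the query above. It is purely illustrative Python, not Hive's implementation: the function name prune_partitions and all key values are hypothetical, and it assumes the scalar subquery result is already known when the partitions of the big table are enumerated.

```python
from typing import Iterable, List, Optional


def prune_partitions(
    partition_keys: Iterable[int],
    lower: Optional[int] = None,
    upper: Optional[int] = None,
) -> List[int]:
    """Keep only partition keys that can satisfy lower < key < upper.

    Both bounds are exclusive; None means the side is unbounded.
    (Hypothetical helper, used only to illustrate the idea.)
    """
    kept = []
    for key in partition_keys:
        if lower is not None and not key > lower:
            continue  # partition cannot satisfy key > lower
        if upper is not None and not key < upper:
            continue  # partition cannot satisfy key < upper
        kept.append(key)
    return kept


# Hypothetical d_date_sk values selected by "d_year = 2017".
date_dim_2017 = [2457755, 2457756, 2457757]
bound = max(date_dim_2017)  # plays the role of the scalar subquery result

# Hypothetical ss_sold_date_sk partition keys of store_sales.
store_sales_partitions = [2457750, 2457756, 2457757, 2457758, 2457760]

# Only partitions with ss_sold_date_sk > bound need to be scanned.
survivors = prune_partitions(store_sales_partitions, lower=bound)
print(survivors)  # [2457758, 2457760]
```

In the plan above, TS_0 reads all 28800426268 rows of store_sales and the comparison is applied only afterwards in FIL_31. With pruning of this kind, partitions that cannot satisfy the predicate would be skipped before the scan.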