Posted to issues@hive.apache.org by "Deepak Jaiswal (JIRA)" <ji...@apache.org> on 2018/08/15 07:01:00 UTC

[jira] [Work started] (HIVE-16976) DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN

     [ https://issues.apache.org/jira/browse/HIVE-16976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-16976 started by Deepak Jaiswal.
---------------------------------------------
> DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN
> ------------------------------------------------------------
>
>                 Key: HIVE-16976
>                 URL: https://issues.apache.org/jira/browse/HIVE-16976
>             Project: Hive
>          Issue Type: Improvement
>          Components: Tez
>    Affects Versions: 2.1.1, 3.0.0
>            Reporter: Gopal V
>            Assignee: Deepak Jaiswal
>            Priority: Major
>
> Tez dynamic partition pruning (DPP) does not kick in when a query uses a comparison clause instead of a JOIN/IN clause, for example a range predicate on the partition column against a scalar subquery:
> {code}
> explain select count(1) from store_sales where ss_sold_date_sk > (select max(d_Date_sk) from date_dim where d_year = 2017);
> Warning: Map Join MAPJOIN[21][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 2 vectorized, llap
>       File Output Operator [FS_36]
>         Group By Operator [GBY_35] (rows=1 width=8)
>           Output:["_col0"],aggregations:["count(VALUE._col0)"]
>         <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>           PARTITION_ONLY_SHUFFLE [RS_34]
>             Group By Operator [GBY_33] (rows=1 width=8)
>               Output:["_col0"],aggregations:["count(1)"]
>               Select Operator [SEL_32] (rows=9600142089 width=16)
>                 Filter Operator [FIL_31] (rows=9600142089 width=16)
>                   predicate:(_col0 > _col1)
>                   Map Join Operator [MAPJOIN_30] (rows=28800426268 width=16)
>                     Conds:(Inner),Output:["_col0","_col1"]
>                   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
>                     BROADCAST [RS_28]
>                       Group By Operator [GBY_27] (rows=1 width=8)
>                         Output:["_col0"],aggregations:["max(VALUE._col0)"]
>                       <-Map 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>                         PARTITION_ONLY_SHUFFLE [RS_26]
>                           Group By Operator [GBY_25] (rows=1 width=8)
>                             Output:["_col0"],aggregations:["max(d_date_sk)"]
>                             Select Operator [SEL_24] (rows=652 width=12)
>                               Output:["d_date_sk"]
>                               Filter Operator [FIL_23] (rows=652 width=12)
>                                 predicate:(d_year = 2017)
>                                 TableScan [TS_2] (rows=73049 width=12)
>                                   tpcds_bin_partitioned_newschema_orc_10000@date_dim,date_dim,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_year"]
>                   <-Select Operator [SEL_29] (rows=28800426268 width=8)
>                       Output:["_col0"]
>                       TableScan [TS_0] (rows=28800426268 width=172)
>                         tpcds_bin_partitioned_newschema_orc_10000@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE
> {code}
> The SyntheticJoinPredicate is only injected for equi-joins, not for < or > comparisons against scalar subqueries.
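The improvement asks for the same kind of transitivity DPP already relies on for equality: derive, from the small side's runtime values, a condition that every surviving big-side partition must satisfy. The following is a minimal Python sketch of that idea under stated assumptions, not Hive's implementation. `synthetic_predicate`, `between_predicate` and `prune` are hypothetical names, partition keys are modeled as plain ints, and `build_values` stands in for whatever the broadcast side (Reducer 4 in the plan) would ship at runtime. The key observation is that `key > b` for some build row `b` implies `key > min(b)`, `key < b` implies `key < max(b)`, and `key BETWEEN lo AND hi` implies `min(lo) <= key <= max(hi)`.

```python
from typing import Callable, Iterable, List

# Hypothetical sketch: range-based dynamic partition pruning.
# None of these names exist in Hive; they only model the idea.

Predicate = Callable[[int], bool]


def synthetic_predicate(op: str, build_values: Iterable[int]) -> Predicate:
    """Return a necessary condition on the probe-side partition key.

    For `probe.key <op> build.val` to hold for at least one build row:
      '>'  -> key >  min(build)     '>=' -> key >= min(build)
      '<'  -> key <  max(build)     '<=' -> key <= max(build)
      '='  -> key in set(build)     (the equi-join case DPP already covers)
    """
    vals = list(build_values)
    if not vals:
        # Empty build side: an inner join yields no rows, so prune everything.
        return lambda key: False
    if op == "=":
        allowed = set(vals)
        return lambda key: key in allowed
    lo, hi = min(vals), max(vals)
    checks = {
        ">": lambda key: key > lo,
        ">=": lambda key: key >= lo,
        "<": lambda key: key < hi,
        "<=": lambda key: key <= hi,
    }
    if op not in checks:
        raise ValueError(f"unsupported operator: {op!r}")
    return checks[op]


def between_predicate(low_values: Iterable[int], high_values: Iterable[int]) -> Predicate:
    """`probe.key BETWEEN build.lo AND build.hi` implies min(lo) <= key <= max(hi)."""
    lows, highs = list(low_values), list(high_values)
    if not lows or not highs:
        return lambda key: False
    lo, hi = min(lows), max(highs)
    return lambda key: lo <= key <= hi


def prune(partitions: List[int], keep: Predicate) -> List[int]:
    """Keep only the partitions whose key can still satisfy the predicate."""
    return [p for p in partitions if keep(p)]


if __name__ == "__main__":
    # Made-up partition keys; a scalar subquery such as max(d_date_sk)
    # contributes exactly one build-side value (here 7).
    parts = list(range(1, 11))
    print(prune(parts, synthetic_predicate(">", [7])))
```

With the made-up keys 1 to 10 and a scalar bound of 7, the `>` case keeps only partitions 8, 9 and 10, which is the pruning the cross-product plan above cannot do. Reducing the build side to a single min/max bound keeps the runtime payload constant-sized, unlike the value set an equi-join ships.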



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)