Posted to issues@hive.apache.org by "Chaoyu Tang (JIRA)" <ji...@apache.org> on 2016/02/26 04:53:18 UTC

[jira] [Resolved] (HIVE-13164) Predicate pushdown may cause cross-product in left semi join

     [ https://issues.apache.org/jira/browse/HIVE-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoyu Tang resolved HIVE-13164.
--------------------------------
    Resolution: Invalid

> Predicate pushdown may cause cross-product in left semi join
> ------------------------------------------------------------
>
>                 Key: HIVE-13164
>                 URL: https://issues.apache.org/jira/browse/HIVE-13164
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>
> For some left semi join queries such as the following:
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';
> or 
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';
> Their plans show that they have been converted to a keyless cross-product because of predicate pushdown and the dropping of the ON condition.
> {code}
> LOGICAL PLAN:
> t1:t1 
>   TableScan (TS_0)
>     alias: t1
>     Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>     Filter Operator (FIL_18)
>       predicate: (key = 0) (type: boolean)
>       Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>       Select Operator (SEL_2)
>         Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>         Reduce Output Operator (RS_9)
>           sort order: 
>           Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
>           Join Operator (JOIN_11)
>             condition map:
>                  Left Semi Join 0 to 1
>             keys:
>               0 
>               1 
>             Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
>             Group By Operator (GBY_13)
>               aggregations: count(1)
>               mode: hash
>               outputColumnNames: _col0
>               Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator (RS_14)
>                 sort order: 
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                 value expressions: _col0 (type: bigint)
>                 Group By Operator (GBY_15)
>                   aggregations: count(VALUE._col0)
>                   mode: mergepartial
>                   outputColumnNames: _col0
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator (FS_17)
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> t2:t2 
>   TableScan (TS_3)
>     alias: t2
>     Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>     Filter Operator (FIL_19)
>       predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
>       Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>       Select Operator (SEL_5)
>         Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>         Group By Operator (GBY_8)
>           keys: 'val_0' (type: string)
>           mode: hash
>           outputColumnNames: _col0
>           Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>           Reduce Output Operator (RS_10)
>             sort order: 
>             Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
>             Join Operator (JOIN_11)
>               condition map:
>                    Left Semi Join 0 to 1
>               keys:
>                 0 
>                 1 
>               Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
> {code}
> [~gopalv], do you think these plans are valid or not? Thanks 
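
One way to reason about the question is to model left semi join semantics outside Hive. The sketch below is plain Python, not Hive code. The sample rows and the helper left_semi_join are hypothetical and exist only for illustration. It assumes a left semi join emits each left row at most once when at least one right row satisfies the ON condition. Under that assumption, pushing a single-sided predicate into a filter and joining with no keys gives the same count as the query as written, and that count differs from an inner cross-product.

```python
def left_semi_join(left_rows, right_rows, on):
    """Model of a left semi join: keep a left row (once) if any right row matches."""
    return [l for l in left_rows if any(on(l, r) for r in right_rows)]


# Hypothetical (key, value) sample rows; not taken from the issue.
t1 = [(0, "val_0"), (0, "val_0"), (0, "val_x"), (1, "val_1")]
t2 = [(0, "val_0"), (0, "val_0"), (0, "val_y"), (2, "val_2")]

# Subqueries: select value from tN where key = 0
left = [value for key, value in t1 if key == 0]
right = [value for key, value in t2 if key == 0]

# Query 1 as written: ON t2.value = 'val_0'
q1_as_written = left_semi_join(left, right, lambda l, r: r == "val_0")

# Query 1 after pushdown: predicate moved into the t2 filter, join has no keys
right_pushed = [r for r in right if r == "val_0"]
q1_after_pushdown = left_semi_join(left, right_pushed, lambda l, r: True)

# Query 2 as written: ON t1.value = 'val_0'
q2_as_written = left_semi_join(left, right, lambda l, r: l == "val_0")

# Query 2 after pushdown: predicate moved into the t1 filter, join has no keys
left_pushed = [l for l in left if l == "val_0"]
q2_after_pushdown = left_semi_join(left_pushed, right, lambda l, r: True)

# For contrast: what an inner cross-product of the pushed-down inputs would count
q1_inner_cross_count = len(left) * len(right_pushed)

print(len(q1_as_written), len(q1_after_pushdown), q1_inner_cross_count)
print(len(q2_as_written), len(q2_after_pushdown))
```

In this model the keyless join only tests whether the filtered right side is non-empty, so the count(1) result is unchanged by the pushdown. Whether Hive's Join Operator behaves this way for an empty key list is not confirmed here.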



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)