Posted to issues@hive.apache.org by "Chaoyu Tang (JIRA)" <ji...@apache.org> on 2016/02/26 04:53:18 UTC
[jira] [Resolved] (HIVE-13164) Predicate pushdown may cause cross-product in left semi join
[ https://issues.apache.org/jira/browse/HIVE-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chaoyu Tang resolved HIVE-13164.
--------------------------------
Resolution: Invalid
> Predicate pushdown may cause cross-product in left semi join
> ------------------------------------------------------------
>
> Key: HIVE-13164
> URL: https://issues.apache.org/jira/browse/HIVE-13164
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
>
> For some left semi join queries such as the following:
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';
> or
> select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';
> Their plans show that they have been converted to a keyless cross-product, because predicate pushdown moved the ON condition into a filter below the join and the join was left with no ON condition.
> {code}
> LOGICAL PLAN:
> t1:t1
> TableScan (TS_0)
> alias: t1
> Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
> Filter Operator (FIL_18)
> predicate: (key = 0) (type: boolean)
> Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
> Select Operator (SEL_2)
> Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator (RS_9)
> sort order:
> Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
> Join Operator (JOIN_11)
> condition map:
> Left Semi Join 0 to 1
> keys:
> 0
> 1
> Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
> Group By Operator (GBY_13)
> aggregations: count(1)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator (RS_14)
> sort order:
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col0 (type: bigint)
> Group By Operator (GBY_15)
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> File Output Operator (FS_17)
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> t2:t2
> TableScan (TS_3)
> alias: t2
> Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
> Filter Operator (FIL_19)
> predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
> Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
> Select Operator (SEL_5)
> Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
> Group By Operator (GBY_8)
> keys: 'val_0' (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator (RS_10)
> sort order:
> Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
> Join Operator (JOIN_11)
> condition map:
> Left Semi Join 0 to 1
> keys:
> 0
> 1
> Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
> {code}
> [~gopalv], do you think these plans are valid or not? Thanks
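One way to reason about whether such a plan preserves the query's meaning is to compare it with the definition of a left semi join on a toy data set. The following is a minimal Python sketch, not Hive code: the rows, the helper names `semi_join_reference` and `semi_join_pushed_down`, and the in-memory model are all hypothetical. It mimics only the first query (ON t2.value = 'val_0') and the shape of the plan above, in which the predicate is filtered on the t2 side, the t2 side is grouped by the constant key, and the join itself has no keys.

```python
# Hypothetical sample rows; they are not taken from the issue.
t1 = [
    {"key": 0, "value": "val_0"},
    {"key": 0, "value": "val_1"},
    {"key": 1, "value": "val_0"},
]
t2 = [
    {"key": 0, "value": "val_0"},
    {"key": 0, "value": "val_0"},
    {"key": 2, "value": "val_2"},
]

# The two subqueries: select value from tN where key = 0.
left = [{"value": r["value"]} for r in t1 if r["key"] == 0]
right = [{"value": r["value"]} for r in t2 if r["key"] == 0]


def semi_join_reference(left_rows, right_rows, on):
    """Left semi join by definition: keep a left row if any right row satisfies ON."""
    return [l for l in left_rows if any(on(l, r) for r in right_rows)]


def semi_join_pushed_down(left_rows, right_rows, const):
    """Toy model of the reported plan shape.

    The ON predicate becomes a filter on the right side, the surviving rows
    are grouped by the constant key (so at most one group remains), and the
    join is then a keyless cross product against that single group.
    """
    groups = {const for r in right_rows if r["value"] == const}
    return [l for l in left_rows for _ in groups]


reference = semi_join_reference(left, right, lambda l, r: r["value"] == "val_0")
pushed = semi_join_pushed_down(left, right, "val_0")

# Both approaches should return the same rows for this data set.
print(len(reference), len(pushed))
```

In this model the cross product cannot multiply left rows, because grouping by a constant leaves at most one row on the right side; when no right row passes the filter, the result is empty, as the semi join definition requires. This says nothing about how Hive actually executes the plan, only that the plan shape is consistent with the query on this sample.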