Posted to dev@hive.apache.org by "Chaoyu Tang (JIRA)" <ji...@apache.org> on 2016/02/26 04:30:18 UTC
[jira] [Created] (HIVE-13164) Predicate pushdown may cause cross-product in left semi join
Chaoyu Tang created HIVE-13164:
----------------------------------
Summary: Predicate pushdown may cause cross-product in left semi join
Key: HIVE-13164
URL: https://issues.apache.org/jira/browse/HIVE-13164
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Chaoyu Tang
Assignee: Chaoyu Tang
For some left semi join queries, such as the following:
select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';
or
select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';
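Their plans show that they have been converted to a keyless cross-product join: predicate pushdown moves the constant predicate into the filter on one side of the join, and the ON condition is then dropped, leaving the Left Semi Join operator with empty keys.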
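To make the question concrete, here is a minimal Python sketch of the semantics only. It is not Hive's implementation: the helper names (left_semi_join, keyless_semi_join_after_pushdown) and the sample rows are hypothetical, and the model ignores execution cost entirely. It compares the reference left semi join with a plan in which the ON predicate is pushed into a per-side filter and the join itself has no keys.

```python
def left_semi_join(left, right, on):
    """Reference semantics: keep each left row that has at least one matching right row."""
    return [l for l in left if any(on(l, r) for r in right)]


def keyless_semi_join_after_pushdown(left, right, left_pred, right_pred):
    """Model of the pushed-down plan: filter each side, then join with no keys."""
    kept_left = [l for l in left if left_pred(l)]
    kept_right = [r for r in right if right_pred(r)]
    # With no join keys, every left row matches every right row, so a left row
    # survives exactly when the filtered right side is non-empty.
    return [l for l in kept_left if kept_right]


# Hypothetical rows, standing in for the subquery outputs (key = 0 already applied).
t1 = [{"value": "val_0"}, {"value": "val_0"}, {"value": "val_1"}]
t2 = [{"value": "val_0"}, {"value": "val_2"}]


def always(row):
    return True


def is_val_0(row):
    return row["value"] == "val_0"


# First query: ON t2.value = 'val_0' (predicate touches only the right side).
q1_reference = left_semi_join(t1, t2, lambda l, r: is_val_0(r))
q1_pushed = keyless_semi_join_after_pushdown(t1, t2, always, is_val_0)

# Second query: ON t1.value = 'val_0' (predicate touches only the left side).
q2_reference = left_semi_join(t1, t2, lambda l, r: is_val_0(l))
q2_pushed = keyless_semi_join_after_pushdown(t1, t2, is_val_0, always)

print(len(q1_reference), len(q1_pushed))
print(len(q2_reference), len(q2_pushed))
```

In this simplified model the two forms return the same rows for both queries, so the open question is less about the result set and more about whether a keyless join, which pairs every surviving left row with every surviving right row, is an acceptable plan.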
{code}
LOGICAL PLAN:
t1:t1
  TableScan (TS_0)
    alias: t1
    Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
    Filter Operator (FIL_18)
      predicate: (key = 0) (type: boolean)
      Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
      Select Operator (SEL_2)
        Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
        Reduce Output Operator (RS_9)
          sort order:
          Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
          Join Operator (JOIN_11)
            condition map:
              Left Semi Join 0 to 1
            keys:
              0
              1
            Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
            Group By Operator (GBY_13)
              aggregations: count(1)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator (RS_14)
                sort order:
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: bigint)
                Group By Operator (GBY_15)
                  aggregations: count(VALUE._col0)
                  mode: mergepartial
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator (FS_17)
                    compressed: false
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
t2:t2
  TableScan (TS_3)
    alias: t2
    Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
    Filter Operator (FIL_19)
      predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
      Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
      Select Operator (SEL_5)
        Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
        Group By Operator (GBY_8)
          keys: 'val_0' (type: string)
          mode: hash
          outputColumnNames: _col0
          Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
          Reduce Output Operator (RS_10)
            sort order:
            Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
            Join Operator (JOIN_11)
              condition map:
                Left Semi Join 0 to 1
              keys:
                0
                1
              Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
{code}
[~gopalv], do you think these plans are valid? Thanks.