Posted to dev@hive.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2016/08/05 17:52:20 UTC
[jira] [Created] (HIVE-14442) CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false
Vineet Garg created HIVE-14442:
----------------------------------
Summary: CBO: Calcite Operator To Hive Operator(Calcite Return Path): Wrong result/plan in group by with hive.map.aggr=false
Key: HIVE-14442
URL: https://issues.apache.org/jira/browse/HIVE-14442
Project: Hive
Issue Type: Sub-task
Components: CBO
Reporter: Vineet Garg
Assignee: Vineet Garg
Reproducer
{code} set hive.cbo.returnpath.hiveop=true; {code}
{code} set hive.map.aggr=false; {code}
{code}
create table abcd (a int, b int, c int, d int);
LOAD DATA LOCAL INPATH '../../data/files/in4.txt' INTO TABLE abcd;
{code}
{code} explain select count(distinct a) from abcd group by b; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: b, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
{code} explain select count(distinct a) from abcd group by c; {code}
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: abcd
            Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: int)
              outputColumnNames: a
              Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: a (type: int), a (type: int)
                sort order: ++
                Map-reduce partition columns: a (type: int)
                Statistics: Num rows: 19 Data size: 78 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: int)
          mode: complete
          outputColumnNames: c, $f1
          Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: $f1 (type: bigint)
            outputColumnNames: _o__c0
            Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 9 Data size: 36 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
{code}
Both of the plans above have the wrong keys in the map-side Reduce Output Operator: both use a, a instead of b, a and c, a respectively.
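To illustrate why the key choice matters, here is a rough Python sketch (not Hive code) of a map-side shuffle followed by a reduce-side group by in complete mode. The function name simulate_count_distinct and the sample rows are made up for illustration; they are not the contents of in4.txt. It assumes the reducer groups on the first shuffle key and counts distinct values of the second, as the keys and aggregations lines in the plans suggest.

```python
from collections import defaultdict


def simulate_count_distinct(rows, shuffle_keys):
    """Hypothetical model of Reduce Output Operator + Group By (mode: complete).

    shuffle_keys is a pair of column names: (group-by key, distinct key).
    """
    group_col, distinct_col = shuffle_keys
    # Map side: emit only the two key columns chosen by the plan.
    emitted = [(row[group_col], row[distinct_col]) for row in rows]
    # Reduce side: group on the first key, collect distinct second keys.
    groups = defaultdict(set)
    for group_key, distinct_key in emitted:
        groups[group_key].add(distinct_key)
    return {key: len(values) for key, values in sorted(groups.items())}


# Made-up sample rows, used only to show the effect.
rows = [
    {"a": 1, "b": 10},
    {"a": 2, "b": 10},
    {"a": 2, "b": 20},
    {"a": 3, "b": 20},
    {"a": 3, "b": 20},
]

# Keys one would expect for: select count(distinct a) from abcd group by b
expected = simulate_count_distinct(rows, ("b", "a"))
# Keys seen in the reported plan: a, a
buggy = simulate_count_distinct(rows, ("a", "a"))

print(expected)  # {10: 2, 20: 2}
print(buggy)     # {1: 1, 2: 1, 3: 1}
```

Under these assumptions, keying on a, a yields one group per distinct value of a, each with a count of 1, rather than one group per value of b.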