You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "dima machlin (JIRA)" <ji...@apache.org> on 2014/05/12 16:10:14 UTC
[jira] [Created] (HIVE-7045) Wrong results in multi-table insert
aggregating without group by clause
dima machlin created HIVE-7045:
----------------------------------
Summary: Wrong results in multi-table insert aggregating without group by clause
Key: HIVE-7045
URL: https://issues.apache.org/jira/browse/HIVE-7045
Project: Hive
Issue Type: Bug
Affects Versions: 0.12.0, 0.10.0
Reporter: dima machlin
The scenario :
CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (cnt int) PARTITIONED BY (var_name string);
insert into table t1 select 1,1 from asd limit 1;
insert into table t1 select 2,2 from asd limit 1;
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt ;
select * from t2;
returns :
2 a
2 b
as expected.
Setting the number of reducers higher than 1 :
set mapred.reduce.tasks=2;
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt
insert overwrite table t2 partition(var_name='b') select count(b) cnt;
select * from t2;
1 a
1 a
1 b
1 b
Wrong results.
This happens when ever t1 is big enough to automatically generate more than 1 reducers and without specifying it directly.
adding "group by 1" in the end of each insert solves the problem :
from t1
insert overwrite table t2 partition(var_name='a') select count(a) cnt group by 1
insert overwrite table t2 partition(var_name='b') select count(b) cnt group by 1;
generates :
2 a
2 b
This should work without the group by...
The number of rows for each partition will be the amount of reducers.
Each reducer calculated a sub total of the count.
--
This message was sent by Atlassian JIRA
(v6.2#6252)