You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Harper Koo (JIRA)" <ji...@apache.org> on 2016/03/29 10:11:25 UTC
[jira] [Created] (HIVE-13375) Cannot get the right count(*) through
'group by'
Harper Koo created HIVE-13375:
---------------------------------
Summary: Cannot get the right count(*) through 'group by'
Key: HIVE-13375
URL: https://issues.apache.org/jira/browse/HIVE-13375
Project: Hive
Issue Type: Bug
Components: Beeline
Affects Versions: 1.1.0
Environment: hive 1.1 under cloudera 5.4.0
Reporter: Harper Koo
I have a query looks like:
select
b.rd
,b.type
,null as pv
,count(*) as uv
from(
select rd
,type
,user
from (select rd
,case when url like '%abc' then 'A'
when url like '%abc1' then 'B'
when url like '%abc2' then 'C'
end as type
,nvl(id,guid) as user
from tableA pq
where rd='2016-03-08'
and url like '%abc%'
)t
group by rd,type,user
order by rd
)b
group by b.rd,b.type;
I want to get the unique 'user' from tableA without using count(distinct()), but the result surprised me!
the result looks like:
2016-03-08 A NULL 245
2016-03-08 B NULL 53
2016-03-08 C NULL 6
2016-03-08 A NULL 543
2016-03-08 B NULL 456
2016-03-08 C NULL 25
2016-03-08 A NULL 544
2016-03-08 B NULL 65
2016-03-08 C NULL 34
……
It is quite long and does not group by rd and type at all.
I run the same query in impala and it works well there and gives me the right answer.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)