You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "Harper Koo (JIRA)" <ji...@apache.org> on 2016/03/29 10:11:25 UTC

[jira] [Created] (HIVE-13375) Cannot get the right count(*) through 'group by'

Harper Koo created HIVE-13375:
---------------------------------

             Summary: Cannot get the right count(*) through 'group by'
                 Key: HIVE-13375
                 URL: https://issues.apache.org/jira/browse/HIVE-13375
             Project: Hive
          Issue Type: Bug
          Components: Beeline
    Affects Versions: 1.1.0
         Environment: hive 1.1 under cloudera 5.4.0
            Reporter: Harper Koo


I have a query looks like:
select
    b.rd
    ,b.type
    ,null as pv
    ,count(*) as uv
from(
select rd
       ,type
       ,user
from (select rd
             ,case when url like '%abc' then 'A'
                   when url like '%abc1' then 'B'
                   when url like '%abc2' then 'C'
              end as type
             ,nvl(id,guid) as user
       from tableA pq
       where rd='2016-03-08'
       and url like '%abc%'
      )t
group by rd,type,user 
order by rd
)b
group by b.rd,b.type;

I want to get the unique 'user' from tableA without using count(distinct()), but the result surprised me!
the result looks like:
2016-03-08 A NULL 245
2016-03-08 B NULL 53
2016-03-08 C NULL 6
2016-03-08 A NULL 543
2016-03-08 B NULL 456
2016-03-08 C NULL 25
2016-03-08 A NULL 544
2016-03-08 B NULL 65
2016-03-08 C NULL 34
……
It is quite long and does not group by rd and type at all.

I run the same query in impala and it works well there and gives me the right answer.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)