You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Harper Koo (JIRA)" <ji...@apache.org> on 2016/03/29 10:31:25 UTC
[jira] [Updated] (HIVE-13375) Cannot get the right count(*) through
'group by'
[ https://issues.apache.org/jira/browse/HIVE-13375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harper Koo updated HIVE-13375:
------------------------------
Attachment: explain.txt
> Cannot get the right count(*) through 'group by'
> ------------------------------------------------
>
> Key: HIVE-13375
> URL: https://issues.apache.org/jira/browse/HIVE-13375
> Project: Hive
> Issue Type: Bug
> Components: Beeline
> Affects Versions: 1.1.0
> Environment: hive 1.1 under cloudera 5.4.0
> Reporter: Harper Koo
> Attachments: explain.txt
>
>
> I have a query looks like:
> select
> b.rd
> ,b.type
> ,null as pv
> ,count(*) as uv
> from(
> select rd
> ,type
> ,user
> from (select rd
> ,case when url like '%abc' then 'A'
> when url like '%abc1' then 'B'
> when url like '%abc2' then 'C'
> end as type
> ,nvl(id,guid) as user
> from tableA pq
> where rd='2016-03-08'
> and url like '%abc%'
> )t
> group by rd,type,user
> order by rd
> )b
> group by b.rd,b.type;
> I want to get the unique 'user' from tableA without using count(distinct()), but the result surprised me!
> the result looks like:
> 2016-03-08 A NULL 245
> 2016-03-08 B NULL 53
> 2016-03-08 C NULL 6
> 2016-03-08 A NULL 543
> 2016-03-08 B NULL 456
> 2016-03-08 C NULL 25
> 2016-03-08 A NULL 544
> 2016-03-08 B NULL 65
> 2016-03-08 C NULL 34
> ……
> It is quite long and does not group by rd and type at all.
> I run the same query in impala and it works well there and gives me the right answer.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)