You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "mingren (JIRA)" <ji...@apache.org> on 2016/01/04 06:58:39 UTC
[jira] [Commented] (HIVE-12760) union all in hive returns incorrect
results.
[ https://issues.apache.org/jira/browse/HIVE-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080728#comment-15080728 ]
mingren commented on HIVE-12760:
--------------------------------
I have done the work as you've said. I found commit 4a0784a50316beae970d74a062014dec44a0dd0f works fine.
still I can't find out the exact patch! there are too many differences between current code and Hive 1.2.1 release code
> union all in hive returns incorrect results.
> --------------------------------------------
>
> Key: HIVE-12760
> URL: https://issues.apache.org/jira/browse/HIVE-12760
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.2.1
> Environment: Hadoop 2.7.1
> Hive 1.2.1
> Reporter: mingren
>
> The issue can be recreated with following steps
> 1.create table
> {quote}
> CREATE TABLE `union_case`(
> `shopid` bigint,
> `platform` string,
> `source` string,
> `pv` bigint,
> `uv` bigint,
> `pv_all` bigint,
> `uv_all` bigint,
> `pv_rate` float,
> `uv_rate` float
> )
> PARTITIONED BY (
> `visit_date` string)
> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
> LOCATION
> 'hdfs://mgjcluster/apps/hive/warehouse/union_case'
> {quote}
> 2.insert the following data
> {quote}
> insert into table union_case partition(visit_date='2015-12-24') values
> (102448,'wap','t1',2,1,10,10,0.5,0.5),
> (102448,'pc','t1',2,1,10,10,0.5,0.5),
> (102448,'app','t1',2,1,10,10,0.5,0.5),
> (102443,'wap','t2',2,1,10,10,0.5,0.5),
> (102443,'pc','t2',2,1,10,10,0.5,0.5),
> (102443,'app','t2',2,1,10,10,0.5,0.5)
> {quote}
> 3.execute the following query
> {quote}
> select platform,count(1) from
> (select
> shopid,
> 1 as platform,
> source,
> pv,
> uv,
> pv_all,
> uv_all,
> pv_rate,
> uv_rate
> from union_case
> where visit_date = '2015-12-24' and platform = 'pc'
> union all
> select
> a.shopid,
> 2 as platform,
> a.source,
> a.pv as pv,
> a.uv as uv,
> b.pv_all as pv_all,
> b.uv_all as uv_all,
> a.pv/b.pv_all as pv_rate,
> a.uv/b.uv_all as uv_rate
> from (
> select a.shopid,a.source,sum(a.pv) as pv,sum(a.uv) as uv
> from union_case a
> where a.visit_date = '2015-12-24' and platform in ('wap','app')
> group by a.shopid, a.source) a
> left join (
> select
> shopid,
> sum(pv) as pv_all,
> sum(uv) as uv_all
> from union_case
> where visit_date = '2015-12-24' and platform in ('wap','app')
> group by shopid
> ) b
> on a.shopid = b.shopid
> ) x
> group by platform
> limit 5
> {quote}
> 4. output of quering
> {quote}
> NULL 2
> 2 2
> {quote}
> **obviously, result is wrong.**
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)