Posted to dev@hive.apache.org by "ericni (JIRA)" <ji...@apache.org> on 2014/10/24 08:31:33 UTC
[jira] [Created] (HIVE-8590) With different parameters or column
number dense_rank function gets different count distinct results
ericni created HIVE-8590:
----------------------------
Summary: With different parameters or column number dense_rank function gets different count distinct results
Key: HIVE-8590
URL: https://issues.apache.org/jira/browse/HIVE-8590
Project: Hive
Issue Type: Bug
Components: UDF
Affects Versions: 0.13.1
Environment: cdh 4.6.0/hive0.13
Reporter: ericni
We create a table with SQL that uses the dense_rank function, and then run count distinct on that table.
We found that with different dense_rank parameters, or even a different number of columns, we get different count distinct results:
1. With less data the results are fine (in our test case, 200 million rows give the same results, but 300 million rows give different results).
2. Different dense_rank parameters may give different results, e.g. "dense_rank() over(distribute by a,b sort by c desc)" and "dense_rank() over(distribute by a sort by c desc)".
3. All window functions (rank, row_number, dense_rank) have this problem.
4. With fewer columns the results may be fine.
5. count(1) is fine, but count distinct gets different results.
6. It seems that some rows have been lost and some rows have been duplicated.
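The actual test SQL is only available through the link further down, so the following is just a rough sketch of the pattern described in the points above, not the reporter's real queries. The table names (src_table, t_rank_ab, t_rank_a), the result column rk, and the choice of c as the column passed to count distinct are all assumptions made for illustration; only the two over(...) clauses come from the report.

```sql
-- Hypothetical source table src_table with columns a, b, c (names assumed).

-- Variant 1: window distributed by two columns (clause taken from the report).
CREATE TABLE t_rank_ab AS
SELECT a, b, c,
       dense_rank() OVER (DISTRIBUTE BY a, b SORT BY c DESC) AS rk
FROM src_table;

-- Variant 2: window distributed by one column (clause taken from the report).
CREATE TABLE t_rank_a AS
SELECT a, b, c,
       dense_rank() OVER (DISTRIBUTE BY a SORT BY c DESC) AS rk
FROM src_table;

-- A window function only adds a column; it should neither drop nor repeat
-- source rows. Both statements are therefore expected to return the same
-- pair of values. Per the report, count(1) matches but count distinct differs.
SELECT COUNT(1), COUNT(DISTINCT c) FROM t_rank_ab;
SELECT COUNT(1), COUNT(DISTINCT c) FROM t_rank_a;
```

If the two distinct counts differ while count(1) agrees, that would be consistent with point 6: some source rows lost and others written more than once.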
Test data (the file is too large to upload):
http://pan.baidu.com/s/1hqnCzze
Test SQL:
http://pan.baidu.com/s/1eQna8q2
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)