Posted to dev@hive.apache.org by "guangdong (JIRA)" <ji...@apache.org> on 2018/08/01 13:36:00 UTC

[jira] [Created] (HIVE-20289) row_number() returns different results across versions

guangdong created HIVE-20289:
--------------------------------

             Summary: row_number() returns different results across versions
                 Key: HIVE-20289
                 URL: https://issues.apache.org/jira/browse/HIVE-20289
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.3.0
         Environment: hive 2.3.3

hadoop 2.7.6
            Reporter: guangdong
             Fix For: 2.3.0


1. Create a table like this:

create table src(
    name string,
    buy_time string,
    consumption int
);

2. Then insert two rows (both with the same buy_time):
insert into src values('zzz','2018-08-01',20),('zzz','2018-08-01',10);

3. When I execute the SQL in Hive 2.3.3, the result is:
hive> select consumption, row_number() over(distribute by name sort by buy_time desc) from src;
Query ID = dwetl_20180801210808_692d5d70-a136-4525-9cdb-b6269e6c3069
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1531984581474_944267, Tracking URL = http://hadoop-jr-nn02.pekdc1.jdfin.local:8088/proxy/application_1531984581474_944267/
Kill Command = /soft/hadoop/bin/hadoop job  -kill job_1531984581474_944267
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2018-08-01 21:09:08,855 Stage-1 map = 0%,  reduce = 0%
2018-08-01 21:09:16,026 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.12 sec
2018-08-01 21:09:22,210 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.09 sec
MapReduce Total cumulative CPU time: 4 seconds 90 msec
Ended Job = job_1531984581474_944267
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 4.09 sec   HDFS Read: 437 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 90 msec
OK
20    1
10    2
Time taken: 80.135 seconds, Fetched: 2 row(s)

4. When I execute the same SQL in Hive 0.14, the result is:
> select consumption, row_number() over(distribute by name sort by buy_time desc) from src;
Query ID = dwetl_20180801212222_7812d9f0-328d-4125-ba99-0f577f4cca9a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1531984581474_944597, Tracking URL = http://hadoop-jr-nn02.pekdc1.jdfin.local:8088/proxy/application_1531984581474_944597/
Kill Command = /soft/hadoop/bin/hadoop job  -kill job_1531984581474_944597
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-08-01 21:22:26,467 Stage-1 map = 0%,  reduce = 0%
2018-08-01 21:22:34,839 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2018-08-01 21:22:40,984 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.28 sec
MapReduce Total cumulative CPU time: 3 seconds 280 msec
Ended Job = job_1531984581474_944597
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.28 sec   HDFS Read: 233 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 280 msec
OK

I expect the same result in both versions. What can I do to get consistent output?
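One likely explanation: both inserted rows share buy_time = '2018-08-01', so "sort by buy_time desc" does not define a total order, and either row may legitimately be numbered 1; different Hive versions (and different numbers of mappers) can then break the tie differently. A minimal sketch of a workaround, adding an explicit tiebreaker column to the sort (consumption is used here purely as an illustration; any column that makes the sort key unique would do):

```sql
-- Break ties explicitly so row_number() is deterministic across runs/versions
select consumption,
       row_number() over(distribute by name
                         sort by buy_time desc, consumption desc)
from src;
```

With the tiebreaker in place, the row with consumption 20 should always be numbered 1, since the sort key is now unique per row.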



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)