Posted to dev@hive.apache.org by "Nemon Lou (Jira)" <ji...@apache.org> on 2021/01/04 03:41:00 UTC
[jira] [Created] (HIVE-24579) Incorrect Result For Groupby With Limit

Nemon Lou created HIVE-24579:
--------------------------------

             Summary: Incorrect Result For Groupby With Limit
                 Key: HIVE-24579
                 URL: https://issues.apache.org/jira/browse/HIVE-24579
             Project: Hive
          Issue Type: Bug
    Affects Versions: 3.1.2, 2.3.7, 4.0.0
            Reporter: Nemon Lou




{code:sql}
create table test(id int);
explain extended select id,count(*) from test group by id limit 10;
{code}

The map phase unexpectedly contains a TopN (see "TopN: 10" under the Reduce Output Operator below), which causes incorrect results.


{code:sql}
STAGE PLANS:
 Stage: Stage-1
 Map Reduce
 Map Operator Tree:
 TableScan
 alias: test
 Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
 GatherStats: false
 Select Operator
 expressions: id (type: int)
 outputColumnNames: id
 Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
 Group By Operator
 aggregations: count()
 keys: id (type: int)
 mode: hash
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
 Reduce Output Operator
 key expressions: _col0 (type: int)
 null sort order: a
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 337 Data size: 1350 Basic stats: COMPLETE Column stats: NONE
 tag: -1
 TopN: 10
 TopN Hash Memory Usage: 0.1
 value expressions: _col1 (type: bigint)
 auto parallelism: false
 Path -> Alias:
 file:/user/hive/warehouse/test [test]
 Path -> Partition:
 file:/user/hive/warehouse/test 
 Partition
 base file name: test
 input format: org.apache.hadoop.mapred.TextInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
 properties:
 COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
 bucket_count -1
 column.name.delimiter ,
 columns id
 columns.comments 
 columns.types int
 file.inputformat org.apache.hadoop.mapred.TextInputFormat
 file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
 location file:/user/hive/warehouse/test
 name default.test
 numFiles 0
 numRows 0
 rawDataSize 0
 serialization.ddl struct test { i32 id}
 serialization.format 1
 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 totalSize 0
 transient_lastDdlTime 1609730036
 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 
 input format: org.apache.hadoop.mapred.TextInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
 properties:
 COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
 bucket_count -1
 column.name.delimiter ,
 columns id
 columns.comments 
 columns.types int
 file.inputformat org.apache.hadoop.mapred.TextInputFormat
 file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
 location file:/user/hive/warehouse/test
 name default.test
 numFiles 0
 numRows 0
 rawDataSize 0
 serialization.ddl struct test { i32 id}
 serialization.format 1
 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 totalSize 0
 transient_lastDdlTime 1609730036
 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 name: default.test
 name: default.test
 Truncated Path -> Alias:
 /test [test]
 Needs Tagging: false
 Reduce Operator Tree:
 Group By Operator
 aggregations: count(VALUE._col0)
 keys: KEY._col0 (type: int)
 mode: mergepartial
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 168 Data size: 672 Basic stats: COMPLETE Column stats: NONE
 Limit
 Number of rows: 10
 Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: NONE
 File Output Operator
 compressed: false
 GlobalTableId: 0
 directory: file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002
 NumFilesPerFileSink: 1
 Statistics: Num rows: 10 Data size: 40 Basic stats: COMPLETE Column stats: NONE
 Stats Publishing Key Prefix: file:/tmp/root/bd08973b-b58c-4185-9072-c1891f67878d/hive_2021-01-04_11-14-01_745_4475755683092435506-1/-mr-10001/.hive-staging_hive_2021-01-04_11-14-01_745_4475755683092435506-1/-ext-10002/
 table:
 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
 properties:
 columns _col0,_col1
 columns.types int:bigint
 escape.delim \
 hive.serialization.extend.additional.nesting.levels true
 serialization.escape.crlf true
 serialization.format 1
 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
 TotalFiles: 1
 GatherStats: false
 MultiFileSpray: false

Stage: Stage-0
 Fetch Operator
 limit: 10
 Processor Tree:
 ListSink

Time taken: 1.877 seconds, Fetched: 128 row(s)
{code}
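
To check whether the map-side TopN is what changes the result, one option is to turn off limit pushdown for the session and compare both the plan and the query output. This is only a sketch: it assumes the TopN comes from the limit pushdown controlled by hive.limit.pushdown.memory.usage (the "TopN Hash Memory Usage: 0.1" line above matches that property's usual default) and that a value of 0 disables it on the affected versions. Neither assumption is confirmed in this report.

{code:sql}
-- Assumption: 0 disables the ReduceSink TopN (limit pushdown) for this session.
set hive.limit.pushdown.memory.usage=0;
-- If the assumption holds, the Reduce Output Operator should no longer list "TopN: 10".
explain extended select id,count(*) from test group by id limit 10;
-- Compare the counts returned here with those from the default setting.
select id,count(*) from test group by id limit 10;
{code}

Because the query has no ORDER BY, the two runs may return different ids; the counts for any id that appears in both runs should agree.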

--
This message was sent by Atlassian Jira
(v8.3.4#803005)