You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Krisztian Kasa (Jira)" <ji...@apache.org> on 2021/09/09 06:44:00 UTC

[jira] [Resolved] (HIVE-25498) Query with more than 31 count distinct functions returns wrong result

     [ https://issues.apache.org/jira/browse/HIVE-25498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-25498.
-----------------------------------
    Resolution: Fixed

Pushed to master. Thanks [~robbiezhang] for the fix and [~pgaref] for review.

> Query with more than 31 count distinct functions returns wrong result
> ---------------------------------------------------------------------
>
>                 Key: HIVE-25498
>                 URL: https://issues.apache.org/jira/browse/HIVE-25498
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Robbie Zhang
>            Assignee: Robbie Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> If there are more than 32 "COUNT(DISTINCT COL)" functions in a query, some or even all these COUNT functions in this query return 0 instead of the proper values.
> Here are the queries to reproduce this issue:
> {code:java}
> set hive.cbo.enable=true;
> create table test_count (c0 string, c1 string, c2 string, c3 string, c4 string, c5 string, c6 string, c7 string, c8 string, c9 string, c10 string, c11 string, c12 string, c13 string, c14 string, c15 string, c16 string, c17 string, c18 string, c19 string, c20 string, c21 string, c22 string, c23 string, c24 string, c25 string, c26 string, c27 string, c28 string, c29 string, c30 string, c31 string, c32 string);
> INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28', 'c29', 'c30', 'c31', 'c32'); 
> select count (distinct c0), count(distinct c1), count(distinct c2), count(distinct c3), count(distinct c4), count(distinct c5), count(distinct c6), count(distinct c7), count(distinct c8), count(distinct c9), count(distinct c10), count(distinct c11), count(distinct c12), count(distinct c13), count(distinct c14), count(distinct c15), count(distinct c16), count(distinct c17), count(distinct c18), count(distinct c19), count(distinct c20), count(distinct c21), count(distinct c22), count(distinct c23), count(distinct c24), count(distinct c25), count(distinct c26), count(distinct c27), count(distinct c28), count(distinct c29), count(distinct c30), count(distinct c31), count(distinct c32) from test_count;
> {code}
>  This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue() which uses int type. When there are more than 32 groupings the values overflow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)