Posted to dev@hive.apache.org by "Zhihua Deng (Jira)" <ji...@apache.org> on 2020/12/31 15:44:00 UTC

[jira] [Created] (HIVE-24575) VectorGroupByOperator reusing keys can lead to wrong results

Zhihua Deng created HIVE-24575:
----------------------------------

             Summary: VectorGroupByOperator reusing keys can lead to wrong results
                 Key: HIVE-24575
                 URL: https://issues.apache.org/jira/browse/HIVE-24575
             Project: Hive
          Issue Type: Bug
          Components: Vectorization
            Reporter: Zhihua Deng
            Assignee: Zhihua Deng


A common SQL query such as
{code:java}
select category as category, count(distinct maskdid) as uv from dwd_internal_inc_d group by category{code}
can return wrong results on trunk: the values in the category column get mixed up, and the
count(distinct maskdid) aggregate is also wrong.
After some debugging, we found that the problem is caused by a wrong byteStarts[i] being used when the current keys are copied to the reusable keys:
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362]
byteStarts[i] is always 0 because of Arrays.fill(byteStarts, 0). As a result, when clone.byteValues[i].length >= byteValues[i].length holds, the copy into the reusable keys reads the current key's bytes starting at offset 0 rather than at the key's real start index, which produces the wrong results.
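
The effect of the zeroed start offset can be reproduced in isolation. The following is only a minimal sketch, not the Hive code: the class KeyCopySketch, its two methods, and the sample buffer are hypothetical names invented for illustration. It assumes a key is stored as a (start, length) slice of a shared byte buffer and is copied into a reusable array that is already large enough.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; none of these names exist in Hive.
final class KeyCopySketch {

    // Mimics the reported bug: the start offset is ignored (as if it had
    // been reset to 0), so the bytes are read from the head of the buffer.
    static String copyFromZero(byte[] shared, int start, int length, byte[] reusable) {
        System.arraycopy(shared, 0, reusable, 0, length);
        return new String(reusable, 0, length, StandardCharsets.UTF_8);
    }

    // Reads from the key's real start offset instead.
    static String copyFromStart(byte[] shared, int start, int length, byte[] reusable) {
        System.arraycopy(shared, start, reusable, 0, length);
        return new String(reusable, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two keys packed back to back: "books" at offset 0, "music" at offset 5.
        byte[] shared = "booksmusic".getBytes(StandardCharsets.UTF_8);
        byte[] reusable = new byte[16]; // large enough, so it is reused rather than reallocated

        // The key we want is "music" (start = 5, length = 5).
        System.out.println(copyFromZero(shared, 5, 5, reusable));  // prints "books"
        System.out.println(copyFromStart(shared, 5, 5, reusable)); // prints "music"
    }
}
```

In this sketch the reusable key silently takes on the bytes of a different key, which is consistent with both symptoms described above: group keys appear under the wrong category, and the distinct counts are attributed to the wrong groups.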
 
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)