Posted to dev@hive.apache.org by "Zhihua Deng (Jira)" <ji...@apache.org> on 2020/12/31 15:44:00 UTC
[jira] [Created] (HIVE-24575) VectorGroupByOperator reusing keys can lead to wrong results
Zhihua Deng created HIVE-24575:
----------------------------------
Summary: VectorGroupByOperator reusing keys can lead to wrong results
Key: HIVE-24575
URL: https://issues.apache.org/jira/browse/HIVE-24575
Project: Hive
Issue Type: Bug
Components: Vectorization
Reporter: Zhihua Deng
Assignee: Zhihua Deng
A common SQL query such as
{code:java}
select category as category, count(distinct maskdid) as uv from dwd_internal_inc_d group by category{code}
can return wrong results on trunk: the values of the category column get mixed up, and
the aggregate of distinct maskdid is also wrong.
After some debugging, we found that the problem is caused by a wrong byteStarts[i] being used when copying the current keys to the reusable keys:
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/wrapper/VectorHashKeyWrapperGeneral.java#L351-L362]
byteStarts[i] is always 0 because of Arrays.fill(byteStarts, 0). As a result, when clone.byteValues[i].length >= byteValues[i].length holds, the copy into the reusable keys takes the range starting at offset 0, rather than at the real start index, up to the length of the current key. The reusable key therefore receives the wrong bytes, which causes the problem.
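The effect of copying from offset 0 instead of the real start index can be illustrated with a minimal, self-contained sketch. This is not the Hive code: the class KeyCopySketch, the method copyKey, and the sample buffer are hypothetical names made up for illustration, and the sketch assumes that a key's bytes live at some offset inside a larger shared buffer.

{code:java}
import java.nio.charset.StandardCharsets;

public class KeyCopySketch {

    /**
     * Copies `length` bytes of `src`, beginning at `start`, into `reusable`
     * when it is large enough (otherwise into a fresh array), and returns
     * the copied key as a String. Hypothetical helper, not a Hive API.
     */
    static String copyKey(byte[] src, int start, int length, byte[] reusable) {
        byte[] dst = reusable.length >= length ? reusable : new byte[length];
        System.arraycopy(src, start, dst, 0, length);
        return new String(dst, 0, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Assumed layout: three keys stored back to back in one buffer.
        byte[] shared = "foobarbaz".getBytes(StandardCharsets.US_ASCII);
        byte[] reusable = new byte[16];
        int realStart = 3, length = 3; // the second key, "bar"

        // Start offset reset to 0: the reusable key gets the wrong bytes.
        System.out.println(copyKey(shared, 0, length, reusable));         // foo
        // Real start offset: the reusable key gets the intended bytes.
        System.out.println(copyKey(shared, realStart, length, reusable)); // bar
    }
}
{code}

With the start offset zeroed, the copy silently succeeds but yields a different key, which is consistent with both the confused category values and the wrong distinct counts described above.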
--
This message was sent by Atlassian Jira
(v8.3.4#803005)