Posted to issues@kylin.apache.org by "PENG Zhengshuai (JIRA)" <ji...@apache.org> on 2019/07/15 12:28:00 UTC
[jira] [Created] (KYLIN-4083) Fact Distinct Column Step, UHC column
may lose value when the hashcode of value is Integer.MIN_VALUE
PENG Zhengshuai created KYLIN-4083:
--------------------------------------
Summary: Fact Distinct Column Step, UHC column may lose value when the hashcode of value is Integer.MIN_VALUE
Key: KYLIN-4083
URL: https://issues.apache.org/jira/browse/KYLIN-4083
Project: Kylin
Issue Type: Bug
Reporter: PENG Zhengshuai
Assignee: PENG Zhengshuai
In the Fact Distinct Column step, Kylin uses MapReduce (MR) to collect the distinct values of each column.
If a column is a UHC (ultra high cardinality) column and the property *kylin.engine.mr.uhc-reducer-count* is set to a value greater than *1*, the Mapper task distributes that column's values across several reducers through *FactDistinctColumnPartitioner*.
The reducer id is calculated from the hash of the value. The implementation is in *FactDistinctColumnsReducerMapping#getReducerIdForCol*, which computes *reducer id = reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount*.
When value.hashCode() is Integer.MIN_VALUE, Math.abs() returns Integer.MIN_VALUE itself, because its positive counterpart does not fit in an int. The remainder can then be negative, so the computed reducer id may be lower than reducerBeginIndex or even negative. This may cause the Fact Distinct Column step to fail, or the UHC column value may be routed to a reducer that does not belong to the UHC column, in which case the value is lost.
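The behaviour described above can be reproduced in isolation. The following is a minimal Java sketch, not Kylin's actual code: the class name ReducerIdSketch and both method names are hypothetical, the hash is passed in directly instead of being taken from a real column value, and Math.floorMod is only one possible way to keep the offset non-negative, not necessarily the fix adopted for this issue.

```java
public class ReducerIdSketch {

    // Hypothetical helper that mirrors the formula quoted in this report.
    static int naiveReducerId(int reducerBeginIndex, int hashCode, int uhcReducerCount) {
        // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so the
        // remainder below can be negative.
        return reducerBeginIndex + Math.abs(hashCode) % uhcReducerCount;
    }

    // Hypothetical alternative (an assumption, not the confirmed fix):
    // Math.floorMod returns a value in [0, uhcReducerCount) for a positive count.
    static int safeReducerId(int reducerBeginIndex, int hashCode, int uhcReducerCount) {
        return reducerBeginIndex + Math.floorMod(hashCode, uhcReducerCount);
    }

    public static void main(String[] args) {
        int reducerBeginIndex = 2; // illustrative: reducers 2, 3 and 4 serve the UHC column
        int uhcReducerCount = 3;
        int hash = Integer.MIN_VALUE;

        System.out.println(Math.abs(hash) == Integer.MIN_VALUE);                       // true
        System.out.println(naiveReducerId(reducerBeginIndex, hash, uhcReducerCount));  // 0
        System.out.println(safeReducerId(reducerBeginIndex, hash, uhcReducerCount));   // 3
    }
}
```

With these illustrative numbers the naive formula yields reducer 0, which lies outside the range 2..4 reserved for the UHC column. With a reducerBeginIndex of 0 the same formula would yield -2, which is not a valid partition at all.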
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)