Posted to issues@kylin.apache.org by "PENG Zhengshuai (JIRA)" <ji...@apache.org> on 2019/07/15 12:29:00 UTC
[jira] [Updated] (KYLIN-4083) Fact Distinct Column Step, UHC column may lose value when the hashcode of value is Integer.MIN_VALUE
[ https://issues.apache.org/jira/browse/KYLIN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
PENG Zhengshuai updated KYLIN-4083:
-----------------------------------
Description:
In the Fact Distinct Column Step, Kylin uses MapReduce (MR) to de-duplicate the values of each column.
If a column is a UHC (ultra-high-cardinality) column and the property *kylin.engine.mr.uhc-reducer-count* is set to a value greater than *1*, the Mapper task distributes that column's values across several reducers via *FactDistinctColumnPartitioner*.
The reducer id is computed from a hash in *FactDistinctColumnsReducerMapping#getReducerIdForCol*: *reducer id = reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount*
When value.hashCode() is Integer.MIN_VALUE, Math.abs() returns Integer.MIN_VALUE unchanged (its absolute value, 2^31, does not fit in an int), so the remainder can be negative and the reducer id can fall below reducerBeginIndex or even below zero. This may cause the Fact Distinct Column step to fail, or the UHC column value may be routed to a reducer that does not belong to the UHC column.
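The formula above can be sketched in Java as follows. This is a minimal, hypothetical sketch: the class `ReducerIdFormula`, the method `reducerIdForValue`, and the example values (reducerBeginIndex = 3, uhcReducerCount = 5) are illustrative assumptions, not Kylin's actual source.

```java
// Hypothetical sketch of the reducer-id formula quoted in KYLIN-4083.
// Not Kylin's real class or method; names are illustrative only.
final class ReducerIdFormula {

    // reducer id = reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount
    static int reducerIdForValue(int reducerBeginIndex, int uhcReducerCount, Object value) {
        return reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount;
    }

    public static void main(String[] args) {
        int reducerBeginIndex = 3; // assumed: first reducer assigned to this UHC column
        int uhcReducerCount = 5;   // e.g. kylin.engine.mr.uhc-reducer-count=5
        for (String v : new String[] {"a", "kylin", "apache"}) {
            // For any hash code except Integer.MIN_VALUE the id falls in [3, 7].
            int id = reducerIdForValue(reducerBeginIndex, uhcReducerCount, v);
            System.out.println(v + " -> reducer " + id);
        }
    }
}
```

For any hash code other than Integer.MIN_VALUE, Math.abs() is non-negative, so the id stays within reducerBeginIndex .. reducerBeginIndex + uhcReducerCount - 1.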
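The edge case can be reproduced in a few lines of Java. This sketch works on a raw int hash rather than a column value, and the names `MinValueReducerId`, `buggyReducerId`, and `fixedReducerId` are hypothetical. It assumes reducers 0 to 2 serve other columns and reducers 3 to 7 serve this UHC column. The fix shown, Math.floorMod, is one possible remedy, not necessarily the patch committed for this issue.

```java
// Hypothetical sketch of the Integer.MIN_VALUE edge case and one possible fix.
// Assumed layout: reducers 0-2 belong to other columns, 3-7 to this UHC column.
final class MinValueReducerId {

    // Formula as quoted in the issue.
    static int buggyReducerId(int begin, int count, int hash) {
        return begin + Math.abs(hash) % count;
    }

    // One possible fix: Math.floorMod (Java 8+) returns a value in [0, count) for count > 0.
    static int fixedReducerId(int begin, int count, int hash) {
        return begin + Math.floorMod(hash, count);
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE;
        // +2^31 is not representable as an int, so Math.abs returns the input unchanged.
        System.out.println(Math.abs(hash));             // -2147483648
        // -2147483648 % 5 == -3, so the id drops below the column's first reducer.
        System.out.println(buggyReducerId(3, 5, hash)); // 0: a reducer of another column
        System.out.println(buggyReducerId(0, 5, hash)); // -3: not a valid reducer index
        System.out.println(fixedReducerId(3, 5, hash)); // 5: within [3, 7]
    }
}
```

Another common idiom, used by Hadoop's HashPartitioner, is `(hash & Integer.MAX_VALUE) % count`, which clears the sign bit and maps Integer.MIN_VALUE to 0. When uhcReducerCount divides 2^31 (for example 4), the original formula happens to stay in range even for Integer.MIN_VALUE, which is why the reducer id only *may* be negative.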
was:
In the Fact Distinct Column Step, Kylin uses MR to reduce the values of columns.
If a column is a UHC (ultra-high-cardinality) column and the property *kylin.engine.mr.uhc-reducer-count* is set to a value greater than *1*, the Mapper task distributes that column's values across several reducers via *FactDistinctColumnPartitioner*.
The reducer id is computed from a hash in *FactDistinctColumnsReducerMapping#getReducerIdForCol*: *reducer id = reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount*
When value.hashCode() is Integer.MIN_VALUE, the reducer id may be negative. This may cause the Fact Distinct Column step to fail, or the UHC column value may be routed to a reducer that does not belong to the UHC column.
> Fact Distinct Column Step, UHC column may lose value when the hashcode of value is Integer.MIN_VALUE
> ----------------------------------------------------------------------------------------------------
>
> Key: KYLIN-4083
> URL: https://issues.apache.org/jira/browse/KYLIN-4083
> Project: Kylin
> Issue Type: Bug
> Reporter: PENG Zhengshuai
> Assignee: PENG Zhengshuai
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)