Posted to issues@kylin.apache.org by "Dong Li (JIRA)" <ji...@apache.org> on 2017/12/13 13:12:00 UTC

[jira] [Comment Edited] (KYLIN-2866) Enlarge the reducer number for hyperloglog statistics calculation at step FactDistinctColumnsJob

    [ https://issues.apache.org/jira/browse/KYLIN-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289234#comment-16289234 ] 

Dong Li edited comment on KYLIN-2866 at 12/13/17 1:11 PM:
----------------------------------------------------------

Hi [~yaho], in this patch I found some duplicated code between org.apache.kylin.engine.mr.steps.SaveStatisticsStep#doWork and org.apache.kylin.engine.mr.common.CubeStatsReader, which will make the code more complex. Could you refactor it?

Besides, org.apache.kylin.engine.mr.common.MapReduceUtil#getHLLShardBase seems to calculate the reducer number. What does "HLLShardBase" mean here?


was (Author: lidong_sjtu):
Hi [~yaho], in this patch I found some duplicated code between org.apache.kylin.engine.mr.steps.SaveStatisticsStep#doWork and org.apache.kylin.engine.mr.common.CubeStatsReader, which makes the code complex. Could you refactor it?

Besides, org.apache.kylin.engine.mr.common.MapReduceUtil#getHLLShardBase seems to calculate the reducer number. What does "HLLShardBase" mean here?

> Enlarge the reducer number for hyperloglog statistics calculation at step FactDistinctColumnsJob
> ------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-2866
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2866
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>             Fix For: v2.3.0
>
>         Attachments: APACHE-KYLIN-2866.patch
>
>
> Currently only one reducer is assigned for HLL stats calculation, which may become a bottleneck and slow down this step. Since the stats for different cuboids do not influence each other, it's better to divide the cuboid set into several subsets and assign a reducer to each subset.
> The strategy of this patch is to assign 100 cuboids to each subset. There's also an upper limit on the number of reducers for HLL stats calculation. Currently it's 50.
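
The sizing rule described above (100 cuboids per subset, at most 50 reducers) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the class name, method names, and the hash-based assignment of a cuboid to a reducer are all hypothetical, and only the two constants come from the issue description.

```java
// Hypothetical sketch of the reducer sizing rule from the issue description.
// Not taken from APACHE-KYLIN-2866.patch; names and the assignment scheme are assumptions.
final class HllReducerSketch {
    // Values stated in the issue description.
    static final int CUBOIDS_PER_REDUCER = 100;
    static final int MAX_HLL_REDUCERS = 50;

    // Number of reducers for HLL stats: ceil(cuboidCount / 100), at least 1, at most 50.
    static int reducerCount(int cuboidCount) {
        int needed = (cuboidCount + CUBOIDS_PER_REDUCER - 1) / CUBOIDS_PER_REDUCER;
        return Math.min(MAX_HLL_REDUCERS, Math.max(1, needed));
    }

    // Assumed assignment scheme: spread cuboids over reducers by hashing the cuboid id.
    // The real patch may partition differently.
    static int reducerFor(long cuboidId, int reducers) {
        return Math.floorMod(Long.hashCode(cuboidId), reducers);
    }

    public static void main(String[] args) {
        int reducers = reducerCount(250);
        System.out.println("reducers for 250 cuboids: " + reducers);
        System.out.println("cuboid 255 goes to reducer " + reducerFor(255L, reducers));
    }
}
```

Because the stats for different cuboids are independent, any assignment that sends all records of one cuboid to the same reducer would work here; the hash-based one is just the simplest to show.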


