Posted to issues@kylin.apache.org by "Billy Liu (JIRA)" <ji...@apache.org> on 2017/04/10 14:59:41 UTC

[jira] [Resolved] (KYLIN-2518) Improve the sampling performance of FactDistinctColumns step

     [ https://issues.apache.org/jira/browse/KYLIN-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billy Liu resolved KYLIN-2518.
------------------------------
       Resolution: Fixed
    Fix Version/s: v2.0.0

https://github.com/apache/kylin/commit/4c21821471cb261cfecdf8289c5f8284af817b3e

> Improve the sampling performance of FactDistinctColumns step
> ------------------------------------------------------------
>
>                 Key: KYLIN-2518
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2518
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: XIE FAN
>            Assignee: XIE FAN
>             Fix For: v2.0.0
>
>
> The method putRowKeyToHLL() in FactDistinctColumnsMapper can be very slow when the sampling rate is high. After careful profiling, we believe that its performance can be improved by modifying its hash method. We have also found an algorithm that can accurately estimate the row count of each cuboid at a lower sampling rate. I will share more test results and details of the algorithm once this issue is done.
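
The issue text does not say how the hash method was changed. As a rough illustration only, one common way to cut per-row hashing cost when many cuboids are sampled is to hash each column value once per row and then derive each cuboid's hash by combining the cached column hashes, rather than re-hashing the concatenated bytes for every cuboid. The class and method names below (CuboidHashSketch, hashColumn, hashRow, combine) are hypothetical and are not taken from Kylin; FNV-1a is used only because it needs no dependencies.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Kylin's actual code: hash each column once per row,
// then combine the cached hashes for every cuboid that includes those columns.
final class CuboidHashSketch {
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // 64-bit FNV-1a over the column index and the value bytes. Mixing in the
    // column index keeps equal values in different columns from hashing alike.
    static long hashColumn(int columnIndex, String value) {
        long h = FNV_OFFSET;
        h = (h ^ columnIndex) * FNV_PRIME;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h = (h ^ (b & 0xff)) * FNV_PRIME;
        }
        return h;
    }

    // One pass over the row: each column value is hashed exactly once.
    static long[] hashRow(String[] row) {
        long[] hashes = new long[row.length];
        for (int i = 0; i < row.length; i++) {
            hashes[i] = hashColumn(i, row[i]);
        }
        return hashes;
    }

    // Per-cuboid hash: a cheap sum of the cached column hashes. The result
    // would then be offered to that cuboid's HyperLogLog counter (not shown).
    static long combine(long[] columnHashes, int[] cuboidColumns) {
        long h = 0L;
        for (int c : cuboidColumns) {
            h += columnHashes[c];
        }
        return h;
    }
}
```

With this layout, the cost per row is one hash per column plus a few additions per cuboid, so the work no longer grows with the byte length of the row key multiplied by the number of cuboids. Whether a simple sum mixes well enough for HyperLogLog estimation is an assumption that would need to be checked against real data.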



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)