Posted to issues@kylin.apache.org by "Yifei Wu (JIRA)" <ji...@apache.org> on 2017/12/02 15:59:00 UTC

[jira] [Commented] (KYLIN-3078) the estimated size of percentile measure is too big

    [ https://issues.apache.org/jira/browse/KYLIN-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275628#comment-16275628 ] 

Yifei Wu commented on KYLIN-3078:
---------------------------------

The key is to clarify how the percentile measure affects the cube size estimate and to find a more accurate way to estimate the size of that measure.
Because the measure is implemented with the T-digest algorithm, we should be able to derive a regular pattern from the analysis in the T-digest paper and from the statistics collected in local tests.



> the estimated size of percentile measure is too big
> ---------------------------------------------------
>
>                 Key: KYLIN-3078
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3078
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Yifei Wu
>            Assignee: Yifei Wu
>            Priority: Critical
>
> To choose a shard number that keeps the size of each shard under control, we need a rough estimate of the cube size before building the cube, obtained by summing the sizes of all dimensions and measures. However, the current size estimate for the percentile measure is inaccurate and results in too many partitions for cube storage. This may in turn hurt SQL query performance.
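
To illustrate the description above, here is a minimal Python sketch of how an inflated per-measure size estimate turns into too many shards. It is not Kylin's actual estimator: the function names, the byte sizes, the row count, and the 1 GiB target shard size are all hypothetical values chosen only to show the arithmetic.

```python
import math


def estimate_row_bytes(dimension_bytes, measure_bytes):
    """Rough per-row size: the sum of all dimension and measure sizes."""
    return sum(dimension_bytes) + sum(measure_bytes)


def estimate_shard_count(row_count, row_bytes, target_shard_bytes):
    """Number of shards needed to keep each shard near the target size."""
    total_bytes = row_count * row_bytes
    return max(1, math.ceil(total_bytes / target_shard_bytes))


# Hypothetical inputs, for illustration only.
ROWS = 10_000_000
DIMENSIONS = [4, 4, 8]          # assumed encoded dimension sizes in bytes
TARGET_SHARD_BYTES = 1024 ** 3  # assumed target of 1 GiB per shard

# Same cube, two different guesses for the percentile measure size:
# 200 bytes (closer to what is actually stored) vs. 20,000 bytes (overestimate).
accurate = estimate_shard_count(
    ROWS, estimate_row_bytes(DIMENSIONS, [8, 200]), TARGET_SHARD_BYTES)
inflated = estimate_shard_count(
    ROWS, estimate_row_bytes(DIMENSIONS, [8, 20_000]), TARGET_SHARD_BYTES)

print(accurate, inflated)  # 3 187
```

With these assumed numbers, overestimating a single measure raises the shard count from 3 to 187, even though the stored data is the same. This is the kind of over-partitioning the issue describes.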


