
Posted to issues@kylin.apache.org by "XIE FAN (JIRA)" <ji...@apache.org> on 2016/12/07 15:36:58 UTC

[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in encode and decode

    [ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729040#comment-15729040 ] 

XIE FAN commented on KYLIN-1832:
--------------------------------

I will try to address this using the approach described in Google's paper “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm”, and add a sparse HyperLogLog counter for low-cardinality columns to reduce memory usage and the time spent on encode, decode, and merge.
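
To illustrate the sparse idea, here is a minimal sketch that stores only the non-zero registers in a map. It is not the code attached to this issue and not Kylin's API: the class name SparseHllSketch and all of its methods are hypothetical, hashing is left to the caller, and the estimator is plain linear counting, which is only reasonable while most registers are still zero (the paper adds bias correction and a more compact sparse encoding on top of this).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse HLL counter: only non-zero registers are kept. */
final class SparseHllSketch {
    private final int p; // precision; m = 2^p registers (p = 15 for hll15)
    private final Map<Integer, Byte> registers = new HashMap<>(); // index -> rank

    SparseHllSketch(int p) {
        if (p < 4 || p > 16) throw new IllegalArgumentException("p must be in [4, 16]");
        this.p = p;
    }

    /** Adds an element given its 64-bit hash (the hash function is the caller's choice). */
    void addHash(long hash) {
        int index = (int) (hash >>> (64 - p)); // top p bits select the register
        long rest = hash << p;                 // remaining bits, left-aligned
        // Rank = position of the first 1-bit in the remaining bits, capped at 64 - p + 1.
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        update(index, (byte) rank);
    }

    /** Merges another counter; cost is proportional to its non-zero registers, not to m. */
    void merge(SparseHllSketch other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (Map.Entry<Integer, Byte> e : other.registers.entrySet()) {
            update(e.getKey(), e.getValue());
        }
    }

    private void update(int index, byte rank) {
        Byte current = registers.get(index);
        if (current == null || current < rank) registers.put(index, rank);
    }

    int nonZeroCount() {
        return registers.size();
    }

    /** Expands to the dense m-byte register array once the counter stops being sparse. */
    byte[] toDense() {
        byte[] dense = new byte[1 << p];
        for (Map.Entry<Integer, Byte> e : registers.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        return dense;
    }

    /** Linear-counting estimate m * ln(m / zeros); meaningful only for low cardinality. */
    double estimateLinearCounting() {
        double m = 1 << p;
        double zeros = m - registers.size();
        return m * Math.log(m / zeros);
    }
}
```

For a low-cardinality column only a handful of the 32768 hll15 registers are ever set, so both memory and merge time scale with the number of distinct values seen rather than with the register count.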

> HyperLogLog speed is too slow in encode and decode
> --------------------------------------------------
>
>                 Key: KYLIN-1832
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1832
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Metadata
>    Affects Versions: v1.3.0, v1.5.2
>            Reporter: fengYu
>            Assignee: XIE FAN
>         Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct-count measures that use hll15 to store their values. We found HyperLogLogPlusCounter too slow; three of its methods are called frequently: merge, writeRegisters, and readRegisters.
> I found that kylin-1.5.x adds a 'singleBucket' parameter that stores only a single bucket, which optimizes the base cuboid.
> However, it is slow in the other steps of cuboid building. I have modified the code to speed up these three operations.
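
For readers unfamiliar with the three operations named in the description, the sketch below shows one way they can look on a dense byte-per-register array, with a writer that switches to (index, value) pairs when few registers are set. It is an assumption-laden illustration only: the class HllRegisterCodecSketch, its method signatures, and the byte layout are hypothetical and are neither Kylin's serialization format nor the attached HyperLogLogPlusCounter.java.

```java
import java.nio.ByteBuffer;

/** Hypothetical merge/write/read for HLL registers (not Kylin's actual format). */
final class HllRegisterCodecSketch {
    private static final byte DENSE = 0;
    private static final byte SPARSE = 1;

    /** Merge is a per-register maximum. */
    static void merge(byte[] into, byte[] from) {
        if (into.length != from.length) throw new IllegalArgumentException("size mismatch");
        for (int i = 0; i < into.length; i++) {
            if (from[i] > into[i]) into[i] = from[i];
        }
    }

    /**
     * Writes a 1-byte scheme tag, then either all registers (dense) or an int
     * count followed by (2-byte index, 1-byte value) pairs (sparse), whichever
     * is smaller. The 2-byte index assumes at most 65536 registers.
     */
    static void write(byte[] registers, ByteBuffer out) {
        int nonZero = 0;
        for (byte r : registers) {
            if (r != 0) nonZero++;
        }
        if (nonZero * 3 + 4 < registers.length) {
            out.put(SPARSE);
            out.putInt(nonZero);
            for (int i = 0; i < registers.length; i++) {
                if (registers[i] != 0) {
                    out.putShort((short) i);
                    out.put(registers[i]);
                }
            }
        } else {
            out.put(DENSE);
            out.put(registers);
        }
    }

    /** Reads back what write produced; m is the register count (32768 for hll15). */
    static byte[] read(ByteBuffer in, int m) {
        byte[] registers = new byte[m];
        byte scheme = in.get();
        if (scheme == SPARSE) {
            int count = in.getInt();
            for (int i = 0; i < count; i++) {
                int index = in.getShort() & 0xFFFF; // undo sign extension
                registers[index] = in.get();
            }
        } else {
            in.get(registers);
        }
        return registers;
    }
}
```

With this layout a counter holding two non-zero registers serializes to 11 bytes instead of 32769, which is the kind of saving a sparse representation targets for low-cardinality cuboids.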



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)