Posted to issues@kylin.apache.org by "XIE FAN (JIRA)" <ji...@apache.org> on 2016/12/07 15:36:58 UTC
[jira] [Commented] (KYLIN-1832) HyperLogLog speed is too slow in
encode and decode
[ https://issues.apache.org/jira/browse/KYLIN-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15729040#comment-15729040 ]
XIE FAN commented on KYLIN-1832:
--------------------------------
I will try to address this using the method described in Google's paper, “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm”, and add a sparse HyperLogLog counter for low-cardinality columns to reduce memory usage and time complexity.
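To illustrate the sparse idea, here is a minimal sketch, not the actual Kylin patch. The class name SparseHllSketch, the sparseLimit threshold, and the map-based layout are all assumptions made for illustration. The paper's sparse format is more compact and uses a higher precision while sparse. The sketch keeps only non-zero registers in a map and switches to a full register array once too many registers are set, so merging two low-cardinality counters touches only the populated entries instead of all 2^15 registers.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not Kylin's HyperLogLogPlusCounter. */
class SparseHllSketch {
    private final int p;            // precision: m = 2^p registers (p = 15 for hll15)
    private final int m;
    private final int sparseLimit;  // assumed threshold for switching to dense
    private Map<Integer, Byte> sparse = new HashMap<>(); // register index -> rank, non-zero only
    private byte[] dense;           // allocated only after the switch

    SparseHllSketch(int p, int sparseLimit) {
        if (p < 4 || p > 16) throw new IllegalArgumentException("p must be in [4, 16]");
        this.p = p;
        this.m = 1 << p;
        this.sparseLimit = sparseLimit;
    }

    /** Adds an already-hashed 64-bit value. */
    void addHash(long hash) {
        int idx = (int) (hash >>> (64 - p));   // top p bits select the register
        long rest = hash << p;                 // remaining 64 - p bits
        // rank = position of the first 1 bit in the remaining bits, capped at 64 - p + 1
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        update(idx, (byte) rank);
    }

    private void update(int idx, byte rank) {
        if (dense != null) {
            if (rank > dense[idx]) dense[idx] = rank;
            return;
        }
        sparse.merge(idx, rank, (a, b) -> a >= b ? a : b); // keep the larger rank
        if (sparse.size() > sparseLimit) toDense();
    }

    private void toDense() {
        dense = new byte[m];
        for (Map.Entry<Integer, Byte> e : sparse.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        sparse = null;
    }

    /** Register-wise max. A sparse argument costs O(non-zero registers), not O(m). */
    void merge(SparseHllSketch other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        if (other.dense != null) {
            if (dense == null) toDense();
            for (int i = 0; i < m; i++) {
                if (other.dense[i] > dense[i]) dense[i] = other.dense[i];
            }
        } else {
            for (Map.Entry<Integer, Byte> e : other.sparse.entrySet()) {
                update(e.getKey(), e.getValue());
            }
        }
    }

    boolean isSparse() { return dense == null; }

    int register(int idx) {
        return dense != null ? dense[idx] : sparse.getOrDefault(idx, (byte) 0);
    }
}
```

The same idea would apply to encoding and decoding: a sparse counter only needs to write its (index, rank) pairs rather than every register, though the exact wire format is left open here.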
> HyperLogLog speed is too slow in encode and decode
> --------------------------------------------------
>
> Key: KYLIN-1832
> URL: https://issues.apache.org/jira/browse/KYLIN-1832
> Project: Kylin
> Issue Type: Improvement
> Components: Metadata
> Affects Versions: v1.3.0, v1.5.2
> Reporter: fengYu
> Assignee: XIE FAN
> Attachments: HyperLogLogPlusCounter.java
>
>
> We have a cube with more than ten distinct-count measures that use hll15 to store their values, and we found HyperLogLogPlusCounter to be too slow. Three methods are called frequently: merge, writeRegisters, and readRegisters.
> I found that kylin-1.5.x adds a 'singleBucket' parameter that stores only a single bucket, which optimizes the base cuboid.
> However, the other steps of cuboid building are still slow. I have modified the code to speed up these three operations.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)