Posted to user@kylin.apache.org by Prashant Prakash <pr...@gmail.com> on 2016/02/03 12:02:56 UTC

Count distinct support in Kylin

Hi All,

We are using Kylin 1.1.1. For a count(distinct) measure, we built two
cubes on two different Kylin instances, both using the same underlying
Hive table.
In setup 1 we used hllc12 for the measure, and in setup 2 we used hllc16.

The difference from the actual value (as measured against the Hive table)
in the hllc12 case is ~50%.

daily uniques (Hive table): 98963950
daily uniques (hllc16):     99012410
daily uniques (hllc12):     49327300
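
For reference, here is a minimal sketch of the arithmetic behind these
percentages, next to the textbook HyperLogLog standard error of roughly
1.04 / sqrt(m) for m registers. It assumes that hllcN corresponds to a
precision of N bits (m = 2^N registers); that mapping and the helper names
below are illustrative assumptions, not taken from the Kylin code.

```python
import math


def relative_diff_pct(estimate, actual):
    """Signed difference of the estimate from the actual value, in percent."""
    return (estimate - actual) / actual * 100.0


def hll_standard_error_pct(precision):
    """Textbook HyperLogLog standard error, 1.04 / sqrt(m), in percent.

    Assumes m = 2 ** precision registers (hypothetical mapping for hllcN).
    """
    m = 2 ** precision
    return 1.04 / math.sqrt(m) * 100.0


# Numbers reported in this post.
hive_actual = 98963950
estimates = {
    "hllc16": (16, 99012410),
    "hllc12": (12, 49327300),
}

for name, (precision, estimate) in estimates.items():
    diff = relative_diff_pct(estimate, hive_actual)
    expected = hll_standard_error_pct(precision)
    print(f"{name}: diff {diff:+.2f}%, textbook standard error ~{expected:.2f}%")
```

Under these assumptions, the hllc16 difference is well below 1%, while the
hllc12 difference of about -50% is far larger than the roughly 1.6% that
the textbook formula would suggest for 4096 registers.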

We tested the numbers across multiple days. With hllc12, we observed that
in more than 30 percent of cases the difference is about 50%.

Is there a limit on the number of distinct values beyond which the
performance of HyperLogLogPlusCounter degrades? Or are there any known
issues reported with the implementation?

Regards
Prashant