Posted to dev@kylin.apache.org by "hongbin ma (JIRA)" <ji...@apache.org> on 2016/04/26 13:20:12 UTC
[jira] [Created] (KYLIN-1624) HyperLogLogPlusCounter will become inaccurate when there're billions of entries
hongbin ma created KYLIN-1624:
---------------------------------
Summary: HyperLogLogPlusCounter will become inaccurate when there're billions of entries
Key: KYLIN-1624
URL: https://issues.apache.org/jira/browse/KYLIN-1624
Project: Kylin
Issue Type: Improvement
Reporter: hongbin ma
Assignee: liyang
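// Fragment from a test method; needs java.util.*, java.util.concurrent.*
// and Kylin's HyperLogLogPlusCounter on the classpath.
// Synchronized list, because 20 worker threads append to it concurrently.
final List<HyperLogLogPlusCounter> counters =
        Collections.synchronizedList(new ArrayList<HyperLogLogPlusCounter>());
ExecutorService service = Executors.newFixedThreadPool(20);
final CountDownLatch latch = new CountDownLatch(20);
for (int i = 0; i < 20; i++) {
    service.submit(new Runnable() {
        @Override
        public void run() {
            Random rand = new Random();
            HyperLogLogPlusCounter counter = new HyperLogLogPlusCounter(14);
            // Each thread adds 500M random values to its own counter.
            for (long j = 0; j < 500000000; j++) {
                if (j % 1000000 == 1) {
                    System.out.println(j); // progress output
                }
                counter.add("" + rand.nextLong());
            }
            System.out.println("final" + counter.getCountEstimate());
            counters.add(counter);
            latch.countDown();
        }
    });
}
latch.await();
service.shutdown();
System.out.println("latch done");
// Merge the 20 per-thread counters into a single counter.
HyperLogLogPlusCounter ret = new HyperLogLogPlusCounter(14);
for (HyperLogLogPlusCounter c : counters) {
    ret.merge(c);
}
System.out.println(ret.getCountEstimate());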
The expected output is about 10B (20 threads x 500,000,000 random entries each); however, the actual output can be less than 1B.
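
For reference, the merge-then-estimate pattern the test relies on can be sketched at a much smaller scale. The following is only an illustrative toy HyperLogLog, not Kylin's HyperLogLogPlusCounter: the class names (HllMergeSketch, ToyHll), the 64-bit SplitMix64-style hash, and the scaled-down sizes (20 sketches x 50,000 values) are all assumptions made for the example. It shows that merging per-thread sketches by taking the register-wise maximum should give an estimate close to the total number of distinct values added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class HllMergeSketch {

    /** Toy HyperLogLog with 2^p registers; hypothetical, not Kylin's class. */
    static final class ToyHll {
        final int p;
        final byte[] registers;

        ToyHll(int p) {
            this.p = p;
            this.registers = new byte[1 << p];
        }

        // SplitMix64 finalizer, used here as a stand-in 64-bit hash (assumption).
        static long mix64(long z) {
            z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
            z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
            return z ^ (z >>> 31);
        }

        void add(long value) {
            long h = mix64(value);
            int bucket = (int) (h >>> (64 - p));   // top p bits select the register
            long rest = h << p;                    // remaining 64 - p bits
            // Rank = position of the first 1 bit, capped when all bits are 0.
            int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
            if (rank > registers[bucket]) {
                registers[bucket] = (byte) rank;
            }
        }

        /** Merging two sketches keeps the larger rank in every register. */
        void merge(ToyHll other) {
            for (int i = 0; i < registers.length; i++) {
                if (other.registers[i] > registers[i]) {
                    registers[i] = other.registers[i];
                }
            }
        }

        long estimate() {
            int m = registers.length;
            double sum = 0.0;
            int zeros = 0;
            for (byte r : registers) {
                sum += 1.0 / (1L << r);
                if (r == 0) {
                    zeros++;
                }
            }
            double alpha = 0.7213 / (1.0 + 1.079 / m);
            double raw = alpha * m * m / sum;
            // Linear counting for small cardinalities.
            if (raw <= 2.5 * m && zeros > 0) {
                raw = m * Math.log((double) m / zeros);
            }
            return Math.round(raw);
        }
    }

    /** Fills `sketches` toy counters with random longs, merges them, returns the estimate. */
    static long mergedEstimate(int sketches, int perSketch) {
        Random rand = new Random(42); // fixed seed so runs are repeatable
        List<ToyHll> parts = new ArrayList<>();
        for (int i = 0; i < sketches; i++) {
            ToyHll h = new ToyHll(14);
            for (int j = 0; j < perSketch; j++) {
                h.add(rand.nextLong());
            }
            parts.add(h);
        }
        ToyHll merged = new ToyHll(14);
        for (ToyHll h : parts) {
            merged.merge(h);
        }
        return merged.estimate();
    }

    public static void main(String[] args) {
        // 20 x 50,000 = 1,000,000 values; the estimate should land near that.
        System.out.println(mergedEstimate(20, 50_000));
    }
}
```

With 2^14 registers the theoretical standard error is roughly 1.04 / sqrt(16384), or about 0.8%, so a merged estimate that is off by an order of magnitude, as reported above, is far outside what the algorithm itself would explain at this precision.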
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)