You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "Daniel Lemire (JIRA)" <ji...@apache.org> on 2015/09/28 16:49:05 UTC
[jira] [Updated] (KYLIN-1034) Faster bitmap indexes with Roaring
bitmaps
[ https://issues.apache.org/jira/browse/KYLIN-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Lemire updated KYLIN-1034:
---------------------------------
Attachment: roaring-0001.patch
We have simply substituted Roaring for Concise throughout the code. The patch is straight-forward.
Patch corresponding to this pull request :
https://github.com/apache/incubator-kylin/pull/12
and to this commit :
https://github.com/lemire/incubator-kylin/commit/01988012fa9c114130479ac9640957e825486769
> Faster bitmap indexes with Roaring bitmaps
> ------------------------------------------
>
> Key: KYLIN-1034
> URL: https://issues.apache.org/jira/browse/KYLIN-1034
> Project: Kylin
> Issue Type: Improvement
> Components: General
> Affects Versions: Future
> Reporter: Daniel Lemire
> Labels: performance
> Attachments: roaring-0001.patch
>
>
> Kylin is using ConciseSet for bitmap indexing. It was found that Roaring bitmaps are often much better than ConciseSet (e.g., see experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The compression is often better and the speed difference can be substantial. This is even more so with version 0.5 of Roaring.
> There is a high quality Java implementation that is used by Apache Spark and Druid.io. The Druid people found that switching to Roaring bitmaps could improve real-world performance by 30% or more.
> Source code:
> https://github.com/lemire/RoaringBitmap/
> Importing from Maven:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/
> <dependencies>
> <dependency>
> <groupId>org.roaringbitmap</groupId>
> <artifactId>RoaringBitmap</artifactId>
> <version>[0.5,)</version>
> </dependency>
> </dependencies>
> Online API :
> http://lemire.me/docs/RoaringBitmap/
> JavaDoc Jar:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/RoaringBitmap-0.5.3-javadoc.jar
> When desired, the library supports memory file mapping, so that out-of-JVM heap memory is used instead. This can greatly improve IO issues. The library is available under the Apache license and is patent-free.
> There is an extensive real-data benchmark framework which you can run for yourself to compare Roaring with competitive alternatives such as ConciseSet :
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
> Running such a benchmark can be as simple as launching a script.
> For Druid, the bitmap format was made "pluggable" so you can switch from one format to the other using a configuration flag. This is implemented through simple wrappers, e.g., see
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> So it can be really easy to make it possible to switch the format while preserving backward compatibility if needed...
> It is probably not difficult work to integrate Roaring in Kylin (maybe a day or so of programming) and it could potentially improve performance while reducing memory storage.
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have their own implementation, see
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)