You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "Daniel Lemire (JIRA)" <ji...@apache.org> on 2015/09/28 16:49:05 UTC

[jira] [Updated] (KYLIN-1034) Faster bitmap indexes with Roaring bitmaps

     [ https://issues.apache.org/jira/browse/KYLIN-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Lemire updated KYLIN-1034:
---------------------------------
    Attachment: roaring-0001.patch

We have simply substituted Roaring for Concise throughout the code. The patch is straight-forward. 

Patch corresponding to this pull request :

https://github.com/apache/incubator-kylin/pull/12

and to this commit : 

https://github.com/lemire/incubator-kylin/commit/01988012fa9c114130479ac9640957e825486769

> Faster bitmap indexes with Roaring bitmaps
> ------------------------------------------
>
>                 Key: KYLIN-1034
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1034
>             Project: Kylin
>          Issue Type: Improvement
>          Components: General
>    Affects Versions: Future
>            Reporter: Daniel Lemire
>              Labels: performance
>         Attachments: roaring-0001.patch
>
>
> Kylin is using ConciseSet for bitmap indexing. It was found that Roaring bitmaps are often much better than ConciseSet (e.g., see experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The compression is often better and the speed difference can be substantial. This is even more so with version 0.5 of Roaring.
> There is a high quality Java implementation that is used by Apache Spark and Druid.io. The Druid people found that switching to Roaring bitmaps could improve real-world performance by 30% or more. 
> Source code: 
> https://github.com/lemire/RoaringBitmap/
> Importing from Maven:
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/
> <dependencies>
>     <dependency>
>       <groupId>org.roaringbitmap</groupId>
>       <artifactId>RoaringBitmap</artifactId>
>       <version>[0.5,)</version>
>     </dependency>
>  </dependencies>
> Online API :
> http://lemire.me/docs/RoaringBitmap/
> JavaDoc Jar: 
> http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/RoaringBitmap-0.5.3-javadoc.jar
> When desired, the library supports memory file mapping, so that  out-of-JVM heap memory is used instead. This can greatly improve IO issues. The library is available under the Apache license and is patent-free.
> There is an extensive real-data benchmark framework which you can run for yourself to compare Roaring with competitive alternatives such as ConciseSet :
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
> Running such a benchmark can be as simple as launching a script.
> For Druid, the bitmap format was made "pluggable" so you can switch from one format to the other using a configuration flag. This is implemented through simple wrappers, e.g., see
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> So it can be really easy to make it possible to switch the format while preserving backward compatibility if needed... 
> It is probably not difficult work to integrate Roaring in Kylin (maybe a day or so of programming) and it could potentially improve performance while reducing memory storage.
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have their own implementation, see 
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)