You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by Daniel Lemire <le...@gmail.com> on 2015/09/17 17:16:26 UTC

Faster bitmap indexes with Roaring bitmaps

Good day,

I see that Kylin isusing ConciseSet for bitmap indexing. In our work, we
found that Roaring bitmaps are often much better than ConciseSet (e.g., see
experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
compression is often better and the speed difference can be substantial.
This is even more so with version 0.5 of Roaring.

We have a high quality Java implementation that is used by Apache Spark and
Druid.io. The Druid people found that switching to Roaring bitmaps could
improve real-world performance by 30% or more.

https://github.com/lemire/RoaringBitmap/

When desired, the library supports memory file mapping, so that  out-of-JVM
heap memory is used instead. This can greatly improve IO issues. The
library is available under the Apache license and is patent-free.

We have an extensive real-data benchmark framework which you can run for
yourself to compare Roaring with competitive alternatives such as
ConciseSet :

https://github.com/lemire/RoaringBitmap/tree/master/jmh

Running such a benchmark can be as simple as launching a script.

What we did for Druid is to make the bitmap format "pluggable" so you can
switch from one format to the other using a configuration flag. This is
implemented through simple wrappers, e.g., see

https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap

So it can be really easy to make it possible to switch the format while
preserving backward compatibility if needed...

If you are interested in trying out Roaring, please let us know... We do
not think it is difficult work to integrate Roaring in Kylin (maybe a day
or so of programming) and it could potentially improve performance while
reducing memory storage.

Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
their own implementation, see
https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps

--
 Daniel Lemire for the Roaring team
https://github.com/lemire/

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Daniel Lemire <le...@gmail.com>.
Thanks Julian. I'm an even greater fan of your work.






On Sat, Sep 19, 2015 at 7:35 PM, Julian Hyde <jh...@apache.org> wrote:

> Thanks for the suggestion, Daniel. I'm an admirer of your work.
>
> Julian
>
> On Fri, Sep 18, 2015 at 6:43 AM, Daniel Lemire <le...@gmail.com> wrote:
> > Good day,
> >
> > I posted an issue on Kylin JIRA :
> >
> > https://issues.apache.org/jira/browse/KYLIN-1034
> >
> > It is not very hard to replace ConciseSet by a RoaringBitmap... there are
> > API differences, but it all comes down to computing unions,
> differences...
> > and so forth. There are no fundamental differences. I am willing to
> > contribute code and help in other ways.
> >
> > There are architectural issues that might be interesting to discuss
> before
> > any code gets written. For example, are your bitmaps stored on disk, or
> are
> > they strictly in-memory? If they are stored on disk and immutable, then I
> > suggest that it might be interesting to consider memory mapping. This
> > avoids the expensive process of loading up the data. I see that you seem
> to
> > bring back bitmaps from bytes...
> >
> > Memory-mapped bitmaps are slightly slower because they go through the
> > abstraction of a ByteBuffer, but saving on the IO and data manipulation
> is
> > a big deal too.
> >
> >
> >
> >
> > On Fri, Sep 18, 2015 at 6:50 AM, Luke Han <lu...@gmail.com> wrote:
> >
> >> Hi Daniel,
> >>     Thank you very much to let's know Roaring bitmaps, it looks
> promising
> >> with better performance.
> >>     We are really would like to have it in our project, as you said the
> >> efforts should not too much,
> >> May I ask you a favor to bring it into our source code? We could work
> >> together to make it work for
> >> our current cases and then run some benchmark with real case.
> >>
> >>     Please feel free to open Kylin JIRA
> >> <https://issues.apache.org/jira/browse/KYLIN> and put necessary
> >> information there.
> >>     Thank you very much and looking forward for your contributions if
> you
> >> have time to help:)
> >>
> >>      Thanks.
> >>
> >> Luke
> >>
> >>
> >>
> >> Best Regards!
> >> ---------------------
> >>
> >> Luke Han
> >>
> >> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <qh...@apache.org> wrote:
> >>
> >>> Thanks Daniel for the recommendation.
> >>> Let's have a try :)
> >>>
> >>> Li Yang <li...@apache.org>于2015年9月18日周五 下午3:28写道:
> >>>
> >>> > Many thanks Daniel for the information!
> >>> >
> >>> > We will definitely give RoaringBitmap
> >>> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark
> >>> and
> >>> > Druid.io, it can help Kylin's inverted index as well.  :-)
> >>> >
> >>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <le...@gmail.com>
> >>> wrote:
> >>> >
> >>> > > Good day,
> >>> > >
> >>> > > I see that Kylin isusing ConciseSet for bitmap indexing. In our
> work,
> >>> we
> >>> > > found that Roaring bitmaps are often much better than ConciseSet
> >>> (e.g.,
> >>> > see
> >>> > > experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> >>> > > compression is often better and the speed difference can be
> >>> substantial.
> >>> > > This is even more so with version 0.5 of Roaring.
> >>> > >
> >>> > > We have a high quality Java implementation that is used by Apache
> >>> Spark
> >>> > > and Druid.io. The Druid people found that switching to Roaring
> bitmaps
> >>> > > could improve real-world performance by 30% or more.
> >>> > >
> >>> > > https://github.com/lemire/RoaringBitmap/
> >>> > >
> >>> > > When desired, the library supports memory file mapping, so that
> >>> > >  out-of-JVM heap memory is used instead. This can greatly improve
> IO
> >>> > > issues. The library is available under the Apache license and is
> >>> > > patent-free.
> >>> > >
> >>> > > We have an extensive real-data benchmark framework which you can
> run
> >>> for
> >>> > > yourself to compare Roaring with competitive alternatives such as
> >>> > > ConciseSet :
> >>> > >
> >>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
> >>> > >
> >>> > > Running such a benchmark can be as simple as launching a script.
> >>> > >
> >>> > > What we did for Druid is to make the bitmap format "pluggable" so
> you
> >>> can
> >>> > > switch from one format to the other using a configuration flag.
> This
> >>> is
> >>> > > implemented through simple wrappers, e.g., see
> >>> > >
> >>> > >
> >>> > >
> >>> >
> >>>
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> >>> > >
> >>> > > So it can be really easy to make it possible to switch the format
> >>> while
> >>> > > preserving backward compatibility if needed...
> >>> > >
> >>> > > If you are interested in trying out Roaring, please let us know...
> We
> >>> do
> >>> > > not think it is difficult work to integrate Roaring in Kylin
> (maybe a
> >>> day
> >>> > > or so of programming) and it could potentially improve performance
> >>> while
> >>> > > reducing memory storage.
> >>> > >
> >>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though
> they
> >>> > have
> >>> > > their own implementation, see
> >>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
> >>> > >
> >>> > > --
> >>> > >  Daniel Lemire for the Roaring team
> >>> > > https://github.com/lemire/
> >>> > >
> >>> > >
> >>> >
> >>>
> >>
> >>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Julian Hyde <jh...@apache.org>.
Thanks for the suggestion, Daniel. I'm an admirer of your work.

Julian

On Fri, Sep 18, 2015 at 6:43 AM, Daniel Lemire <le...@gmail.com> wrote:
> Good day,
>
> I posted an issue on Kylin JIRA :
>
> https://issues.apache.org/jira/browse/KYLIN-1034
>
> It is not very hard to replace ConciseSet by a RoaringBitmap... there are
> API differences, but it all comes down to computing unions, differences...
> and so forth. There are no fundamental differences. I am willing to
> contribute code and help in other ways.
>
> There are architectural issues that might be interesting to discuss before
> any code gets written. For example, are your bitmaps stored on disk, or are
> they strictly in-memory? If they are stored on disk and immutable, then I
> suggest that it might be interesting to consider memory mapping. This
> avoids the expensive process of loading up the data. I see that you seem to
> bring back bitmaps from bytes...
>
> Memory-mapped bitmaps are slightly slower because they go through the
> abstraction of a ByteBuffer, but saving on the IO and data manipulation is
> a big deal too.
>
>
>
>
> On Fri, Sep 18, 2015 at 6:50 AM, Luke Han <lu...@gmail.com> wrote:
>
>> Hi Daniel,
>>     Thank you very much to let's know Roaring bitmaps, it looks promising
>> with better performance.
>>     We are really would like to have it in our project, as you said the
>> efforts should not too much,
>> May I ask you a favor to bring it into our source code? We could work
>> together to make it work for
>> our current cases and then run some benchmark with real case.
>>
>>     Please feel free to open Kylin JIRA
>> <https://issues.apache.org/jira/browse/KYLIN> and put necessary
>> information there.
>>     Thank you very much and looking forward for your contributions if you
>> have time to help:)
>>
>>      Thanks.
>>
>> Luke
>>
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <qh...@apache.org> wrote:
>>
>>> Thanks Daniel for the recommendation.
>>> Let's have a try :)
>>>
>>> Li Yang <li...@apache.org>于2015年9月18日周五 下午3:28写道:
>>>
>>> > Many thanks Daniel for the information!
>>> >
>>> > We will definitely give RoaringBitmap
>>> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark
>>> and
>>> > Druid.io, it can help Kylin's inverted index as well.  :-)
>>> >
>>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <le...@gmail.com>
>>> wrote:
>>> >
>>> > > Good day,
>>> > >
>>> > > I see that Kylin isusing ConciseSet for bitmap indexing. In our work,
>>> we
>>> > > found that Roaring bitmaps are often much better than ConciseSet
>>> (e.g.,
>>> > see
>>> > > experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
>>> > > compression is often better and the speed difference can be
>>> substantial.
>>> > > This is even more so with version 0.5 of Roaring.
>>> > >
>>> > > We have a high quality Java implementation that is used by Apache
>>> Spark
>>> > > and Druid.io. The Druid people found that switching to Roaring bitmaps
>>> > > could improve real-world performance by 30% or more.
>>> > >
>>> > > https://github.com/lemire/RoaringBitmap/
>>> > >
>>> > > When desired, the library supports memory file mapping, so that
>>> > >  out-of-JVM heap memory is used instead. This can greatly improve IO
>>> > > issues. The library is available under the Apache license and is
>>> > > patent-free.
>>> > >
>>> > > We have an extensive real-data benchmark framework which you can run
>>> for
>>> > > yourself to compare Roaring with competitive alternatives such as
>>> > > ConciseSet :
>>> > >
>>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
>>> > >
>>> > > Running such a benchmark can be as simple as launching a script.
>>> > >
>>> > > What we did for Druid is to make the bitmap format "pluggable" so you
>>> can
>>> > > switch from one format to the other using a configuration flag. This
>>> is
>>> > > implemented through simple wrappers, e.g., see
>>> > >
>>> > >
>>> > >
>>> >
>>> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>>> > >
>>> > > So it can be really easy to make it possible to switch the format
>>> while
>>> > > preserving backward compatibility if needed...
>>> > >
>>> > > If you are interested in trying out Roaring, please let us know... We
>>> do
>>> > > not think it is difficult work to integrate Roaring in Kylin (maybe a
>>> day
>>> > > or so of programming) and it could potentially improve performance
>>> while
>>> > > reducing memory storage.
>>> > >
>>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though they
>>> > have
>>> > > their own implementation, see
>>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>>> > >
>>> > > --
>>> > >  Daniel Lemire for the Roaring team
>>> > > https://github.com/lemire/
>>> > >
>>> > >
>>> >
>>>
>>
>>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Daniel Lemire <le...@gmail.com>.
Good day,

I posted an issue on Kylin JIRA :

https://issues.apache.org/jira/browse/KYLIN-1034

It is not very hard to replace ConciseSet by a RoaringBitmap... there are
API differences, but it all comes down to computing unions, differences...
and so forth. There are no fundamental differences. I am willing to
contribute code and help in other ways.

There are architectural issues that might be interesting to discuss before
any code gets written. For example, are your bitmaps stored on disk, or are
they strictly in-memory? If they are stored on disk and immutable, then I
suggest that it might be interesting to consider memory mapping. This
avoids the expensive process of loading up the data. I see that you seem to
bring back bitmaps from bytes...

Memory-mapped bitmaps are slightly slower because they go through the
abstraction of a ByteBuffer, but saving on the IO and data manipulation is
a big deal too.




On Fri, Sep 18, 2015 at 6:50 AM, Luke Han <lu...@gmail.com> wrote:

> Hi Daniel,
>     Thank you very much to let's know Roaring bitmaps, it looks promising
> with better performance.
>     We are really would like to have it in our project, as you said the
> efforts should not too much,
> May I ask you a favor to bring it into our source code? We could work
> together to make it work for
> our current cases and then run some benchmark with real case.
>
>     Please feel free to open Kylin JIRA
> <https://issues.apache.org/jira/browse/KYLIN> and put necessary
> information there.
>     Thank you very much and looking forward for your contributions if you
> have time to help:)
>
>      Thanks.
>
> Luke
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <qh...@apache.org> wrote:
>
>> Thanks Daniel for the recommendation.
>> Let's have a try :)
>>
>> Li Yang <li...@apache.org>于2015年9月18日周五 下午3:28写道:
>>
>> > Many thanks Daniel for the information!
>> >
>> > We will definitely give RoaringBitmap
>> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark
>> and
>> > Druid.io, it can help Kylin's inverted index as well.  :-)
>> >
>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <le...@gmail.com>
>> wrote:
>> >
>> > > Good day,
>> > >
>> > > I see that Kylin isusing ConciseSet for bitmap indexing. In our work,
>> we
>> > > found that Roaring bitmaps are often much better than ConciseSet
>> (e.g.,
>> > see
>> > > experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
>> > > compression is often better and the speed difference can be
>> substantial.
>> > > This is even more so with version 0.5 of Roaring.
>> > >
>> > > We have a high quality Java implementation that is used by Apache
>> Spark
>> > > and Druid.io. The Druid people found that switching to Roaring bitmaps
>> > > could improve real-world performance by 30% or more.
>> > >
>> > > https://github.com/lemire/RoaringBitmap/
>> > >
>> > > When desired, the library supports memory file mapping, so that
>> > >  out-of-JVM heap memory is used instead. This can greatly improve IO
>> > > issues. The library is available under the Apache license and is
>> > > patent-free.
>> > >
>> > > We have an extensive real-data benchmark framework which you can run
>> for
>> > > yourself to compare Roaring with competitive alternatives such as
>> > > ConciseSet :
>> > >
>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
>> > >
>> > > Running such a benchmark can be as simple as launching a script.
>> > >
>> > > What we did for Druid is to make the bitmap format "pluggable" so you
>> can
>> > > switch from one format to the other using a configuration flag. This
>> is
>> > > implemented through simple wrappers, e.g., see
>> > >
>> > >
>> > >
>> >
>> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>> > >
>> > > So it can be really easy to make it possible to switch the format
>> while
>> > > preserving backward compatibility if needed...
>> > >
>> > > If you are interested in trying out Roaring, please let us know... We
>> do
>> > > not think it is difficult work to integrate Roaring in Kylin (maybe a
>> day
>> > > or so of programming) and it could potentially improve performance
>> while
>> > > reducing memory storage.
>> > >
>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though they
>> > have
>> > > their own implementation, see
>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>> > >
>> > > --
>> > >  Daniel Lemire for the Roaring team
>> > > https://github.com/lemire/
>> > >
>> > >
>> >
>>
>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Li Yang <li...@apache.org>.
Will definitely call for help when it's needed. Many thanks again!

On Tue, Oct 6, 2015 at 9:09 PM, Daniel Lemire <le...@gmail.com> wrote:

> I will be happy to collaborate. There are other people from the Roaring
> team who are available
> to help too.
>
> On Tue, Oct 6, 2015 at 7:23 AM, Li Yang <li...@apache.org> wrote:
>
>> Patch merged into 1.x branch.
>> https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8
>>
>> There shall be major refactoring around inverted-index and realtime OLAP
>> in Q4 or early 2016. Will embrace more from Roaring bitmaps.
>>
>> Many thanks Daniel!
>>
>>
>> On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <lu...@gmail.com> wrote:
>>
>>> Hi Daniel,
>>>     The patch looks very great, will ask Yang to help review and merge if
>>> there's no issue.
>>>
>>>     Thank you very much for your contribution.
>>>
>>>
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <le...@gmail.com>
>>> wrote:
>>>
>>> > Good day Luke,
>>> >
>>> > A patch has been added to the JIRA :
>>> >
>>> > https://issues.apache.org/jira/browse/KYLIN-1034
>>> >
>>> > I have also issued a PR on GitHub:
>>> >
>>> > https://github.com/apache/incubator-kylin/pull/12
>>> >
>>> > The patch is straight-forward, and simply replaces Concise by Roaring
>>> > throughout.
>>> >
>>> > The relevant unit tests appear to pass.
>>> >
>>> > Further review, testing and benchmarking is encouraged. The purpose of
>>> > this patch
>>> > is to get the process started.
>>> >
>>> > To keep things simple, I did not do *any* redesign. Still... here are
>>> my
>>> > thoughts...
>>> >
>>> > Design-wise : It does look to me like the bitmaps are serialized to
>>> > streams of bytes.
>>> > From there, *immutable* bitmaps are reloaded on demand, then possibly
>>> > copied and modified.
>>> > The Roaring library has a class ideally suited for this purpose, called
>>> > ImmutableRoaringBitmap...
>>> > From any ByteBuffer, you can map directly a bitmap :
>>> >
>>> >
>>> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
>>> > Compared to deserializing a bitmap from a stream of bytes, this
>>> approach
>>> > avoids copying
>>> > and parsing the data: constructing an ImmutableRoaringBitmap is very
>>> fast
>>> > and uses very
>>> > little memory. Because they are formally immutable, you only need one
>>> > instance in your entire
>>> > application, irrespective of the number of cores. The data is accessed
>>> > only when the
>>> > ImmutableRoaringBitmap is actually queried, and what is accessed is the
>>> > original stream of
>>> > bytes (no unnecessary copy is made). So it uses less memory.
>>> >
>>> > Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would
>>> not
>>> > be difficult,
>>> > programming-wise, but this would make the patch more difficult to
>>> review.
>>> >
>>> > (I'll recopy some of my comments on JIRA.)
>>> >
>>> >
>>> > As usual, the copyright of this patch and be assigned to whoever...
>>> should
>>> > you choose
>>> > to use it. This patch or the Roaring library itself are *not* covered
>>> by
>>> > patents. And
>>> > so forth.
>>> >
>>> >
>>> >
>>> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com>
>>> wrote:
>>> >
>>> >> Thanks for clarifying.
>>> >>
>>> >> Let me see what we can do on this front.
>>> >>
>>> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
>>> >>
>>> >>> Thanks Daniel, I think that's most efficient way to have Roaring
>>> >>> work in existing code, patch is really be appreciated :)
>>> >>>
>>> >>> It's great discussion in KYLIN-1034.
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>>
>>> >>> Best Regards!
>>> >>> ---------------------
>>> >>>
>>> >>> Luke Han
>>> >>>
>>> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com>
>>> wrote:
>>> >>>
>>> >>>> Good day Luke,
>>> >>>>
>>> >>>> May I ask you a favor to bring it into our source code? We could
>>> work
>>> >>>>> together to make it work for
>>> >>>>> our current cases and then run some benchmark with real case.
>>> >>>>>
>>> >>>>
>>> >>>> We can rather easily substitute Roaring for Concise in the source
>>> code.
>>> >>>> Then submit a patch.
>>> >>>> Is that what would move this along most efficiently?
>>> >>>>
>>> >>>>
>>> >>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>>> >>>>
>>> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
>>> >>>>
>>> >>>> Some of these issues (e.g., memory-file mapping) might be of general
>>> >>>> interest.
>>> >>>>
>>> >>>> - Daniel
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Daniel Lemire <le...@gmail.com>.
I will be happy to collaborate. There are other people from the Roaring
team who are available
to help too.

On Tue, Oct 6, 2015 at 7:23 AM, Li Yang <li...@apache.org> wrote:

> Patch merged into 1.x branch.
> https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8
>
> There shall be major refactoring around inverted-index and realtime OLAP
> in Q4 or early 2016. Will embrace more from Roaring bitmaps.
>
> Many thanks Daniel!
>
>
> On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <lu...@gmail.com> wrote:
>
>> Hi Daniel,
>>     The patch looks very great, will ask Yang to help review and merge if
>> there's no issue.
>>
>>     Thank you very much for your contribution.
>>
>>
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <le...@gmail.com> wrote:
>>
>> > Good day Luke,
>> >
>> > A patch has been added to the JIRA :
>> >
>> > https://issues.apache.org/jira/browse/KYLIN-1034
>> >
>> > I have also issued a PR on GitHub:
>> >
>> > https://github.com/apache/incubator-kylin/pull/12
>> >
>> > The patch is straight-forward, and simply replaces Concise by Roaring
>> > throughout.
>> >
>> > The relevant unit tests appear to pass.
>> >
>> > Further review, testing and benchmarking is encouraged. The purpose of
>> > this patch
>> > is to get the process started.
>> >
>> > To keep things simple, I did not do *any* redesign. Still... here are my
>> > thoughts...
>> >
>> > Design-wise : It does look to me like the bitmaps are serialized to
>> > streams of bytes.
>> > From there, *immutable* bitmaps are reloaded on demand, then possibly
>> > copied and modified.
>> > The Roaring library has a class ideally suited for this purpose, called
>> > ImmutableRoaringBitmap...
>> > From any ByteBuffer, you can map directly a bitmap :
>> >
>> >
>> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
>> > Compared to deserializing a bitmap from a stream of bytes, this approach
>> > avoids copying
>> > and parsing the data: constructing an ImmutableRoaringBitmap is very
>> fast
>> > and uses very
>> > little memory. Because they are formally immutable, you only need one
>> > instance in your entire
>> > application, irrespective of the number of cores. The data is accessed
>> > only when the
>> > ImmutableRoaringBitmap is actually queried, and what is accessed is the
>> > original stream of
>> > bytes (no unnecessary copy is made). So it uses less memory.
>> >
>> > Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would
>> not
>> > be difficult,
>> > programming-wise, but this would make the patch more difficult to
>> review.
>> >
>> > (I'll recopy some of my comments on JIRA.)
>> >
>> >
>> > As usual, the copyright of this patch and be assigned to whoever...
>> should
>> > you choose
>> > to use it. This patch or the Roaring library itself are *not* covered by
>> > patents. And
>> > so forth.
>> >
>> >
>> >
>> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com>
>> wrote:
>> >
>> >> Thanks for clarifying.
>> >>
>> >> Let me see what we can do on this front.
>> >>
>> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
>> >>
>> >>> Thanks Daniel, I think that's most efficient way to have Roaring
>> >>> work in existing code, patch is really be appreciated :)
>> >>>
>> >>> It's great discussion in KYLIN-1034.
>> >>>
>> >>> Thanks.
>> >>>
>> >>>
>> >>> Best Regards!
>> >>> ---------------------
>> >>>
>> >>> Luke Han
>> >>>
>> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com>
>> wrote:
>> >>>
>> >>>> Good day Luke,
>> >>>>
>> >>>> May I ask you a favor to bring it into our source code? We could work
>> >>>>> together to make it work for
>> >>>>> our current cases and then run some benchmark with real case.
>> >>>>>
>> >>>>
>> >>>> We can rather easily substitute Roaring for Concise in the source
>> code.
>> >>>> Then submit a patch.
>> >>>> Is that what would move this along most efficiently?
>> >>>>
>> >>>>
>> >>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>> >>>>
>> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
>> >>>>
>> >>>> Some of these issues (e.g., memory-file mapping) might be of general
>> >>>> interest.
>> >>>>
>> >>>> - Daniel
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Li Yang <li...@apache.org>.
Patch merged into 1.x branch.
https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8

There shall be major refactoring around inverted-index and realtime OLAP in
Q4 or early 2016. Will embrace more from Roaring bitmaps.

Many thanks Daniel!


On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <lu...@gmail.com> wrote:

> Hi Daniel,
>     The patch looks very great, will ask Yang to help review and merge if
> there's no issue.
>
>     Thank you very much for your contribution.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <le...@gmail.com> wrote:
>
> > Good day Luke,
> >
> > A patch has been added to the JIRA :
> >
> > https://issues.apache.org/jira/browse/KYLIN-1034
> >
> > I have also issued a PR on GitHub:
> >
> > https://github.com/apache/incubator-kylin/pull/12
> >
> > The patch is straight-forward, and simply replaces Concise by Roaring
> > throughout.
> >
> > The relevant unit tests appear to pass.
> >
> > Further review, testing and benchmarking is encouraged. The purpose of
> > this patch
> > is to get the process started.
> >
> > To keep things simple, I did not do *any* redesign. Still... here are my
> > thoughts...
> >
> > Design-wise : It does look to me like the bitmaps are serialized to
> > streams of bytes.
> > From there, *immutable* bitmaps are reloaded on demand, then possibly
> > copied and modified.
> > The Roaring library has a class ideally suited for this purpose, called
> > ImmutableRoaringBitmap...
> > From any ByteBuffer, you can map directly a bitmap :
> >
> >
> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
> > Compared to deserializing a bitmap from a stream of bytes, this approach
> > avoids copying
> > and parsing the data: constructing an ImmutableRoaringBitmap is very fast
> > and uses very
> > little memory. Because they are formally immutable, you only need one
> > instance in your entire
> > application, irrespective of the number of cores. The data is accessed
> > only when the
> > ImmutableRoaringBitmap is actually queried, and what is accessed is the
> > original stream of
> > bytes (no unnecessary copy is made). So it uses less memory.
> >
> > Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not
> > be difficult,
> > programming-wise, but this would make the patch more difficult to review.
> >
> > (I'll recopy some of my comments on JIRA.)
> >
> >
> > As usual, the copyright of this patch and be assigned to whoever...
> should
> > you choose
> > to use it. This patch or the Roaring library itself are *not* covered by
> > patents. And
> > so forth.
> >
> >
> >
> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com> wrote:
> >
> >> Thanks for clarifying.
> >>
> >> Let me see what we can do on this front.
> >>
> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
> >>
> >>> Thanks Daniel, I think that's most efficient way to have Roaring
> >>> work in existing code, patch is really be appreciated :)
> >>>
> >>> It's great discussion in KYLIN-1034.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> Best Regards!
> >>> ---------------------
> >>>
> >>> Luke Han
> >>>
> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com>
> wrote:
> >>>
> >>>> Good day Luke,
> >>>>
> >>>> May I ask you a favor to bring it into our source code? We could work
> >>>>> together to make it work for
> >>>>> our current cases and then run some benchmark with real case.
> >>>>>
> >>>>
> >>>> We can rather easily substitute Roaring for Concise in the source
> code.
> >>>> Then submit a patch.
> >>>> Is that what would move this along most efficiently?
> >>>>
> >>>>
> >>>> Meanwhile, we have been flushing out some of the issues on JIRA :
> >>>>
> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
> >>>>
> >>>> Some of these issues (e.g., memory-file mapping) might be of general
> >>>> interest.
> >>>>
> >>>> - Daniel
> >>>>
> >>>
> >>>
> >>
> >
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Luke Han <lu...@gmail.com>.
Hi Daniel,
    The patch looks very great, will ask Yang to help review and merge if
there's no issue.

    Thank you very much for your contribution.




Best Regards!
---------------------

Luke Han

On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <le...@gmail.com> wrote:

> Good day Luke,
>
> A patch has been added to the JIRA :
>
> https://issues.apache.org/jira/browse/KYLIN-1034
>
> I have also issued a PR on GitHub:
>
> https://github.com/apache/incubator-kylin/pull/12
>
> The patch is straight-forward, and simply replaces Concise by Roaring
> throughout.
>
> The relevant unit tests appear to pass.
>
> Further review, testing and benchmarking is encouraged. The purpose of
> this patch
> is to get the process started.
>
> To keep things simple, I did not do *any* redesign. Still... here are my
> thoughts...
>
> Design-wise : It does look to me like the bitmaps are serialized to
> streams of bytes.
> From there, *immutable* bitmaps are reloaded on demand, then possibly
> copied and modified.
> The Roaring library has a class ideally suited for this purpose, called
> ImmutableRoaringBitmap...
> From any ByteBuffer, you can map directly a bitmap :
>
> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
> Compared to deserializing a bitmap from a stream of bytes, this approach
> avoids copying
> and parsing the data: constructing an ImmutableRoaringBitmap is very fast
> and uses very
> little memory. Because they are formally immutable, you only need one
> instance in your entire
> application, irrespective of the number of cores. The data is accessed
> only when the
> ImmutableRoaringBitmap is actually queried, and what is accessed is the
> original stream of
> bytes (no unnecessary copy is made). So it uses less memory.
>
> Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not
> be difficult,
> programming-wise, but this would make the patch more difficult to review.
>
> (I'll recopy some of my comments on JIRA.)
>
>
> As usual, the copyright of this patch and be assigned to whoever... should
> you choose
> to use it. This patch or the Roaring library itself are *not* covered by
> patents. And
> so forth.
>
>
>
> On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com> wrote:
>
>> Thanks for clarifying.
>>
>> Let me see what we can do on this front.
>>
>> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
>>
>>> Thanks Daniel, I think that's most efficient way to have Roaring
>>> work in existing code, patch is really be appreciated :)
>>>
>>> It's great discussion in KYLIN-1034.
>>>
>>> Thanks.
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com> wrote:
>>>
>>>> Good day Luke,
>>>>
>>>> May I ask you a favor to bring it into our source code? We could work
>>>>> together to make it work for
>>>>> our current cases and then run some benchmark with real case.
>>>>>
>>>>
>>>> We can rather easily substitute Roaring for Concise in the source code.
>>>> Then submit a patch.
>>>> Is that what would move this along most efficiently?
>>>>
>>>>
>>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>>>>
>>>> https://issues.apache.org/jira/browse/KYLIN-1034
>>>>
>>>> Some of these issues (e.g., memory-file mapping) might be of general
>>>> interest.
>>>>
>>>> - Daniel
>>>>
>>>
>>>
>>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Daniel Lemire <le...@gmail.com>.
Good day Luke,

A patch has been added to the JIRA :

https://issues.apache.org/jira/browse/KYLIN-1034

I have also issued a PR on GitHub:

https://github.com/apache/incubator-kylin/pull/12

The patch is straight-forward, and simply replaces Concise by Roaring
throughout.

The relevant unit tests appear to pass.

Further review, testing and benchmarking is encouraged. The purpose of this
patch
is to get the process started.

To keep things simple, I did not do *any* redesign. Still... here are my
thoughts...

Design-wise : It does look to me like the bitmaps are serialized to streams
of bytes.
>From there, *immutable* bitmaps are reloaded on demand, then possibly
copied and modified.
The Roaring library has a class ideally suited for this purpose, called
ImmutableRoaringBitmap...
>From any ByteBuffer, you can map directly a bitmap :
https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
Compared to deserializing a bitmap from a stream of bytes, this approach
avoids copying
and parsing the data: constructing an ImmutableRoaringBitmap is very fast
and uses very
little memory. Because they are formally immutable, you only need one
instance in your entire
application, irrespective of the number of cores. The data is accessed only
when the
ImmutableRoaringBitmap is actually queried, and what is accessed is the
original stream of
bytes (no unnecessary copy is made). So it uses less memory.

Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not
be difficult,
programming-wise, but this would make the patch more difficult to review.

(I'll recopy some of my comments on JIRA.)


As usual, the copyright of this patch and be assigned to whoever... should
you choose
to use it. This patch or the Roaring library itself are *not* covered by
patents. And
so forth.



On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com> wrote:

> Thanks for clarifying.
>
> Let me see what we can do on this front.
>
> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
>
>> Thanks Daniel, I think that's most efficient way to have Roaring
>> work in existing code, patch is really be appreciated :)
>>
>> It's great discussion in KYLIN-1034.
>>
>> Thanks.
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com> wrote:
>>
>>> Good day Luke,
>>>
>>> May I ask you a favor to bring it into our source code? We could work
>>>> together to make it work for
>>>> our current cases and then run some benchmark with real case.
>>>>
>>>
>>> We can rather easily substitute Roaring for Concise in the source code.
>>> Then submit a patch.
>>> Is that what would move this along most efficiently?
>>>
>>>
>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>>>
>>> https://issues.apache.org/jira/browse/KYLIN-1034
>>>
>>> Some of these issues (e.g., memory-file mapping) might be of general
>>> interest.
>>>
>>> - Daniel
>>>
>>
>>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Daniel Lemire <le...@gmail.com>.
Thanks for clarifying.

Let me see what we can do on this front.

On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:

> Thanks Daniel, I think that's most efficient way to have Roaring
> work in existing code, patch is really be appreciated :)
>
> It's great discussion in KYLIN-1034.
>
> Thanks.
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com> wrote:
>
>> Good day Luke,
>>
>> May I ask you a favor to bring it into our source code? We could work
>>> together to make it work for
>>> our current cases and then run some benchmark with real case.
>>>
>>
>> We can rather easily substitute Roaring for Concise in the source code.
>> Then submit a patch.
>> Is that what would move this along most efficiently?
>>
>>
>> Meanwhile, we have been flushing out some of the issues on JIRA :
>>
>> https://issues.apache.org/jira/browse/KYLIN-1034
>>
>> Some of these issues (e.g., memory-file mapping) might be of general
>> interest.
>>
>> - Daniel
>>
>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Luke Han <lu...@gmail.com>.
Thanks Daniel, I think that's most efficient way to have Roaring
work in existing code, patch is really be appreciated :)

It's great discussion in KYLIN-1034.

Thanks.


Best Regards!
---------------------

Luke Han

On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com> wrote:

> Good day Luke,
>
> May I ask you a favor to bring it into our source code? We could work
>> together to make it work for
>> our current cases and then run some benchmark with real case.
>>
>
> We can rather easily substitute Roaring for Concise in the source code.
> Then submit a patch.
> Is that what would move this along most efficiently?
>
>
> Meanwhile, we have been flushing out some of the issues on JIRA :
>
> https://issues.apache.org/jira/browse/KYLIN-1034
>
> Some of these issues (e.g., memory-file mapping) might be of general
> interest.
>
> - Daniel
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Daniel Lemire <le...@gmail.com>.
Good day Luke,

May I ask you a favor to bring it into our source code? We could work
> together to make it work for
> our current cases and then run some benchmark with real case.
>

We can rather easily substitute Roaring for Concise in the source code.
Then submit a patch.
Is that what would move this along most efficiently?


Meanwhile, we have been flushing out some of the issues on JIRA :

https://issues.apache.org/jira/browse/KYLIN-1034

Some of these issues (e.g., memory-file mapping) might be of general
interest.

- Daniel

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Luke Han <lu...@gmail.com>.
Hi Daniel,
    Thank you very much to let's know Roaring bitmaps, it looks promising
with better performance.
    We are really would like to have it in our project, as you said the
efforts should not too much,
May I ask you a favor to bring it into our source code? We could work
together to make it work for
our current cases and then run some benchmark with real case.

    Please feel free to open Kylin JIRA
<https://issues.apache.org/jira/browse/KYLIN> and put necessary information
there.
    Thank you very much and looking forward for your contributions if you
have time to help:)

     Thanks.

Luke



Best Regards!
---------------------

Luke Han

On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <qh...@apache.org> wrote:

> Thanks Daniel for the recommendation.
> Let's have a try :)
>
> Li Yang <li...@apache.org>于2015年9月18日周五 下午3:28写道:
>
> > Many thanks Daniel for the information!
> >
> > We will definitely give RoaringBitmap
> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
> > Druid.io, it can help Kylin's inverted index as well.  :-)
> >
> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <le...@gmail.com>
> wrote:
> >
> > > Good day,
> > >
> > > I see that Kylin isusing ConciseSet for bitmap indexing. In our work,
> we
> > > found that Roaring bitmaps are often much better than ConciseSet (e.g.,
> > see
> > > experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> > > compression is often better and the speed difference can be
> substantial.
> > > This is even more so with version 0.5 of Roaring.
> > >
> > > We have a high quality Java implementation that is used by Apache Spark
> > > and Druid.io. The Druid people found that switching to Roaring bitmaps
> > > could improve real-world performance by 30% or more.
> > >
> > > https://github.com/lemire/RoaringBitmap/
> > >
> > > When desired, the library supports memory file mapping, so that
> > >  out-of-JVM heap memory is used instead. This can greatly improve IO
> > > issues. The library is available under the Apache license and is
> > > patent-free.
> > >
> > > We have an extensive real-data benchmark framework which you can run
> for
> > > yourself to compare Roaring with competitive alternatives such as
> > > ConciseSet :
> > >
> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
> > >
> > > Running such a benchmark can be as simple as launching a script.
> > >
> > > What we did for Druid is to make the bitmap format "pluggable" so you
> can
> > > switch from one format to the other using a configuration flag. This is
> > > implemented through simple wrappers, e.g., see
> > >
> > >
> > >
> >
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> > >
> > > So it can be really easy to make it possible to switch the format while
> > > preserving backward compatibility if needed...
> > >
> > > If you are interested in trying out Roaring, please let us know... We
> do
> > > not think it is difficult work to integrate Roaring in Kylin (maybe a
> day
> > > or so of programming) and it could potentially improve performance
> while
> > > reducing memory storage.
> > >
> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though they
> > have
> > > their own implementation, see
> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
> > >
> > > --
> > >  Daniel Lemire for the Roaring team
> > > https://github.com/lemire/
> > >
> > >
> >
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by 周千昊 <qh...@apache.org>.
Thanks Daniel for the recommendation.
Let's have a try :)

Li Yang <li...@apache.org>于2015年9月18日周五 下午3:28写道:

> Many thanks Daniel for the information!
>
> We will definitely give RoaringBitmap
> <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
> Druid.io, it can help Kylin's inverted index as well.  :-)
>
> On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <le...@gmail.com> wrote:
>
> > Good day,
> >
> > I see that Kylin isusing ConciseSet for bitmap indexing. In our work, we
> > found that Roaring bitmaps are often much better than ConciseSet (e.g.,
> see
> > experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> > compression is often better and the speed difference can be substantial.
> > This is even more so with version 0.5 of Roaring.
> >
> > We have a high quality Java implementation that is used by Apache Spark
> > and Druid.io. The Druid people found that switching to Roaring bitmaps
> > could improve real-world performance by 30% or more.
> >
> > https://github.com/lemire/RoaringBitmap/
> >
> > When desired, the library supports memory file mapping, so that
> >  out-of-JVM heap memory is used instead. This can greatly improve IO
> > issues. The library is available under the Apache license and is
> > patent-free.
> >
> > We have an extensive real-data benchmark framework which you can run for
> > yourself to compare Roaring with competitive alternatives such as
> > ConciseSet :
> >
> > https://github.com/lemire/RoaringBitmap/tree/master/jmh
> >
> > Running such a benchmark can be as simple as launching a script.
> >
> > What we did for Druid is to make the bitmap format "pluggable" so you can
> > switch from one format to the other using a configuration flag. This is
> > implemented through simple wrappers, e.g., see
> >
> >
> >
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> >
> > So it can be really easy to make it possible to switch the format while
> > preserving backward compatibility if needed...
> >
> > If you are interested in trying out Roaring, please let us know... We do
> > not think it is difficult work to integrate Roaring in Kylin (maybe a day
> > or so of programming) and it could potentially improve performance while
> > reducing memory storage.
> >
> > Note: Roaring bitmaps were also adopted in Apache Lucene, though they
> have
> > their own implementation, see
> > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
> >
> > --
> >  Daniel Lemire for the Roaring team
> > https://github.com/lemire/
> >
> >
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Li Yang <li...@apache.org>.
Many thanks Daniel for the information!

We will definitely give RoaringBitmap
<https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
Druid.io, it can help Kylin's inverted index as well.  :-)

On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <le...@gmail.com> wrote:

> Good day,
>
> I see that Kylin isusing ConciseSet for bitmap indexing. In our work, we
> found that Roaring bitmaps are often much better than ConciseSet (e.g., see
> experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> compression is often better and the speed difference can be substantial.
> This is even more so with version 0.5 of Roaring.
>
> We have a high quality Java implementation that is used by Apache Spark
> and Druid.io. The Druid people found that switching to Roaring bitmaps
> could improve real-world performance by 30% or more.
>
> https://github.com/lemire/RoaringBitmap/
>
> When desired, the library supports memory file mapping, so that
>  out-of-JVM heap memory is used instead. This can greatly improve IO
> issues. The library is available under the Apache license and is
> patent-free.
>
> We have an extensive real-data benchmark framework which you can run for
> yourself to compare Roaring with competitive alternatives such as
> ConciseSet :
>
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
>
> Running such a benchmark can be as simple as launching a script.
>
> What we did for Druid is to make the bitmap format "pluggable" so you can
> switch from one format to the other using a configuration flag. This is
> implemented through simple wrappers, e.g., see
>
>
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>
> So it can be really easy to make it possible to switch the format while
> preserving backward compatibility if needed...
>
> If you are interested in trying out Roaring, please let us know... We do
> not think it is difficult work to integrate Roaring in Kylin (maybe a day
> or so of programming) and it could potentially improve performance while
> reducing memory storage.
>
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
> their own implementation, see
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>
> --
>  Daniel Lemire for the Roaring team
> https://github.com/lemire/
>
>