You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by Li Yang <li...@apache.org> on 2015/10/06 13:23:32 UTC

Re: Faster bitmap indexes with Roaring bitmaps

Patch merged into 1.x branch.
https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8

There shall be major refactoring around inverted-index and realtime OLAP in
Q4 or early 2016. Will embrace more from Roaring bitmaps.

Many thanks Daniel!


On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <lu...@gmail.com> wrote:

> Hi Daniel,
>     The patch looks very great, will ask Yang to help review and merge if
> there's no issue.
>
>     Thank you very much for your contribution.
>
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <le...@gmail.com> wrote:
>
> > Good day Luke,
> >
> > A patch has been added to the JIRA :
> >
> > https://issues.apache.org/jira/browse/KYLIN-1034
> >
> > I have also issued a PR on GitHub:
> >
> > https://github.com/apache/incubator-kylin/pull/12
> >
> > The patch is straight-forward, and simply replaces Concise by Roaring
> > throughout.
> >
> > The relevant unit tests appear to pass.
> >
> > Further review, testing and benchmarking is encouraged. The purpose of
> > this patch
> > is to get the process started.
> >
> > To keep things simple, I did not do *any* redesign. Still... here are my
> > thoughts...
> >
> > Design-wise : It does look to me like the bitmaps are serialized to
> > streams of bytes.
> > From there, *immutable* bitmaps are reloaded on demand, then possibly
> > copied and modified.
> > The Roaring library has a class ideally suited for this purpose, called
> > ImmutableRoaringBitmap...
> > From any ByteBuffer, you can map directly a bitmap :
> >
> >
> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
> > Compared to deserializing a bitmap from a stream of bytes, this approach
> > avoids copying
> > and parsing the data: constructing an ImmutableRoaringBitmap is very fast
> > and uses very
> > little memory. Because they are formally immutable, you only need one
> > instance in your entire
> > application, irrespective of the number of cores. The data is accessed
> > only when the
> > ImmutableRoaringBitmap is actually queried, and what is accessed is the
> > original stream of
> > bytes (no unnecessary copy is made). So it uses less memory.
> >
> > Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not
> > be difficult,
> > programming-wise, but this would make the patch more difficult to review.
> >
> > (I'll recopy some of my comments on JIRA.)
> >
> >
> > As usual, the copyright of this patch and be assigned to whoever...
> should
> > you choose
> > to use it. This patch or the Roaring library itself are *not* covered by
> > patents. And
> > so forth.
> >
> >
> >
> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com> wrote:
> >
> >> Thanks for clarifying.
> >>
> >> Let me see what we can do on this front.
> >>
> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
> >>
> >>> Thanks Daniel, I think that's most efficient way to have Roaring
> >>> work in existing code, patch is really be appreciated :)
> >>>
> >>> It's great discussion in KYLIN-1034.
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> Best Regards!
> >>> ---------------------
> >>>
> >>> Luke Han
> >>>
> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com>
> wrote:
> >>>
> >>>> Good day Luke,
> >>>>
> >>>> May I ask you a favor to bring it into our source code? We could work
> >>>>> together to make it work for
> >>>>> our current cases and then run some benchmark with real case.
> >>>>>
> >>>>
> >>>> We can rather easily substitute Roaring for Concise in the source
> code.
> >>>> Then submit a patch.
> >>>> Is that what would move this along most efficiently?
> >>>>
> >>>>
> >>>> Meanwhile, we have been flushing out some of the issues on JIRA :
> >>>>
> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
> >>>>
> >>>> Some of these issues (e.g., memory-file mapping) might be of general
> >>>> interest.
> >>>>
> >>>> - Daniel
> >>>>
> >>>
> >>>
> >>
> >
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Li Yang <li...@apache.org>.
Will definitely call for help when it's needed. Many thanks again!

On Tue, Oct 6, 2015 at 9:09 PM, Daniel Lemire <le...@gmail.com> wrote:

> I will be happy to collaborate. There are other people from the Roaring
> team who are available
> to help too.
>
> On Tue, Oct 6, 2015 at 7:23 AM, Li Yang <li...@apache.org> wrote:
>
>> Patch merged into 1.x branch.
>> https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8
>>
>> There shall be major refactoring around inverted-index and realtime OLAP
>> in Q4 or early 2016. Will embrace more from Roaring bitmaps.
>>
>> Many thanks Daniel!
>>
>>
>> On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <lu...@gmail.com> wrote:
>>
>>> Hi Daniel,
>>>     The patch looks very great, will ask Yang to help review and merge if
>>> there's no issue.
>>>
>>>     Thank you very much for your contribution.
>>>
>>>
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <le...@gmail.com>
>>> wrote:
>>>
>>> > Good day Luke,
>>> >
>>> > A patch has been added to the JIRA :
>>> >
>>> > https://issues.apache.org/jira/browse/KYLIN-1034
>>> >
>>> > I have also issued a PR on GitHub:
>>> >
>>> > https://github.com/apache/incubator-kylin/pull/12
>>> >
>>> > The patch is straight-forward, and simply replaces Concise by Roaring
>>> > throughout.
>>> >
>>> > The relevant unit tests appear to pass.
>>> >
>>> > Further review, testing and benchmarking is encouraged. The purpose of
>>> > this patch
>>> > is to get the process started.
>>> >
>>> > To keep things simple, I did not do *any* redesign. Still... here are
>>> my
>>> > thoughts...
>>> >
>>> > Design-wise : It does look to me like the bitmaps are serialized to
>>> > streams of bytes.
>>> > From there, *immutable* bitmaps are reloaded on demand, then possibly
>>> > copied and modified.
>>> > The Roaring library has a class ideally suited for this purpose, called
>>> > ImmutableRoaringBitmap...
>>> > From any ByteBuffer, you can map directly a bitmap :
>>> >
>>> >
>>> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
>>> > Compared to deserializing a bitmap from a stream of bytes, this
>>> approach
>>> > avoids copying
>>> > and parsing the data: constructing an ImmutableRoaringBitmap is very
>>> fast
>>> > and uses very
>>> > little memory. Because they are formally immutable, you only need one
>>> > instance in your entire
>>> > application, irrespective of the number of cores. The data is accessed
>>> > only when the
>>> > ImmutableRoaringBitmap is actually queried, and what is accessed is the
>>> > original stream of
>>> > bytes (no unnecessary copy is made). So it uses less memory.
>>> >
>>> > Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would
>>> not
>>> > be difficult,
>>> > programming-wise, but this would make the patch more difficult to
>>> review.
>>> >
>>> > (I'll recopy some of my comments on JIRA.)
>>> >
>>> >
>>> > As usual, the copyright of this patch and be assigned to whoever...
>>> should
>>> > you choose
>>> > to use it. This patch or the Roaring library itself are *not* covered
>>> by
>>> > patents. And
>>> > so forth.
>>> >
>>> >
>>> >
>>> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com>
>>> wrote:
>>> >
>>> >> Thanks for clarifying.
>>> >>
>>> >> Let me see what we can do on this front.
>>> >>
>>> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
>>> >>
>>> >>> Thanks Daniel, I think that's most efficient way to have Roaring
>>> >>> work in existing code, patch is really be appreciated :)
>>> >>>
>>> >>> It's great discussion in KYLIN-1034.
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>>
>>> >>> Best Regards!
>>> >>> ---------------------
>>> >>>
>>> >>> Luke Han
>>> >>>
>>> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com>
>>> wrote:
>>> >>>
>>> >>>> Good day Luke,
>>> >>>>
>>> >>>> May I ask you a favor to bring it into our source code? We could
>>> work
>>> >>>>> together to make it work for
>>> >>>>> our current cases and then run some benchmark with real case.
>>> >>>>>
>>> >>>>
>>> >>>> We can rather easily substitute Roaring for Concise in the source
>>> code.
>>> >>>> Then submit a patch.
>>> >>>> Is that what would move this along most efficiently?
>>> >>>>
>>> >>>>
>>> >>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>>> >>>>
>>> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
>>> >>>>
>>> >>>> Some of these issues (e.g., memory-file mapping) might be of general
>>> >>>> interest.
>>> >>>>
>>> >>>> - Daniel
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

Re: Faster bitmap indexes with Roaring bitmaps

Posted by Daniel Lemire <le...@gmail.com>.
I will be happy to collaborate. There are other people from the Roaring
team who are available
to help too.

On Tue, Oct 6, 2015 at 7:23 AM, Li Yang <li...@apache.org> wrote:

> Patch merged into 1.x branch.
> https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8
>
> There shall be major refactoring around inverted-index and realtime OLAP
> in Q4 or early 2016. Will embrace more from Roaring bitmaps.
>
> Many thanks Daniel!
>
>
> On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <lu...@gmail.com> wrote:
>
>> Hi Daniel,
>>     The patch looks very great, will ask Yang to help review and merge if
>> there's no issue.
>>
>>     Thank you very much for your contribution.
>>
>>
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <le...@gmail.com> wrote:
>>
>> > Good day Luke,
>> >
>> > A patch has been added to the JIRA :
>> >
>> > https://issues.apache.org/jira/browse/KYLIN-1034
>> >
>> > I have also issued a PR on GitHub:
>> >
>> > https://github.com/apache/incubator-kylin/pull/12
>> >
>> > The patch is straight-forward, and simply replaces Concise by Roaring
>> > throughout.
>> >
>> > The relevant unit tests appear to pass.
>> >
>> > Further review, testing and benchmarking is encouraged. The purpose of
>> > this patch
>> > is to get the process started.
>> >
>> > To keep things simple, I did not do *any* redesign. Still... here are my
>> > thoughts...
>> >
>> > Design-wise : It does look to me like the bitmaps are serialized to
>> > streams of bytes.
>> > From there, *immutable* bitmaps are reloaded on demand, then possibly
>> > copied and modified.
>> > The Roaring library has a class ideally suited for this purpose, called
>> > ImmutableRoaringBitmap...
>> > From any ByteBuffer, you can map directly a bitmap :
>> >
>> >
>> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
>> > Compared to deserializing a bitmap from a stream of bytes, this approach
>> > avoids copying
>> > and parsing the data: constructing an ImmutableRoaringBitmap is very
>> fast
>> > and uses very
>> > little memory. Because they are formally immutable, you only need one
>> > instance in your entire
>> > application, irrespective of the number of cores. The data is accessed
>> > only when the
>> > ImmutableRoaringBitmap is actually queried, and what is accessed is the
>> > original stream of
>> > bytes (no unnecessary copy is made). So it uses less memory.
>> >
>> > Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would
>> not
>> > be difficult,
>> > programming-wise, but this would make the patch more difficult to
>> review.
>> >
>> > (I'll recopy some of my comments on JIRA.)
>> >
>> >
>> > As usual, the copyright of this patch and be assigned to whoever...
>> should
>> > you choose
>> > to use it. This patch or the Roaring library itself are *not* covered by
>> > patents. And
>> > so forth.
>> >
>> >
>> >
>> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <le...@gmail.com>
>> wrote:
>> >
>> >> Thanks for clarifying.
>> >>
>> >> Let me see what we can do on this front.
>> >>
>> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <lu...@gmail.com> wrote:
>> >>
>> >>> Thanks Daniel, I think that's most efficient way to have Roaring
>> >>> work in existing code, patch is really be appreciated :)
>> >>>
>> >>> It's great discussion in KYLIN-1034.
>> >>>
>> >>> Thanks.
>> >>>
>> >>>
>> >>> Best Regards!
>> >>> ---------------------
>> >>>
>> >>> Luke Han
>> >>>
>> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <le...@gmail.com>
>> wrote:
>> >>>
>> >>>> Good day Luke,
>> >>>>
>> >>>> May I ask you a favor to bring it into our source code? We could work
>> >>>>> together to make it work for
>> >>>>> our current cases and then run some benchmark with real case.
>> >>>>>
>> >>>>
>> >>>> We can rather easily substitute Roaring for Concise in the source
>> code.
>> >>>> Then submit a patch.
>> >>>> Is that what would move this along most efficiently?
>> >>>>
>> >>>>
>> >>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>> >>>>
>> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
>> >>>>
>> >>>> Some of these issues (e.g., memory-file mapping) might be of general
>> >>>> interest.
>> >>>>
>> >>>> - Daniel
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>
>