Posted to dev@lucene.apache.org by Peter Karich <pe...@yahoo.de> on 2010/11/05 15:37:44 UTC

fast bitset

  Hi,

Would this compressed and fast(?) bitset be interesting for Solr/Lucene,
or is OpenBitSet already implemented this way?
quoting from github:

The goal of word-aligned compression is not to
achieve the best compression, but rather to
improve query processing time.

License is GPL version 3 and ASL2.0.

http://code.google.com/p/javaewah
https://github.com/lemire/javaewah

I just saw it on twitter ...

Regards,
Peter.

-- 
http://jetwick.com twitter search prototype


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: fast bitset

Posted by Earwin Burrfoot <ea...@gmail.com>.
It's okay, trunk has iteration-based filters.

Filters with low selectivity might be faster when used in the old-style
random-access way, though. If one wants to exploit that, compressed
bitmaps are a no-go.
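
The two access styles mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: java.util.BitSet stands in for an uncompressed filter such as OpenBitSet, the query hits are assumed to be a sorted array of non-negative doc IDs, and the class and method names are hypothetical, not Lucene API.

```java
import java.util.BitSet;

/** Hypothetical helper contrasting two ways of applying a filter to query hits. */
final class FilterAccessStyles {

    /** Random-access style: probe the filter once per query hit. */
    static int countByProbing(int[] sortedHits, BitSet filter) {
        int count = 0;
        for (int doc : sortedHits) {
            if (filter.get(doc)) { // needs cheap get(int), which compressed bitmaps lack
                count++;
            }
        }
        return count;
    }

    /** Iteration style: leapfrog between the sorted hits and the filter's set bits. */
    static int countByIterating(int[] sortedHits, BitSet filter) {
        int count = 0;
        int i = 0;
        int doc = filter.nextSetBit(0); // -1 when the filter has no set bits left
        while (i < sortedHits.length && doc >= 0) {
            if (sortedHits[i] == doc) {
                count++;
                i++;
                doc = filter.nextSetBit(doc + 1);
            } else if (sortedHits[i] < doc) {
                i++; // this hit is not in the filter
            } else {
                doc = filter.nextSetBit(sortedHits[i]); // skip the filter ahead to the hit
            }
        }
        return count;
    }
}
```

When the filter matches most documents and the query matches few, the probing variant touches the filter only once per hit, which is the case where random access may pay off.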

On Sat, Nov 6, 2010 at 00:29, Peter Karich <pe...@yahoo.de> wrote:
>
>
>>  And they're not random-access capable.
>
> which means it isn't applicable?
>
>
>> Important point about WAH and friends is their ability to be fast
>> and/or/not/xor'ed without full decompression. And they're not
>> random-access capable.
>>
>> On Fri, Nov 5, 2010 at 18:47, Uwe Schindler<uw...@thetaphi.de>  wrote:
>>>
>>> Looks interesting, I was only annoyed when I saw "new Vector<Integer>()",
>>> which is synchronized, in the iterator code - which is the thing that is
>>> most important for DocIdSets.... Looks like stone ages.
>>>
>>> Else I would simply give it a try by rewriting the class to also implement
>>> DocIdSet and return the optimized iterator (not the one in this class). You
>>> can then try to replace some OpenBitSets in any filters and perf test?
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>>
>>>> -----Original Message-----
>>>> From: Peter Karich [mailto:peathal@yahoo.de]
>>>> Sent: Friday, November 05, 2010 3:38 PM
>>>> To: dev@lucene.apache.org
>>>> Subject: fast bitset
>>>>
>>>>   Hi,
>>>>
>>>> would this compressed and fast(?) bitset be interesting for solr/lucene or is
>>>> openbitset already done this way?
>>>> quoting from github:
>>>>
>>>> The goal of word-aligned compression is not to achieve the best compression,
>>>> but rather to improve query processing time.
>>>>
>>>> License is GPL version 3 and ASL2.0.
>>>>
>>>> http://code.google.com/p/javaewah
>>>> https://github.com/lemire/javaewah
>>>>
>>>> I just saw it on twitter ...
>>>>
>>>> Regards,
>>>> Peter.
>>>>
>>>> --
>>>> http://jetwick.com twitter search prototype
>
>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785



Re: fast bitset

Posted by Peter Karich <pe...@yahoo.de>.

>  And they're not random-access capable.

which means it isn't applicable?


> Important point about WAH and friends is their ability to be fast
> and/or/not/xor'ed without full decompression. And they're not
> random-access capable.
>
> On Fri, Nov 5, 2010 at 18:47, Uwe Schindler<uw...@thetaphi.de>  wrote:
>> Looks interesting, I was only annoyed when I saw "new Vector<Integer>()",
>> which is synchronized, in the iterator code - which is the thing that is
>> most important for DocIdSets.... Looks like stone ages.
>>
>> Else I would simply give it a try by rewriting the class to also implement
>> DocIdSet and return the optimized iterator (not the one in this class). You
>> can then try to replace some OpenBitSets in any filters and perf test?
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>>> -----Original Message-----
>>> From: Peter Karich [mailto:peathal@yahoo.de]
>>> Sent: Friday, November 05, 2010 3:38 PM
>>> To: dev@lucene.apache.org
>>> Subject: fast bitset
>>>
>>>    Hi,
>>>
>>> would this compressed and fast(?) bitset be interesting for solr/lucene or is
>>> openbitset already done this way?
>>> quoting from github:
>>>
>>> The goal of word-aligned compression is not to achieve the best compression,
>>> but rather to improve query processing time.
>>>
>>> License is GPL version 3 and ASL2.0.
>>>
>>> http://code.google.com/p/javaewah
>>> https://github.com/lemire/javaewah
>>>
>>> I just saw it on twitter ...
>>>
>>> Regards,
>>> Peter.
>>>
>>> --
>>> http://jetwick.com twitter search prototype




Re: fast bitset

Posted by Earwin Burrfoot <ea...@gmail.com>.
An important point about WAH and friends is that they can be
and/or/not/xor'ed quickly without full decompression. However, they're
not random-access capable.
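
To make both halves of that statement concrete, here is a minimal sketch. It deliberately uses a simplified list of runs of set bits rather than the actual WAH/EWAH word layout, and the class and method names are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified run-length bitmap; NOT the real WAH/EWAH format. */
final class RunBitmap {
    // Each run is {start, endExclusive}; runs are sorted and do not overlap.
    final List<int[]> runs = new ArrayList<>();

    /** Appends a run of set bits; callers must add runs in increasing order. */
    RunBitmap addRun(int start, int endExclusive) {
        runs.add(new int[] {start, endExclusive});
        return this;
    }

    /** Intersects two bitmaps run by run, never expanding them into single bits. */
    static RunBitmap and(RunBitmap a, RunBitmap b) {
        RunBitmap out = new RunBitmap();
        int i = 0, j = 0;
        while (i < a.runs.size() && j < b.runs.size()) {
            int[] x = a.runs.get(i);
            int[] y = b.runs.get(j);
            int lo = Math.max(x[0], y[0]);
            int hi = Math.min(x[1], y[1]);
            if (lo < hi) {
                out.addRun(lo, hi); // the overlapping part of the two runs
            }
            // Advance whichever run ends first; the other may still overlap later runs.
            if (x[1] < y[1]) i++; else j++;
        }
        return out;
    }

    /** Number of set bits, computed from run lengths alone. */
    int cardinality() {
        int n = 0;
        for (int[] r : runs) n += r[1] - r[0];
        return n;
    }

    /** Membership test: has to walk the runs, there is no O(1) index into the data. */
    boolean get(int bit) {
        for (int[] r : runs) {
            if (bit < r[0]) return false;
            if (bit < r[1]) return true;
        }
        return false;
    }
}
```

The cost of and() depends on the number of runs, not on the number of bits, while get() has to scan, which is the random-access limitation in miniature.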

On Fri, Nov 5, 2010 at 18:47, Uwe Schindler <uw...@thetaphi.de> wrote:
> Looks interesting, I was only annoyed when I saw "new Vector<Integer>()",
> which is synchronized, in the iterator code - which is the thing that is
> most important for DocIdSets.... Looks like stone ages.
>
> Else I would simply give it a try by rewriting the class to also implement
> DocIdSet and return the optimized iterator (not the one in this class). You
> can then try to replace some OpenBitSets in any filters and perf test?
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Peter Karich [mailto:peathal@yahoo.de]
>> Sent: Friday, November 05, 2010 3:38 PM
>> To: dev@lucene.apache.org
>> Subject: fast bitset
>>
>>   Hi,
>>
>> would this compressed and fast(?) bitset be interesting for solr/lucene or is
>> openbitset already done this way?
>> quoting from github:
>>
>> The goal of word-aligned compression is not to achieve the best compression,
>> but rather to improve query processing time.
>>
>> License is GPL version 3 and ASL2.0.
>>
>> http://code.google.com/p/javaewah
>> https://github.com/lemire/javaewah
>>
>> I just saw it on twitter ...
>>
>> Regards,
>> Peter.
>>
>> --
>> http://jetwick.com twitter search prototype
>>
>>
>
>
>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785



RE: fast bitset

Posted by Uwe Schindler <uw...@thetaphi.de>.
Looks interesting. I was only annoyed when I saw "new Vector<Integer>()",
which is synchronized, in the iterator code, which is the part that
matters most for DocIdSets. Looks like the stone age.

Otherwise I would simply give it a try by rewriting the class to also
implement DocIdSet and return an optimized iterator (not the one in this
class). You can then try replacing some OpenBitSets in a few filters and
run a perf test.
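
As a rough idea of what an iterator without Vector<Integer> could look like, here is a minimal sketch that walks plain 64-bit words with no boxing and no synchronization. It is an assumption-laden illustration: the class name is hypothetical, it reads uncompressed words rather than the EWAH format, and it only mimics the nextDoc()/sentinel style of a doc ID iterator instead of extending any Lucene class.

```java
/** Hypothetical doc ID iterator over uncompressed 64-bit words; not a Lucene class. */
final class WordDocIdIterator {
    /** Sentinel returned once all set bits have been consumed (chosen for this sketch). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final long[] words;
    private int wordIndex = 0;
    private long current; // bits of words[wordIndex] that were not returned yet

    WordDocIdIterator(long[] words) {
        this.words = words;
        this.current = words.length > 0 ? words[0] : 0L;
    }

    /** Returns the next set bit position in increasing order, or NO_MORE_DOCS. */
    int nextDoc() {
        while (current == 0L) {
            if (wordIndex + 1 >= words.length) {
                return NO_MORE_DOCS;
            }
            current = words[++wordIndex]; // move on to the next word
        }
        int bit = Long.numberOfTrailingZeros(current);
        current &= current - 1; // clear the lowest set bit
        return wordIndex * 64 + bit;
    }
}
```

Everything stays in primitives, so there is no per-document allocation and no lock acquisition on the hot path.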

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Peter Karich [mailto:peathal@yahoo.de]
> Sent: Friday, November 05, 2010 3:38 PM
> To: dev@lucene.apache.org
> Subject: fast bitset
> 
>   Hi,
> 
> would this compressed and fast(?) bitset be interesting for solr/lucene or is
> openbitset already done this way?
> quoting from github:
> 
> The goal of word-aligned compression is not to achieve the best compression,
> but rather to improve query processing time.
> 
> License is GPL version 3 and ASL2.0.
> 
> http://code.google.com/p/javaewah
> https://github.com/lemire/javaewah
> 
> I just saw it on twitter ...
> 
> Regards,
> Peter.
> 
> --
> http://jetwick.com twitter search prototype
> 
> 


