Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2009/02/27 21:33:26 UTC

Bitmap index

Hi,

I've had http://en.wikipedia.org/wiki/Bitmap_index open in my browser for weeks, thinking I'd bring it up here -- would a bitmap index make sense anywhere in Lucene (or perhaps Solr)?

Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Bitmap index

Posted by Michael McCandless <lu...@mikemccandless.com>.
Right, I think Lucene could decide under the hood what the best data
structure is when writing the column-stride field, sort of like how
BitVector has two ways (sparse vs. dense) of storing itself on disk.

Mike
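
Mike's comparison with BitVector suggests choosing the on-disk layout per set of matching documents. The sketch below is a minimal illustration under stated assumptions: the class `DocSetEncoder`, its one-byte tag and both layouts are hypothetical and are not Lucene's actual BitVector file format. It writes a sorted set of doc ids either as a dense bitmap of maxDoc bits or as gap-encoded variable-length ints (7 bits per byte, high bit meaning "more bytes follow"), whichever comes out smaller.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical encoder that picks the smaller of two layouts for a set of doc ids. */
public final class DocSetEncoder {
    public static final byte DENSE = 0;   // tag: one bit per document follows
    public static final byte SPARSE = 1;  // tag: VInt count, then VInt gaps between doc ids

    private DocSetEncoder() {}

    /** Encodes sorted, distinct doc ids (all < maxDoc) in whichever layout is smaller. */
    public static byte[] encode(int[] sortedDocs, int maxDoc) {
        ByteArrayOutputStream sparse = new ByteArrayOutputStream();
        sparse.write(SPARSE);
        writeVInt(sparse, sortedDocs.length);
        int prev = 0;
        for (int doc : sortedDocs) {
            writeVInt(sparse, doc - prev); // gaps stay small when matches are close together
            prev = doc;
        }
        int denseBytes = 1 + (maxDoc + 7) / 8; // tag byte + one bit per document
        if (sparse.size() <= denseBytes) {
            return sparse.toByteArray();
        }
        byte[] dense = new byte[denseBytes];
        dense[0] = DENSE;
        for (int doc : sortedDocs) {
            dense[1 + (doc >> 3)] |= (byte) (1 << (doc & 7));
        }
        return dense;
    }

    /** Decodes either layout back into the sorted list of doc ids. */
    public static List<Integer> decode(byte[] data, int maxDoc) {
        List<Integer> docs = new ArrayList<>();
        if (data[0] == DENSE) {
            for (int doc = 0; doc < maxDoc; doc++) {
                if ((data[1 + (doc >> 3)] & (1 << (doc & 7))) != 0) {
                    docs.add(doc);
                }
            }
            return docs;
        }
        int[] pos = {1}; // read cursor, held in an array so readVInt can advance it
        int count = readVInt(data, pos);
        int doc = 0;
        for (int i = 0; i < count; i++) {
            doc += readVInt(data, pos); // undo the gap encoding
            docs.add(doc);
        }
        return docs;
    }

    /** Writes v (>= 0) with 7 bits per byte; a set high bit means another byte follows. */
    private static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    private static int readVInt(byte[] data, int[] pos) {
        byte b = data[pos[0]++];
        int v = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = data[pos[0]++];
            v |= (b & 0x7F) << shift;
        }
        return v;
    }
}
```

With maxDoc = 1,000, three matches encode sparsely in 7 bytes, while 500 matches switch to the 126-byte dense form. The crossover depends on maxDoc and on how many documents match, which is why such a decision would have to be made per value at write time.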

Otis Gospodnetic wrote:

>
> So that would require Lucene to dynamically/periodically check field
> values and their frequencies, and then either switch from a regular
> inverted index to a bitmap index or just create an additional bitmap
> index for those fields and their values?
>
> Otis


Re: Bitmap index

Posted by Otis Gospodnetic <ot...@yahoo.com>.
So that would require Lucene to dynamically/periodically check field values and their frequencies, and then either switch from a regular inverted index to a bitmap index or just create an additional bitmap index for those fields and their values?

Otis



----- Original Message ----
> From: Michael McCandless <lu...@mikemccandless.com>
> To: java-dev@lucene.apache.org
> Sent: Friday, February 27, 2009 4:41:32 PM
> Subject: Re: Bitmap index
> 
> 
> I think with column-stride fields we should use a bitmap index to represent fields
> that have few values across many docs.
> 
> Mike


Re: Bitmap index

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think with column-stride fields we should use a bitmap index to
represent fields that have few values across many docs.

Mike
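
For a field with few distinct values across many documents, a bitmap index keeps one bitset per value, with bit d set when document d has that value. A minimal in-memory sketch, assuming `java.util.BitSet` as the bit storage and a hypothetical `BitmapFieldIndex` class (this is not an existing Lucene API):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bitmap index over one low-cardinality field: one bitset per distinct value. */
public final class BitmapFieldIndex {
    private final Map<String, BitSet> bitmaps = new HashMap<>();

    /** Records that document docId has the given value for this field. */
    public void add(int docId, String value) {
        bitmaps.computeIfAbsent(value, v -> new BitSet()).set(docId);
    }

    /** Returns a copy of the bitmap for value (empty if the value never occurs). */
    public BitSet docsWith(String value) {
        BitSet bits = bitmaps.get(value);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    /** Documents whose value is any of the given values: the OR of their bitmaps. */
    public BitSet docsWithAny(String... values) {
        BitSet result = new BitSet();
        for (String v : values) {
            BitSet bits = bitmaps.get(v);
            if (bits != null) {
                result.or(bits);
            }
        }
        return result;
    }

    /** Number of distinct values, i.e. number of bitmaps kept. */
    public int distinctValues() {
        return bitmaps.size();
    }
}
```

Each bitmap can cost up to maxDoc bits regardless of how many documents carry the value, so total size grows with the number of distinct values. That is why this layout suits fields with few values across many docs, and why compression matters once individual values become rare.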

Uwe Schindler wrote:

> In my opinion, we currently use a kind of bitmap index with our filters.
> OpenBitSet and SortedVIntList, as used in filters, can be seen as bitmap
> indexes specifying whether or not a document is a hit of the filter. Maybe we
> can use the compression technology mentioned in this Wikipedia article to
> further optimize filters and their DocIdSetIterators.
>
> In my opinion, the real use of bitmap indexes is in data warehousing, where
> low-cardinality columns are used. We are using Sybase IQ (a column-oriented
> database), which makes heavy use of bitmap indexes (a variation of them is
> called an LF, or low-fast, index there).
>
> Uwe


Re: Bitmap index

Posted by Earwin Burrfoot <ea...@gmail.com>.
> Maybe we can use the
> compression technology mentioned in this Wikipedia article to further
> optimize filters and their DocIdSetIterators.

We have already been using WAH-encoded bitmap filters here for roughly a
year. And yes, they are nice.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
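
Word-Aligned Hybrid (WAH) coding compresses a bitmap by cutting it into 31-bit groups: a group with mixed bits is stored verbatim as a literal word, while runs of all-0 or all-1 groups collapse into a single fill word carrying a run length. Earwin does not describe their implementation, so what follows is only a simplified sketch of the general scheme using the commonly described 32-bit layout (bit 31 marks a fill word, bit 30 holds the fill value, bits 0-29 count groups). The class name `Wah` is hypothetical, and a production filter would typically AND/OR the compressed words directly rather than decoding first.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Simplified Word-Aligned Hybrid (WAH) codec with 32-bit words and 31-bit groups. */
public final class Wah {
    static final int GROUP = 31;               // payload bits per word
    static final int ALL_ONES = 0x7FFFFFFF;    // a full group of 31 set bits
    static final int FILL_FLAG = 0x80000000;   // bit 31: 1 = fill word, 0 = literal word
    static final int FILL_BIT = 0x40000000;    // bit 30 of a fill word: value of the run
    static final int MAX_RUN = 0x3FFFFFFF;     // bits 0..29 of a fill word: run length in groups

    private Wah() {}

    /** Encodes bits [0, nbits) into WAH words. */
    public static int[] encode(BitSet bits, int nbits) {
        List<Integer> words = new ArrayList<>();
        int groups = (nbits + GROUP - 1) / GROUP;
        for (int g = 0; g < groups; g++) {
            int group = 0;
            for (int i = 0; i < GROUP; i++) {
                int doc = g * GROUP + i;
                if (doc < nbits && bits.get(doc)) {
                    group |= 1 << i;
                }
            }
            if (group == 0 || group == ALL_ONES) {
                int fill = FILL_FLAG | (group == 0 ? 0 : FILL_BIT);
                int last = words.isEmpty() ? 0 : words.get(words.size() - 1);
                // Extend the previous word if it is a fill of the same value with room left.
                boolean extend = (last & (FILL_FLAG | FILL_BIT)) == fill && (last & MAX_RUN) < MAX_RUN;
                if (extend) {
                    words.set(words.size() - 1, last + 1);
                } else {
                    words.add(fill | 1);
                }
            } else {
                words.add(group); // literal word: bit 31 stays 0
            }
        }
        int[] out = new int[words.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = words.get(i);
        }
        return out;
    }

    /** Expands WAH words back into a BitSet. */
    public static BitSet decode(int[] words) {
        BitSet bits = new BitSet();
        int pos = 0; // first bit covered by the current word
        for (int w : words) {
            if ((w & FILL_FLAG) != 0) {
                int run = (w & MAX_RUN) * GROUP;
                if ((w & FILL_BIT) != 0) {
                    bits.set(pos, pos + run);
                }
                pos += run;
            } else {
                for (int i = 0; i < GROUP; i++) {
                    if ((w & (1 << i)) != 0) {
                        bits.set(pos + i);
                    }
                }
                pos += GROUP;
            }
        }
        return bits;
    }
}
```

For 200,000 bits containing one isolated hit and a run of 100 consecutive hits, this encoder emits six words, compared with the 6,250 32-bit words an uncompressed bitmap of the same length would take.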



Re: Bitmap index

Posted by Otis Gospodnetic <ot...@yahoo.com>.
OK, so that bit about filters, OpenBitSet and friends matched my feeling/understanding, too. It roughly corresponds to what that Wikipedia page describes as in-memory usage of bitmaps à la PostgreSQL. I mentioned Solr because I was thinking of low-cardinality fields, perhaps the same ones that people tend to use for faceting.

Otis
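
The faceting angle can be illustrated with per-value bitmaps: if each value of a low-cardinality field has a bitmap, the facet count for that value is the number of bits set in (query hits AND value bitmap). A minimal sketch with `java.util.BitSet`; the `BitmapFacets` class and its `count` method are hypothetical and are not taken from Solr's code.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical facet counter over per-value bitmaps of one low-cardinality field. */
public final class BitmapFacets {
    private BitmapFacets() {}

    /** Returns value -> number of query hits carrying that value (zero counts omitted). */
    public static Map<String, Integer> count(BitSet queryHits, Map<String, BitSet> valueBitmaps) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by value for stable output
        for (Map.Entry<String, BitSet> e : valueBitmaps.entrySet()) {
            BitSet both = (BitSet) queryHits.clone();
            both.and(e.getValue()); // docs that match the query AND have this value
            if (!both.isEmpty()) {
                counts.put(e.getKey(), both.cardinality());
            }
        }
        return counts;
    }
}
```

The cost is one AND plus one population count per distinct value, which stays cheap only while the number of values is small.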



----- Original Message ----
> From: Uwe Schindler <uw...@thetaphi.de>
> To: java-dev@lucene.apache.org
> Sent: Friday, February 27, 2009 4:37:19 PM
> Subject: RE: Bitmap index
> 
> In my opinion, we currently use a kind of bitmap index with our filters.
> OpenBitSet and SortedVIntList, as used in filters, can be seen as bitmap
> indexes specifying whether or not a document is a hit of the filter. Maybe we
> can use the compression technology mentioned in this Wikipedia article to
> further optimize filters and their DocIdSetIterators.
>
> In my opinion, the real use of bitmap indexes is in data warehousing, where
> low-cardinality columns are used. We are using Sybase IQ (a column-oriented
> database), which makes heavy use of bitmap indexes (a variation of them is
> called an LF, or low-fast, index there).
> 
> Uwe


RE: Bitmap index

Posted by Uwe Schindler <uw...@thetaphi.de>.
In my opinion, we currently use a kind of bitmap index with our filters.
OpenBitSet and SortedVIntList, as used in filters, can be seen as bitmap
indexes specifying whether or not a document is a hit of the filter. Maybe we
can use the compression technology mentioned in this Wikipedia article to
further optimize filters and their DocIdSetIterators.

In my opinion, the real use of bitmap indexes is in data warehousing, where
low-cardinality columns are used. We are using Sybase IQ (a column-oriented
database), which makes heavy use of bitmap indexes (a variation of them is
called an LF, or low-fast, index there).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
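
Uwe's point is that a cached filter is already a per-document yes/no bitmap, whether stored densely (OpenBitSet) or as a sorted list of doc ids (SortedVIntList). A rough sketch of combining two such filters through iterators, assuming `java.util.BitSet` in place of OpenBitSet and a plain sorted `int[]` in place of SortedVIntList. The `DocCursor` interface and `FilterIntersection` class are hypothetical and only loosely modeled on the idea of a DocIdSetIterator; they do not reproduce its API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical filter cursors and a leapfrog intersection over them. */
public final class FilterIntersection {
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Cursor over matching doc ids in increasing order. */
    public interface DocCursor {
        /** Moves to the first matching doc >= target and returns it, or NO_MORE_DOCS. */
        int advance(int target);
    }

    private FilterIntersection() {}

    /** Dense filter: one bit per document. */
    public static DocCursor overBits(BitSet bits) {
        return target -> {
            int next = bits.nextSetBit(target);
            return next < 0 ? NO_MORE_DOCS : next;
        };
    }

    /** Sparse filter: a sorted array of doc ids (linear skipping keeps the sketch short). */
    public static DocCursor overSorted(int[] docs) {
        int[] pos = {0}; // cursor state; only ever moves forward
        return target -> {
            while (pos[0] < docs.length && docs[pos[0]] < target) {
                pos[0]++;
            }
            return pos[0] < docs.length ? docs[pos[0]] : NO_MORE_DOCS;
        };
    }

    /** Leapfrog intersection: each cursor jumps to the other's current doc until they agree. */
    public static List<Integer> intersect(DocCursor a, DocCursor b) {
        List<Integer> hits = new ArrayList<>();
        int doc = a.advance(0);
        while (doc != NO_MORE_DOCS) {
            int other = b.advance(doc);
            if (other == doc) {
                hits.add(doc);             // both filters accept this document
                doc = a.advance(doc + 1);
            } else {
                doc = a.advance(other);    // skip a ahead to b's candidate
            }
        }
        return hits;
    }
}
```

Because each side skips ahead to the other's candidate, the loop does not need to visit every document id, and the dense and sparse representations can be mixed freely behind the same cursor interface.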



Re: index large size file

Posted by Michael McCandless <lu...@mikemccandless.com>.
Could you re-ask this on java-user, instead?  Thanks.

Mike

On Mar 9, 2009, at 6:15 PM, Amy Zhou wrote:

> Hi,
>
> I have a couple of questions about indexing large files. As I
> understand it, the default MaxFieldLength is 10,000 terms. In Lucene 2.4,
> we can set the MaxFieldLength in the IndexWriter constructor. My questions are:
>
> 1) What is the performance like if MaxFieldLength is set to UNLIMITED?
> 2) Are there any other options for indexing large files?
>
>
> Thx,
>
> Amy


index large size file

Posted by Amy Zhou <am...@systemware.com>.
Hi,

I have a couple of questions about indexing large files. As I understand it, the default MaxFieldLength is 10,000 terms. In Lucene 2.4, we can set the MaxFieldLength in the IndexWriter constructor. My questions are:

1) What is the performance like if MaxFieldLength is set to UNLIMITED?
2) Are there any other options for indexing large files?


Thx,

Amy
