Posted to dev@lucene.apache.org by patrick o'leary <pj...@pjaol.com> on 2009/04/28 22:31:05 UTC

ReadOnlyMultiSegmentReader bitset id vs doc id

hey

I've got a filter for spatial Lucene that stores document IDs with a geo
distance, using the bitset position as the doc ID.
However, with a MultiSegmentReader that's no longer going to work.

What's the most appropriate way to go from bitset position to doc id now?

Thanks
Patrick

Re: ReadOnlyMultiSegmentReader bitset id vs doc id

Posted by patrick o'leary <pj...@pjaol.com>.
OK, finally figured out the last problem, with some pointers from Ryan.
As a note to anyone else who might run into the same problems with a
multi-reader:

A) A directory can contain multiple segments, with a reader for each of
   those segments
B) Searches are replayed within each reader, serially **
C) If you use FieldCache, a BitSet, or anything else tied to document
   position within a reader, and you need the docId:
   -- document id = (sum of maxDoc() of all previous readers) + bitset position

e.g.
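import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.util.OpenBitSet;

// offset:     running total of maxDoc() over all readers seen so far
// nextOffset: doc id base of the current reader (sum of previous maxDocs)
int offset;
int nextOffset;

public DocIdSet getDocIdSet(IndexReader reader) throws IOException {

   OpenBitSet bitset = new OpenBitSet(reader.maxDoc());
   offset += reader.maxDoc();
   for (int i = 0; i < reader.maxDoc(); i++)  {
        .....
        .... filter stuff ....
        ....
        if ( good ) {
           bitset.set( i );              // position within this reader

           int docId = i + nextOffset;   // top-level document id
           ...........
        }
   }

   // the next reader's documents start where this reader's end
   nextOffset = offset;
   .......
   return bitset;
}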
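The offset arithmetic above can be checked on its own. Below is a minimal plain-Java sketch with no Lucene dependency, assuming made-up segment sizes in place of each reader's maxDoc(). The names DocBaseMapping, docBases, toGlobal and segmentOf are hypothetical, and the reverse lookup (top-level id back to its segment) is an illustrative addition, not something from this thread.

```java
import java.util.Arrays;

public class DocBaseMapping {

    // bases[k] = sum of maxDoc() of segments 0..k-1 (the "offset" above)
    public static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int sum = 0;
        for (int k = 0; k < maxDocs.length; k++) {
            bases[k] = sum;
            sum += maxDocs[k];
        }
        return bases;
    }

    // top-level doc id = base of the segment + position within the segment
    public static int toGlobal(int[] bases, int segment, int localDoc) {
        return bases[segment] + localDoc;
    }

    // reverse lookup: binary search for the LAST segment whose base <= globalDoc,
    // so empty segments (which share a base with the next one) are skipped
    public static int segmentOf(int[] bases, int globalDoc) {
        int lo = 0, hi = bases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (bases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] maxDocs = {5, 7, 3};               // hypothetical segment sizes
        int[] bases = docBases(maxDocs);
        System.out.println(Arrays.toString(bases));   // [0, 5, 12]
        int g = toGlobal(bases, 1, 3);           // 5 + 3
        int seg = segmentOf(bases, g);
        System.out.println(g + " -> segment " + seg + ", local " + (g - bases[seg]));
        // prints "8 -> segment 1, local 3"
    }
}
```

The sketch assumes the top-level id is in range; it does not validate ids past the last segment.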


OK, it works. Time for sleep.

P


On Tue, Apr 28, 2009 at 5:44 PM, patrick o'leary <pj...@pjaol.com> wrote:

> Think I may have found it: the filter runs multiple times, once for each
> segment reader, and I was generating a new map to hold distances on each
> run. So only the distances from the last segment reader were stored.
>
> Currently it looks like those per-segment searches are done serially (in
> Solr, at least). I presume the end goal is to make them multi-threaded?
> In that case, I'll need to make my map synchronized.

Re: ReadOnlyMultiSegmentReader bitset id vs doc id

Posted by Mark Miller <ma...@gmail.com>.
I'm not sure we could parallelize it. Currently it's a serial process
(as you say): the queue collects across readers by adjusting the values
in the queue so they sort correctly against the current reader.
That approach doesn't appear easy to parallelize.
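A simplified sketch of the serial pattern described above, assuming plain float scores in place of sort fields: one shared queue keeps the top n hits, segments are visited one after another, and each segment-local id is rebased by the running docBase. It leaves out the step of adjusting queued values to sort against the current reader; SerialTopN and topN are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class SerialTopN {

    // a hit with a top-level doc id and a score
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // segments[k][i] = score of segment-local doc i in segment k;
    // returns the top-level ids of the n best hits, best first
    public static List<Integer> topN(float[][] segments, int n) {
        List<Integer> docs = new ArrayList<>();
        if (n <= 0) {
            return docs;
        }
        // min-heap on score: the weakest of the current top n sits at the head
        PriorityQueue<Hit> queue =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        int docBase = 0;
        for (float[] seg : segments) {            // one segment after another
            for (int i = 0; i < seg.length; i++) {
                Hit hit = new Hit(docBase + i, seg[i]);   // rebase to top-level id
                if (queue.size() < n) {
                    queue.add(hit);
                } else if (hit.score > queue.peek().score) {
                    queue.poll();                 // evict the weakest hit
                    queue.add(hit);
                }
            }
            docBase += seg.length;                // next segment starts here
        }
        while (!queue.isEmpty()) {
            docs.add(0, queue.poll().doc);        // weakest polled first, so prepend
        }
        return docs;
    }

    public static void main(String[] args) {
        float[][] segments = { {0.2f, 0.9f}, {0.5f, 0.95f, 0.1f} };
        System.out.println(topN(segments, 2));   // [3, 1]
    }
}
```

Every segment reads and updates the same queue, which is why the loop is naturally sequential; running segments in parallel would need a lock around the queue or one queue per segment merged at the end.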

patrick o'leary wrote:
> Think I may have found it: the filter runs multiple times, once for
> each segment reader, and I was generating a new map to hold distances
> on each run. So only the distances from the last segment reader were
> stored.
>
> Currently it looks like those per-segment searches are done serially
> (in Solr, at least). I presume the end goal is to make them
> multi-threaded? In that case, I'll need to make my map synchronized.


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: ReadOnlyMultiSegmentReader bitset id vs doc id

Posted by patrick o'leary <pj...@pjaol.com>.
Think I may have found it: the filter runs multiple times, once for each
segment reader, and I was generating a new map to hold distances on each
run. So only the distances from the last segment reader were stored.

Currently it looks like those per-segment searches are done serially (in
Solr, at least). I presume the end goal is to make them multi-threaded?
In that case, I'll need to make my map synchronized.
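A minimal sketch of the fix described above, assuming the filter is stood in for by a hypothetical filterSegment(docBase, distances, radius) method that is called once per segment: the distance map is created once per filter instance rather than once per call, and is keyed by top-level doc id. ConcurrentHashMap is used only in case segments are ever processed from several threads; that is an assumption about the future, not current behaviour. DistanceFilterSketch and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DistanceFilterSketch {

    // created ONCE per filter instance, not once per segment call,
    // and keyed by top-level doc id (docBase + segment-local position)
    private final Map<Integer, Double> distances = new ConcurrentHashMap<>();

    // hypothetical stand-in for one per-segment run of the filter:
    // segmentDistances[i] is the distance of segment-local doc i
    public void filterSegment(int docBase, double[] segmentDistances, double radius) {
        for (int i = 0; i < segmentDistances.length; i++) {
            if (segmentDistances[i] <= radius) {
                distances.put(docBase + i, segmentDistances[i]);
            }
        }
    }

    public Map<Integer, Double> getDistances() {
        return distances;
    }

    public static void main(String[] args) {
        DistanceFilterSketch f = new DistanceFilterSketch();
        // two hypothetical segments with 3 and 2 documents
        f.filterSegment(0, new double[] {1.0, 9.0, 2.5}, 5.0);
        f.filterSegment(3, new double[] {4.0, 7.0}, 5.0);
        // holds top-level ids 0, 2 and 3; nothing from the first segment is lost
        System.out.println(f.getDistances());
    }
}
```

Because each segment writes a disjoint range of keys, concurrent puts would not overwrite each other; a plain HashMap is enough while the searches stay serial.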


On Tue, Apr 28, 2009 at 4:42 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

>  What is the problem exactly? Maybe you use the new Collector API, where
> the search is done for each segment, so caching does not work correctly?

Re: ReadOnlyMultiSegmentReader bitset id vs doc id

Posted by Mark Miller <ma...@gmail.com>.
You might check out this Solr exchange:
http://www.lucidimagination.com/search/document/b2ccc68ca834129/lucene_2_9_migration_issues_multireader_vs_indexreader_document_ids

There are a few suggestions throughout.


-- 
- Mark

http://www.lucidimagination.com



Uwe Schindler wrote:
>
> What is the problem exactly? Maybe you use the new Collector API, 
> where the search is done for each segment, so caching does not work 
> correctly?







RE: ReadOnlyMultiSegmentReader bitset id vs doc id

Posted by Uwe Schindler <uw...@thetaphi.de>.
What is the problem exactly? Maybe you use the new Collector API, where the
search is done for each segment, so caching does not work correctly?

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
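To illustrate the per-segment search mentioned above: the collector sees one segment at a time and receives segment-local ids, so anything cached or keyed by doc id has to add that segment's base. This is a plain-Java sketch; SegmentCollector, setNextSegment, DocIdCollector and search are hypothetical names that mirror the idea, not Lucene's actual Collector class.

```java
import java.util.ArrayList;
import java.util.List;

public class PerSegmentCollectorSketch {

    // hypothetical callback: the searcher announces each segment's base,
    // then hands over matching ids relative to that segment
    interface SegmentCollector {
        void setNextSegment(int docBase);
        void collect(int localDoc);
    }

    // records top-level ids by adding the current segment's base
    static final class DocIdCollector implements SegmentCollector {
        private int docBase;
        final List<Integer> docs = new ArrayList<>();

        public void setNextSegment(int docBase) { this.docBase = docBase; }

        public void collect(int localDoc) { docs.add(docBase + localDoc); }
    }

    // hypothetical serial driver: segments[k] lists the matching local ids
    // of segment k, maxDocs[k] is that segment's size
    public static List<Integer> search(int[][] segments, int[] maxDocs) {
        DocIdCollector c = new DocIdCollector();
        int docBase = 0;
        for (int k = 0; k < segments.length; k++) {
            c.setNextSegment(docBase);
            for (int local : segments[k]) {
                c.collect(local);
            }
            docBase += maxDocs[k];
        }
        return c.docs;
    }

    public static void main(String[] args) {
        System.out.println(search(new int[][] {{1, 4}, {0, 2}}, new int[] {5, 3}));
        // prints [1, 4, 5, 7]
    }
}
```

A cache filled inside collect() without adding docBase would see ids starting again at 0 for every segment, which is one way "caching does not work correctly" can show up.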
