
Posted to dev@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2008/09/12 06:55:01 UTC

Change to MultiReader

There was a message from Kirk Roberts, 18/4/2007 - MultiSearcher vs MultiReader

Grant mentioned the visibility of the readerIndex() method in MultiReader, but
nothing ever seems to have come of it.

Is there any reason why the following could not be put into MultiReader? 
Something like this seems necessary when handling multiple indices to solve the 
BitSet caching issue I raised on the user thread.

It's slightly more efficient for a Filter implementation's bits() method to
track these reader boundaries itself (as the doc id always seems to increase)
rather than delegating back to the reader to resolve each document.  However,
these methods would still be useful utilities, and they leave the underlying
implementation free to change.
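To illustrate the efficiency point: when bits() visits documents in ascending order, it can keep a cursor into the sub-reader boundaries instead of resolving every document with a fresh search. This is a minimal standalone sketch, assuming a `starts` array laid out as inside MultiReader (first doc number of each sub-reader, plus a final entry equal to maxDoc). `SequentialDocMapper` is a hypothetical name and the code does not call Lucene.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: maps ascending MultiReader doc numbers to
 * {readerIndex, localDoc} pairs by advancing a cursor, with no per-document
 * binary search. Assumes starts[i] is the first doc number of sub-reader i
 * and starts[starts.length - 1] == maxDoc.
 */
final class SequentialDocMapper {
    private final int[] starts;
    private int reader = 0; // current sub-reader; only ever moves forward

    SequentialDocMapper(int[] starts) {
        this.starts = starts.clone();
    }

    /** Doc numbers must be passed in ascending order and be < maxDoc. */
    int[] map(int n) {
        // Step past every sub-reader whose range ends at or before n; this
        // also skips empty sub-readers that share the same start value.
        while (n >= starts[reader + 1]) {
            reader++;
        }
        return new int[] {reader, n - starts[reader]};
    }

    public static void main(String[] args) {
        // Three sub-readers holding 3, 0 and 4 documents (maxDoc == 7).
        SequentialDocMapper mapper = new SequentialDocMapper(new int[] {0, 3, 3, 7});
        for (int doc : new int[] {1, 2, 3, 6}) {
            System.out.println(doc + " -> " + Arrays.toString(mapper.map(doc)));
        }
    }
}
```

The trade-off is that the cursor only works for increasing doc numbers; random access still needs something like readerIndex().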

Antony

/** Returns the index of the sub-reader that contains the specified document
  *  @param  n the MultiReader document number
  *  @return the reader index
  */
public int readerIndex(int n) {    // find reader for doc n:
   return MultiSegmentReader.readerIndex(n, this.starts, this.subReaders.length);
}

/** Fetches the document number in the specified reader for the given
  *  document number.
  *  @param  i the reader index obtained from {@link #readerIndex(int)}
  *  @param  n the MultiReader document number
  *  @return the mapped document number
  */
public int id(int i, int n) {    // find true doc for doc n:
   return n - this.starts[i];
}

/** Fetches the document number in the specified reader for the given
  *  document number.
  *  @param  n the MultiReader document number
  *  @return the mapped document number
  */
public int id(int n) {    // find true doc for doc n:
   return n - this.starts[readerIndex(n)];
}
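For readers without a Lucene checkout, here is a minimal standalone model of how the three proposed methods behave. It assumes MultiReader's `starts` array holds one entry per sub-reader plus a final entry equal to maxDoc. `DocNumberMapper` is hypothetical, and its binary search is written here for illustration rather than copied from MultiSegmentReader.

```java
/**
 * Hypothetical standalone model of the proposed readerIndex()/id() helpers.
 * starts[i] is the first doc number of sub-reader i; the last entry is maxDoc.
 */
final class DocNumberMapper {
    private final int[] starts;
    private final int numSubReaders;

    DocNumberMapper(int[] starts) {
        this.starts = starts.clone();
        this.numSubReaders = starts.length - 1;
    }

    /** Index of the sub-reader containing doc n (binary search over starts). */
    int readerIndex(int n) {
        int lo = 0;
        int hi = numSubReaders - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            if (n < starts[mid]) {
                hi = mid - 1;
            } else if (n > starts[mid]) {
                lo = mid + 1;
            } else {
                // Empty sub-readers share a start value; the doc belongs to
                // the last sub-reader with that start.
                while (mid + 1 < numSubReaders && starts[mid + 1] == starts[mid]) {
                    mid++;
                }
                return mid;
            }
        }
        return hi; // last sub-reader whose start is below n
    }

    /** Doc number within sub-reader i. */
    int id(int i, int n) {
        return n - starts[i];
    }

    /** Doc number within whichever sub-reader contains n. */
    int id(int n) {
        return id(readerIndex(n), n);
    }

    public static void main(String[] args) {
        DocNumberMapper m = new DocNumberMapper(new int[] {0, 3, 3, 7});
        System.out.println(m.readerIndex(3) + " " + m.id(3)); // 2 0
        System.out.println(m.readerIndex(5) + " " + m.id(5)); // 2 2
    }
}
```

With starts {0, 3, 3, 7} the second sub-reader is empty, so doc 3 maps to the third sub-reader rather than the second.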



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Change to MultiReader

Posted by Antony Bowesman <ad...@teamware.com>.

> Is the readerIndex being public all that useful, though?  Would it make 
> sense to just add the last id method here?  What other uses for the 
> readerIndex() are there?

I have 1..n indexes open, and each open index has one or more caches, e.g.
Map<DocNumber, MyInfo> or chained caching filters.  When a search is made, I
create something like

new IndexSearcher(new MultiReader(new IndexReader[] {a, b, c})) or
new IndexSearcher(new MultiReader(new IndexReader[] {c, d, e}))

where a, b, c, d, e are the partitions to be searched.  Knowing only the real
(sub-reader) id doesn't tell me which cache (each is associated with a
particular IndexReader) to get MyInfo from, so knowing the reader is essential,
in the same way MultiSearcher.subSearcher() is.
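A minimal sketch of the lookup described above, assuming each partition keeps its own cache keyed by the partition-local doc number. `PartitionCacheLookup` and the `String` values (standing in for MyInfo) are hypothetical. It shows why the local id alone is ambiguous: doc 0 can exist in every partition, so the reader index is what selects the cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: one cache per partition (sub-reader), keyed by the
 * partition-local doc number. String values stand in for MyInfo.
 */
final class PartitionCacheLookup {
    // starts[i] = first MultiReader doc number of partition i; last entry = maxDoc.
    private final int[] starts;
    // caches.get(i) belongs to partition i, in the same order as the sub-readers.
    private final List<Map<Integer, String>> caches;

    PartitionCacheLookup(int[] starts, List<Map<Integer, String>> caches) {
        this.starts = starts.clone();
        this.caches = caches;
    }

    // Linear scan for brevity: the last partition whose start is <= n,
    // which also skips empty partitions that share a start value.
    private int readerIndex(int n) {
        int i = starts.length - 2;
        while (starts[i] > n) {
            i--;
        }
        return i;
    }

    /** Cached info for a MultiReader doc number, or null if none is cached. */
    String lookup(int n) {
        int i = readerIndex(n);     // which partition, hence which cache
        int local = n - starts[i];  // key inside that partition's cache
        return caches.get(i).get(local);
    }

    public static void main(String[] args) {
        Map<Integer, String> cacheA = new HashMap<>();
        cacheA.put(0, "info for a:0");
        Map<Integer, String> cacheC = new HashMap<>();
        cacheC.put(0, "info for c:0");

        List<Map<Integer, String>> caches = new ArrayList<>();
        caches.add(cacheA); // partition a holds 3 docs
        caches.add(cacheC); // partition c holds 2 docs

        PartitionCacheLookup lookup = new PartitionCacheLookup(new int[] {0, 3, 5}, caches);
        // Both docs have local id 0; only the partition tells them apart.
        System.out.println(lookup.lookup(0)); // info for a:0
        System.out.println(lookup.lookup(3)); // info for c:0
    }
}
```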

Antony

Re: Change to MultiReader

Posted by Grant Ingersoll <gs...@apache.org>.


On Sep 12, 2008, at 12:55 AM, Antony Bowesman wrote:

> There was a message from Kirk Roberts, 18/4/2007 - MultiSearcher vs  
> MultiReader
>
> Grant mentioned the visibility of the readerIndex() method in  
> MultiReader, but nothing seems ever came of it.

Kirk never followed up, AFAICT.

Is the readerIndex being public all that useful, though?  Would it
make sense to just add the last id method here?  What other uses for
the readerIndex() are there?

-Grant
