Posted to dev@lucene.apache.org by "Tim Smith (JIRA)" <ji...@apache.org> on 2009/09/18 17:19:16 UTC

[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

    [ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757199#action_12757199 ] 

Tim Smith commented on LUCENE-1821:
-----------------------------------

I've been playing with per-segment caches for the last couple of weeks and have everything working pretty well.

However, I end up doing a lot of mapping between an IndexReader instance and its index into the IndexSearcher's IndexReader[] array.
That mapping lets me easily get the proper document offset where needed, and/or get a handle on the proper per-segment cache, evaluation object, etc.

For my use cases, it would be much easier if the following methods were available:

on Weight:
{code}
// readerId is the "i" in the for (int i = 0; i < readers.length; ++i) in IndexSearcher
// NOTE: readerId is at the IndexSearcher level, not the MultiSearcher level
public Scorer scorer(IndexReader reader, int readerId, boolean inOrder, boolean topLevel);
{code}

on Collector:
{code}
public void setNextReader(IndexReader reader, int docBase, int readerId);
// NOTE: this one isn't strictly needed, as it's easy enough to get the readerId from docBase (using a cached int[] of doc bases for the searcher)
{code}
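
For illustration, the docBase-to-readerId lookup mentioned in the note above could look roughly like the following. This is only a sketch: DocBaseLookup and its methods are hypothetical names, not part of Lucene, and it assumes no sub reader has maxDoc() == 0 (otherwise two readers would share a doc base and the lookup would be ambiguous).

```java
import java.util.Arrays;

// Hypothetical helper (not a Lucene class): maps the docBase handed to
// Collector.setNextReader() back to the reader's position in the searcher's
// IndexReader[] array.
final class DocBaseLookup {
    // docBases[i] is the sum of maxDoc() over readers 0..i-1, so it is sorted.
    private final int[] docBases;

    // maxDocs[i] is assumed to hold maxDoc() of the i-th sub reader.
    DocBaseLookup(int[] maxDocs) {
        docBases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            docBases[i] = base;
            base += maxDocs[i];
        }
    }

    // Returns the reader index whose doc base equals docBase, or -1 if none does.
    int readerIdForDocBase(int docBase) {
        int idx = Arrays.binarySearch(docBases, docBase);
        return idx >= 0 ? idx : -1;
    }

    // Returns the doc base for the given reader index.
    int docBase(int readerId) {
        return docBases[readerId];
    }
}
```

Because the lookup keys on the docBase value rather than on call order, it keeps working even if segments are visited out of order.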

I suppose I could rely on the fact that these methods are currently always called in order, keeping and incrementing a counter. However, the javadoc explicitly says that these methods may be called out of "segment order" in the future to be more efficient. It would therefore be very useful if these indexes were passed into these methods.

To work around this, my searcher currently has a getReaderIdForReader() method, very similar to my earlier proposed getIndexReaderBase() method.
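
Roughly, the idea is something like the sketch below. It is simplified and uses a generic type parameter in place of IndexReader so it stands alone; ReaderIdMap, its constructor arguments, and getDocBase() are hypothetical names for illustration, not existing Lucene API. In a real IndexSearcher subclass the list would presumably come from gatherSubReaders() and the function would be IndexReader.maxDoc().

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch (not a Lucene class): precomputes reader -> readerId and
// readerId -> docBase once, so lookups do not depend on call order.
final class ReaderIdMap<R> {
    // Identity semantics on purpose: readers are compared with ==, as in the
    // getIndexReaderBase() workaround.
    private final Map<R, Integer> ids = new IdentityHashMap<>();
    private final int[] docBases;

    // subReaders: the searcher's sub readers in segment order (assumed input).
    // maxDocOf: returns maxDoc for a reader (assumed input).
    ReaderIdMap(List<R> subReaders, ToIntFunction<R> maxDocOf) {
        docBases = new int[subReaders.size()];
        int base = 0;
        for (int i = 0; i < subReaders.size(); i++) {
            R reader = subReaders.get(i);
            ids.put(reader, i);
            docBases[i] = base;
            base += maxDocOf.applyAsInt(reader);
        }
    }

    // Returns the reader's index in the searcher, or -1 if it is not a sub reader.
    int getReaderIdForReader(R reader) {
        Integer id = ids.get(reader);
        return id == null ? -1 : id;
    }

    // Returns the doc offset of the reader with the given id.
    int getDocBase(int readerId) {
        return docBases[readerId];
    }
}
```

Building the map once in the searcher's constructor avoids re-walking the sub readers on every scorer() call.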




> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>             Fix For: 3.1
>
>         Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a Scorer to know the "actual" doc id for the documents it matches (only the relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all segments), there is now no way to index into them properly from inside a Scorer, because the scorer is not passed the offset needed to calculate the "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc offset.
> Abstract Weight class should have a constructor that takes this offset as well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation is possible if you cache the result of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader)iter.next();
>       if (r == reader) {
>         return maxDoc;
>       } 
>       maxDoc += r.maxDoc();
>     } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight implementation
