
Posted to java-user@lucene.apache.org by "Aphinyanaphongs, Yindalon" <pi...@Vanderbilt.Edu> on 2004/07/20 07:27:56 UTC

Post-sorted inverted index?

I gather from reading the documentation that the scores for each document hit are computed at query time.  I have an application that, due to the complexity of the function, cannot compute scores at query time.  Would it be possible for me to store the documents in pre-sorted order in the inverted index? (i.e. after the initial index is created, to have a post processing step to sort and reindex the final documents).
 
For example:
Document A - score 0.2
Document B - score 0.4
Document C - score 0.6
 
Thus for the word 'the', the stored order in the index would be C,B,A.
 
Thanks!
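A minimal sketch of the ordering described above, in plain Java rather than Lucene code (the class and method names are hypothetical). Each document carries a precomputed score, and a term's postings are listed highest score first, which turns the example into C, B, A. This models only the desired stored order; Lucene itself keeps each term's postings in increasing document-id order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ScoreOrderedPostings {
    // Returns the documents containing a term, ordered by their precomputed
    // (static) score, highest first. Every document must have a score.
    // List.sort is stable, so documents with equal scores keep their input order.
    public static List<String> scoreOrdered(List<String> postings, Map<String, Double> staticScore) {
        List<String> ordered = new ArrayList<>(postings);
        ordered.sort(Comparator.comparingDouble((String d) -> staticScore.get(d)).reversed());
        return ordered;
    }

    public static void main(String[] args) {
        // Scores from the example above.
        Map<String, Double> score = Map.of("A", 0.2, "B", 0.4, "C", 0.6);
        // Every document contains the word "the".
        List<String> the = List.of("A", "B", "C");
        System.out.println(scoreOrdered(the, score)); // prints [C, B, A]
    }
}
```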

Re: Post-sorted inverted index?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jul 20, 2004, at 1:27 AM, Aphinyanaphongs, Yindalon wrote:
> I gather from reading the documentation that the scores for each 
> document hit are computed at query time.  I have an application that, 
> due to the complexity of the function, cannot compute scores at query 
> time.  Would it be possible for me to store the documents in 
> pre-sorted order in the inverted index? (i.e. after the initial index 
> is created, to have a post processing step to sort and reindex the 
> final documents).
>
> For example:
> Document A - score 0.2
> Document B - score 0.4
> Document C - score 0.6
>
> Thus for the word 'the', the stored order in the index would be C,B,A.

Lucene 1.4 includes a Sort facility; look at the IndexSearcher.search() 
methods that take a Sort argument for details.  By default, if the 
computed scores are identical, the results are then ordered by document 
id, which is the insertion order.
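The default ordering described above can be modeled in plain Java as a two-key comparison: score descending, then document id ascending. This is a sketch of the ordering rule only, not Lucene's implementation; the Hit class and the RELEVANCE and rank names are hypothetical. If the score held a precomputed value instead of one computed at query time, the same comparator would rank by that value (an assumption about how the Sort facility might be applied to this question).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HitOrder {
    // A matching document: its internal id (assigned in insertion order)
    // and the score attached to it.
    public static final class Hit {
        private final int docId;
        private final float score;

        public Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }

        public int docId() { return docId; }
        public float score() { return score; }
    }

    // Highest score first; equal scores fall back to ascending document id,
    // i.e. the order in which the documents were added to the index.
    public static final Comparator<Hit> RELEVANCE =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed()
                  .thenComparingInt(Hit::docId);

    public static List<Hit> rank(List<Hit> hits) {
        List<Hit> ranked = new ArrayList<>(hits);
        ranked.sort(RELEVANCE);
        return ranked;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(2, 0.5f), new Hit(0, 0.5f), new Hit(1, 0.9f));
        // Document 1 wins on score; documents 0 and 2 tie and are ordered by id.
        for (Hit h : rank(hits)) {
            System.out.println(h.docId());
        }
    }
}
```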

I hope this helps.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Post-sorted inverted index?

Posted by Doug Cutting <cu...@apache.org>.

You can define a subclass of FilterIndexReader that re-sorts documents 
in termPositions(Term) and document(int), then use 
IndexWriter.addIndexes() to write the re-sorted index in Lucene's 
standard format.  I have done this in Nutch, with the (as yet unused) 
IndexOptimizer.

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/indexer/IndexOptimizer.java?view=markup
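The core of the re-sorting approach is a document renumbering: hand out new ids in descending order of the precomputed score, then rewrite each term's postings into new ids and keep them in increasing order, since Lucene's index format stores postings in increasing document-id order. The plain-Java sketch below shows only that permutation logic, assuming scores are available per original document id. It is not Nutch's IndexOptimizer code, and all names are hypothetical. In the FilterIndexReader subclass described above, termPositions(Term) would return the remapped postings (with each document's positions travelling along, omitted here), and document(n) would return the original document newToOld[n]. Deleted documents are ignored.

```java
import java.util.Arrays;
import java.util.Comparator;

public class StaticScoreRenumbering {
    // newToOld[n] = original id of the document that receives new id n.
    // New ids follow descending precomputed score, so new id 0 is the best document.
    public static int[] newToOld(float[] staticScore) {
        Integer[] order = new Integer[staticScore.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Arrays.sort on objects is stable: equal scores keep their original order.
        Arrays.sort(order, Comparator.comparingDouble((Integer d) -> staticScore[d]).reversed());
        int[] result = new int[order.length];
        for (int i = 0; i < order.length; i++) result[i] = order[i];
        return result;
    }

    // Inverse permutation: oldToNew[o] = new id of original document o.
    public static int[] invert(int[] newToOld) {
        int[] oldToNew = new int[newToOld.length];
        for (int n = 0; n < newToOld.length; n++) oldToNew[newToOld[n]] = n;
        return oldToNew;
    }

    // Rewrites one term's postings (original doc ids) into new ids and re-sorts
    // them ascending, the order in which postings must be written.
    public static int[] remapPostings(int[] postings, int[] oldToNew) {
        int[] remapped = new int[postings.length];
        for (int i = 0; i < postings.length; i++) remapped[i] = oldToNew[postings[i]];
        Arrays.sort(remapped);
        return remapped;
    }

    public static void main(String[] args) {
        // Documents A, B, C from the question, indexed in that order (ids 0, 1, 2).
        float[] score = {0.2f, 0.4f, 0.6f};
        int[] n2o = newToOld(score);   // [2, 1, 0]: C, then B, then A
        int[] o2n = invert(n2o);       // [2, 1, 0]
        int[] the = remapPostings(new int[] {0, 1, 2}, o2n);
        // Walking the remapped postings in ascending new id visits C, B, A.
        StringBuilder visited = new StringBuilder();
        for (int newId : the) visited.append("ABC".charAt(n2o[newId]));
        System.out.println(visited); // prints CBA
    }
}
```

After renumbering, ascending document id is descending precomputed score, so anything that orders by document id, such as the tie-break Erik describes, follows the precomputed ranking.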

Doug
