You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "Aphinyanaphongs, Yindalon" <pi...@Vanderbilt.Edu> on 2004/07/20 07:27:56 UTC
Post-sorted inverted index?
I gather from reading the documentation that the scores for each document hit are computed at query time. I have an application that, due to the complexity of the function, cannot compute scores at query time. Would it be possible for me to store the documents in pre-sorted order in the inverted index? (i.e. after the initial index is created, to have a post processing step to sort and reindex the final documents).
For example:
Document A - score 0.2
Document B - score 0.4
Document C - score 0.6
Thus for the word 'the', the stored order in the index would be C,B,A.
Thanks!
Re: Post-sorted inverted index?
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 20, 2004, at 1:27 AM, Aphinyanaphongs, Yindalon wrote:
> I gather from reading the documentation that the scores for each
> document hit are computed at query time. I have an application that,
> due to the complexity of the function, cannot compute scores at query
> time. Would it be possible for me to store the documents in
> pre-sorted order in the inverted index? (i.e. after the initial index
> is created, to have a post processing step to sort and reindex the
> final documents).
>
> For example:
> Document A - score 0.2
> Document B - score 0.4
> Document C - score 0.6
>
> Thus for the word 'the', the stored order in the index would be C,B,A.
Lucene 1.4 includes a Sort facility - look at the additional
IndexSearcher.search() methods for details. By default, if the scores
computed are identical, the results are then ordered by document id,
which is the insertion order.
I hope this helps.
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Post-sorted inverted index?
Posted by Doug Cutting <cu...@apache.org>.
You can define a subclass of FilterIndexReader that re-sorts documents
in TermPositions(Term) and document(int), then use
IndexWriter.addIndexes() to write this in Lucene's standard format. I
have done this in Nutch, with the (as yet unused) IndexOptimizer.
http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/indexer/IndexOptimizer.java?view=markup
Doug
Aphinyanaphongs, Yindalon wrote:
> I gather from reading the documentation that the scores for each document hit are computed at query time. I have an application that, due to the complexity of the function, cannot compute scores at query time. Would it be possible for me to store the documents in pre-sorted order in the inverted index? (i.e. after the initial index is created, to have a post processing step to sort and reindex the final documents).
>
> For example:
> Document A - score 0.2
> Document B - score 0.4
> Document C - score 0.6
>
> Thus for the word 'the', the stored order in the index would be C,B,A.
>
> Thanks!
>
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org