Posted to java-user@lucene.apache.org by John Patterson <jd...@gmail.com> on 2007/10/27 05:02:23 UTC

Sorted Index

Hi,

What's the best way to maintain an index that is sorted?
-- 
View this message in context: http://www.nabble.com/Sorted-Index-tf4701044.html#a13438928
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Sorted Index

Posted by Andrzej Bialecki <ab...@getopt.org>.
John Patterson wrote:
> 
> 
> Yonik Seeley wrote:
>> On 10/26/07, John Patterson <jd...@gmail.com> wrote:
>> Most things in an inverted index are sorted (terms, matching document
>> ids, term positions within a field, etc).  Can you be more specific
>> about what you are trying to accomplish?
>>
> 
> Sorry, I mean sorting the documents in an order other than the order they
> are added.  Then my search could just return docs in index order.  For the
> most common sorting I could collect only the first x docs and then
> short-circuit the search like we previously discussed.

Nutch already has an answer to both questions (see 
org.apache.nutch.indexer.IndexSorter and 
org.apache.nutch.searcher.LuceneQueryOptimizer$LimitedCollector).
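
As a rough illustration of the short-circuit idea quoted above: if the
index has already been sorted so that document ids follow the desired
order, collecting can stop after the first x matches. This is plain Java,
not the Nutch or Lucene API; the class and method names are made up for
the example, and the pre-sorted index is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class EarlyTermination {

    // Hypothetical helper. Assumes matchingDocIds arrive in increasing
    // doc id order, and that the index was pre-sorted so that a lower
    // doc id means an earlier position in the desired sort order.
    static List<Integer> collectFirst(int[] matchingDocIds, int limit) {
        List<Integer> hits = new ArrayList<>();
        for (int docId : matchingDocIds) {
            if (hits.size() >= limit) {
                break; // short-circuit: the remaining matches sort later
            }
            hits.add(docId);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] matches = {3, 8, 15, 42, 97};
        System.out.println(collectFirst(matches, 3)); // prints [3, 8, 15]
    }
}
```

The cut-off is only safe for the one sort order the index was built
for; any other ordering still has to look at every match.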

> 
> I was wondering if it is possible to apply a sort at merge time?

One method that I'm familiar with is the following: you can split the 
result set into several large-ish bins, and apply arbitrary sorting 
methods within each bin. Studies show that if you pick the right bin 
size, users will rarely look into the second and the following bins, so 
the task is reduced to the sorting of the first bin, e.g. 100 top 
scoring docs.
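
A minimal sketch of the bin approach, assuming the hits are already
ranked by score and each hit carries some secondary key to sort on (a
date here). The Hit class, its fields and the sortBin helper are
hypothetical names for illustration only, not part of Lucene or Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BinSort {

    // Hypothetical hit record: doc id, relevance score, secondary key.
    static final class Hit {
        final int docId;
        final float score;
        final long date;

        Hit(int docId, float score, long date) {
            this.docId = docId;
            this.score = score;
            this.date = date;
        }
    }

    // Returns one bin of the score-ranked hits, re-sorted by the given
    // order. Only the requested bin is copied and sorted.
    static List<Hit> sortBin(List<Hit> hitsByScore, int binIndex,
                             int binSize, Comparator<Hit> order) {
        int from = Math.min(binIndex * binSize, hitsByScore.size());
        int to = Math.min(from + binSize, hitsByScore.size());
        List<Hit> bin = new ArrayList<>(hitsByScore.subList(from, to));
        bin.sort(order);
        return bin;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(1, 0.9f, 30L),
                new Hit(2, 0.8f, 10L),
                new Hit(3, 0.7f, 20L),
                new Hit(4, 0.6f, 5L));
        // First bin = top 3 by score, re-sorted by date ascending.
        for (Hit h : sortBin(hits, 0, 3, Comparator.comparingLong(h -> h.date))) {
            System.out.println(h.docId);
        }
    }
}
```

With a bin size of 3, doc 4 stays in the second bin even though it has
the smallest date; that is the trade-off this method accepts.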

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: Sorted Index

Posted by John Patterson <jd...@gmail.com>.


Yonik Seeley wrote:
> 
> On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> Most things in an inverted index are sorted (terms, matching document
> ids, term positions within a field, etc).  Can you be more specific
> about what you are trying to accomplish?
> 

Sorry, I mean sorting the documents in an order other than the order they
are added.  Then my search could just return docs in index order.  For the
most common sorting I could collect only the first x docs and then
short-circuit the search like we previously discussed.

I was wondering if it is possible to apply a sort at merge time?

Cheers,

John




Re: Sorted Index

Posted by Yonik Seeley <yo...@apache.org>.
On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> What's the best way to maintain an index that is sorted?

Most things in an inverted index are sorted (terms, matching document
ids, term positions within a field, etc).  Can you be more specific
about what you are trying to accomplish?

-Yonik
