Posted to java-user@lucene.apache.org by John Patterson <jd...@gmail.com> on 2007/10/27 05:02:23 UTC
Sorted Index
Hi,
What's the best way to maintain an index that is sorted?
--
View this message in context: http://www.nabble.com/Sorted-Index-tf4701044.html#a13438928
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Sorted Index
Posted by Andrzej Bialecki <ab...@getopt.org>.
John Patterson wrote:
>
>
> Yonik Seeley wrote:
>> On 10/26/07, John Patterson <jd...@gmail.com> wrote:
>>> What's the best way to maintain an index that is sorted?
>> Most things in an inverted index are sorted (terms, matching document
>> ids, term positions within a field, etc). Can you be more specific
>> about what you are trying to accomplish?
>>
>
> Sorry, I mean sorting the documents in an order other than the order they
> are added. Then my search could just return docs in index order. For the
> most common sorting I could collect only the first x docs and then
> short-circuit the search like we previously discussed.
These questions already have an answer in Nutch (see the
org.apache.nutch.indexer.IndexSorter, and
org.apache.nutch.searcher.LuceneQueryOptimizer$LimitedCollector).
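
As a rough illustration of the short-circuit idea (not the Nutch code itself), the sketch below stops scanning once a fixed number of hits has been collected. It assumes documents are visited in index order and that index order already matches the desired sort. The names ShortCircuitSearch, FirstNCollector, LimitReached and firstMatches are made up for this example and are not Lucene or Nutch classes.

```java
import java.util.ArrayList;
import java.util.List;

class ShortCircuitSearch {
    /** Hypothetical signal used only to abandon the scan early. */
    static final class LimitReached extends RuntimeException {
        LimitReached() {
            // No message, no cause, no suppression, no stack trace:
            // the exception is used purely for control flow.
            super(null, null, false, false);
        }
    }

    /** Hypothetical collector: keeps the first `limit` doc IDs it is given. */
    static final class FirstNCollector {
        private final int limit;
        final List<Integer> docs = new ArrayList<>();

        FirstNCollector(int limit) {
            this.limit = limit;
        }

        void collect(int docId) {
            if (docs.size() >= limit) {
                throw new LimitReached(); // enough hits, stop the caller
            }
            docs.add(docId);
        }
    }

    /**
     * Stand-in for a searcher that calls collect() once per matching
     * document, in increasing doc ID (index) order.
     */
    static List<Integer> firstMatches(int[] matchingDocsInIndexOrder, int limit) {
        FirstNCollector collector = new FirstNCollector(limit);
        try {
            for (int docId : matchingDocsInIndexOrder) {
                collector.collect(docId);
            }
        } catch (LimitReached stop) {
            // Expected: the collector already holds `limit` hits.
        }
        return collector.docs;
    }
}
```

The early exit only gives the right top hits if the index is physically ordered by the sort key; otherwise the first x documents in index order are just an arbitrary sample of the matches.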
>
> I was wondering if it is possible to apply a sort at merge time?
One method that I'm familiar with is the following: you can split the
result set into several large-ish bins, and apply arbitrary sorting
methods within each bin. Studies show that if you pick the right bin
size, users will rarely look into the second and the following bins, so
the task is reduced to the sorting of the first bin, e.g. 100 top
scoring docs.
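
A minimal sketch of the bin idea, under the assumption that the hits arrive already ranked by score and that only the first bin is re-sorted by some other key. The Hit class, its date field and resortFirstBin are hypothetical names for this example, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FirstBinSort {
    /** Hypothetical hit record holding a score and one extra sort key. */
    static final class Hit {
        final int docId;
        final float score;
        final long date;

        Hit(int docId, float score, long date) {
            this.docId = docId;
            this.score = score;
            this.date = date;
        }
    }

    /**
     * Re-sorts only the first `binSize` hits with the given comparator.
     * Hits beyond the first bin keep their original (score) order.
     */
    static List<Hit> resortFirstBin(List<Hit> byScore, int binSize, Comparator<Hit> order) {
        int end = Math.max(0, Math.min(binSize, byScore.size()));
        List<Hit> result = new ArrayList<>(byScore.subList(0, end));
        result.sort(order);                                   // arbitrary sort inside the bin
        result.addAll(byScore.subList(end, byScore.size()));  // remaining bins untouched
        return result;
    }
}
```

For example, resortFirstBin(hits, 100, Comparator.comparingLong(h -> h.date)) would order the 100 top-scoring docs by date and leave everything after them in score order.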
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: Sorted Index
Posted by John Patterson <jd...@gmail.com>.
Yonik Seeley wrote:
>
> On 10/26/07, John Patterson <jd...@gmail.com> wrote:
>> What's the best way to maintain an index that is sorted?
> Most things in an inverted index are sorted (terms, matching document
> ids, term positions within a field, etc). Can you be more specific
> about what you are trying to accomplish?
>
Sorry, I mean sorting the documents in an order other than the order they
are added. Then my search could just return docs in index order. For the
most common sorting I could collect only the first x docs and then
short-circuit the search like we previously discussed.
I was wondering if it is possible to apply a sort at merge time?
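
To make the goal concrete, here is a toy sketch of what re-sorting would mean: documents are renumbered so that doc ID order follows a per-document sort key, and each posting list is rewritten with the new IDs. This says nothing about how or whether Lucene could do it during a merge; DocReorder, oldToNew and remapPostings are hypothetical names, and the per-document key array is an assumption.

```java
import java.util.Arrays;

class DocReorder {
    /**
     * Returns a map with map[oldDocId] = newDocId, where new IDs are
     * assigned in descending sortKey order (best document gets ID 0).
     */
    static int[] oldToNew(float[] sortKey) {
        Integer[] order = new Integer[sortKey.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Object sort is stable, so ties keep their original relative order.
        Arrays.sort(order, (a, b) -> Float.compare(sortKey[b], sortKey[a]));

        int[] map = new int[sortKey.length];
        for (int newId = 0; newId < order.length; newId++) {
            map[order[newId]] = newId;
        }
        return map;
    }

    /** Rewrites one posting list with the new IDs and restores ascending order. */
    static int[] remapPostings(int[] postings, int[] oldToNew) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = oldToNew[postings[i]];
        }
        Arrays.sort(out); // posting lists must stay sorted by doc ID
        return out;
    }
}
```

After such a renumbering, the first x matches of any query in index order are also its top x matches by the sort key.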
Cheers,
John
Re: Sorted Index
Posted by Yonik Seeley <yo...@apache.org>.
On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> What's the best way to maintain an index that is sorted?
Most things in an inverted index are sorted (terms, matching document
ids, term positions within a field, etc). Can you be more specific
about what you are trying to accomplish?
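
To illustrate what "sorted" means here, a toy in-memory structure (not Lucene's actual on-disk format) can keep terms, doc IDs per term, and positions per document in order. TinyInvertedIndex and its methods are hypothetical names used only for this sketch, and the whitespace tokenizer is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TinyInvertedIndex {
    // term -> (docId -> positions). TreeMap keeps both terms and doc IDs
    // sorted; positions are appended in increasing order.
    final TreeMap<String, TreeMap<Integer, List<Integer>>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Adds a document and returns its doc ID (assigned in arrival order). */
    int add(String text) {
        int docId = nextDocId++;
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
        return docId;
    }

    /** Doc IDs containing the term, in ascending doc ID order. */
    List<Integer> docsFor(String term) {
        List<Integer> docs = new ArrayList<>();
        TreeMap<Integer, List<Integer>> perDoc = postings.get(term);
        if (perDoc != null) {
            docs.addAll(perDoc.keySet());
        }
        return docs;
    }
}
```

Note that the doc ID order here is simply the order in which documents were added, which is the one ordering the structure does not let you choose.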
-Yonik