Posted to java-user@lucene.apache.org by renou oki <yo...@yahoo.fr> on 2008/08/07 10:44:43 UTC

Stop search process when a given number of hits is reached

Hello

Is there a way to stop the search process when a given number of hits is reached?

I have a counter feature which displays how many docs match a query.
This counter is capped: if there are more than 500 docs, it just displays "more than 500".
I don't care about the exact number of docs matched by the query, the order of the hits, or anything else.
What I want is to stop the search process as soon as it has found 500 hits, in order to improve performance.
(I want an average search time of about 50-100 ms.)

I experimented with the following methods:

For the same query:
With search(Query query, Filter filter, Sort sort): hits=157691 docs, searchingTime=2514 ms
With search(Query query, Filter filter, int n) (with n = 50): TopDocs totalHits=157691, searchingTime=2360 ms

For another query:
With search(Query query, Filter filter, Sort sort): hits=1208 docs, searchingTime=750 ms
With search(Query query, Filter filter, int n) (with n = 50): TopDocs totalHits=1208, searchingTime=718 ms

For another query:
With search(Query query, Filter filter, Sort sort): hits=16174 cv(s), searchingTime=1297 ms
With search(Query query, Filter filter, int n) (with n = 50): TopDocs totalHits=16174, searchingTime=1219 ms

According to these results, replacing the first method with the second has no effect on either
the search time or the total number of hits returned.

Also, the Lucene version currently in use is 1.9.1 (but I am working on upgrading to 2.3.2).


Thanks a lot
(Sorry for my bad English ... you will easily guess, I’m French ;)




Re: Stop search process when a given number of hits is reached

Posted by Andrzej Bialecki <ab...@getopt.org>.
Doron Cohen wrote:
> Nothing built in that I'm aware of will do this, but it can be done by
> searching with your own HitCollector.
> There is a related feature - stop search after a specified time - using
> TimeLimitedCollector.
> It is not released yet, see issue LUCENE-997.
> In short, the collector's collect() method is invoked in the search process
> for each matching document.
> Once 500 docs were collected, your collector can cause the search to stop by
> throwing an exception.
> Upon catching the exception you know that 500 docs were collected.

Two additional comments:

* the topN results from such an incomplete search may be way off if there 
are some high-scoring documents somewhere beyond the limit.

* if you know that there are more important and less important documents 
in your corpus, and their relative weight is independent of the query 
(e.g. a PageRank-type score), then you can restructure your index so that 
postings belonging to high-scoring documents come first in the posting 
lists. This way you have a better chance of collecting highly relevant 
documents first, even though the search is incomplete. You can find an 
implementation of this concept in Nutch 
(org.apache.nutch.indexer.IndexSorter).
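
The index-ordering idea above can be illustrated with a toy sketch. This is not the Nutch IndexSorter code and it does not use any Lucene API; the class and method names (RankSortedIndexSketch, renumberByStaticScore, remapPostings, firstHits) are hypothetical. It only assumes that posting lists are traversed in ascending doc-id order, so giving the lowest ids to the documents with the highest query-independent score makes a truncated search see those documents first.

```java
import java.util.Arrays;
import java.util.Comparator;

// Toy model only: documents are array indexes, a posting list is an int[] of doc ids.
class RankSortedIndexSketch {

    // Returns newId[oldId]; the document with the highest static score gets id 0.
    static int[] renumberByStaticScore(float[] staticScore) {
        Integer[] order = new Integer[staticScore.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort old doc ids by descending static score.
        Arrays.sort(order, Comparator.comparingDouble((Integer d) -> -staticScore[d]));
        int[] newId = new int[staticScore.length];
        for (int rank = 0; rank < order.length; rank++) {
            newId[order[rank]] = rank;
        }
        return newId;
    }

    // Rewrites a posting list with the new ids and restores ascending doc-id order.
    static int[] remapPostings(int[] postings, int[] newId) {
        int[] out = new int[postings.length];
        for (int i = 0; i < postings.length; i++) {
            out[i] = newId[postings[i]];
        }
        Arrays.sort(out);
        return out;
    }

    // What a search that stops after `limit` hits would get to see.
    static int[] firstHits(int[] postings, int limit) {
        return Arrays.copyOf(postings, Math.min(limit, postings.length));
    }
}
```

With static scores {0.1, 0.9, 0.5, 0.7}, the old documents 1 and 3 receive the new ids 0 and 1, so a search cut off after two hits on a posting list containing old documents 0, 1 and 3 collects exactly the two highest-scoring ones.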

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: Stop search process when a given number of hits is reached

Posted by Doron Cohen <cd...@gmail.com>.
Nothing built in that I'm aware of will do this, but it can be done by
searching with your own HitCollector.
There is a related feature - stop search after a specified time - using
TimeLimitedCollector.
It is not released yet, see issue LUCENE-997.
In short, the collector's collect() method is invoked in the search process
for each matching document.
Once 500 docs were collected, your collector can cause the search to stop by
throwing an exception.
Upon catching the exception you know that 500 docs were collected.
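
A minimal sketch of that approach follows. To keep it compilable without the Lucene jar, it declares a local stand-in HitCollector with the same collect(int doc, float score) shape as org.apache.lucene.search.HitCollector in Lucene 2.x; in real code you would extend the Lucene class and pass the collector to Searcher.search(Query, HitCollector). The names HitLimitReachedException, CappedCountCollector, fakeSearch and countLabel are hypothetical, and fakeSearch only simulates a searcher calling collect() once per matching document. The sketch throws on hit number limit + 1, so that a label of "more than 500" is strictly accurate.

```java
// Stand-in for org.apache.lucene.search.HitCollector (assumption: same method shape as Lucene 2.x).
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Unchecked, so it can propagate out of collect(), which declares no checked exceptions.
class HitLimitReachedException extends RuntimeException {
    HitLimitReachedException(int limit) {
        super("more than " + limit + " hits");
    }
}

// Counts hits and aborts the search as soon as the count exceeds the limit.
class CappedCountCollector extends HitCollector {
    private final int limit;
    private int count;

    CappedCountCollector(int limit) {
        this.limit = limit;
    }

    @Override
    public void collect(int doc, float score) {
        count++;
        if (count > limit) {
            throw new HitLimitReachedException(limit);
        }
    }

    int getCount() {
        return count;
    }
}

class CappedCountDemo {
    // Hypothetical stand-in for a real search: calls collect() once per matching doc.
    static void fakeSearch(int matchingDocs, HitCollector collector) {
        for (int doc = 0; doc < matchingDocs; doc++) {
            collector.collect(doc, 1.0f);
        }
    }

    // Returns the exact count, or "more than <limit>" if the search was cut short.
    static String countLabel(int matchingDocs, int limit) {
        CappedCountCollector collector = new CappedCountCollector(limit);
        try {
            fakeSearch(matchingDocs, collector);
            return String.valueOf(collector.getCount());
        } catch (HitLimitReachedException e) {
            return "more than " + limit;
        }
    }
}
```

For example, CappedCountDemo.countLabel(1208, 500) yields "more than 500" after only 501 calls to collect(), while countLabel(42, 500) yields "42". How much time this saves in a real index depends on how the query is evaluated, so it should be measured rather than assumed.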

Doron
