Posted to java-user@lucene.apache.org by Nicola Buso <nb...@ebi.ac.uk> on 2013/09/18 17:21:09 UTC

TotalHitCountCollector performance

Hello,

I was going to use TotalHitCountCollector in cases where I'm
interested only in the number of results.
Obviously I was hoping for better performance than with a "scored"
query.
From my tests it seems it is not much faster than the "scored"
search. At this point I'm wondering whether I'm doing something wrong.
I'm executing 1000 queries on an index with 167,424,681 entries:
TotalHitCountCollector average is 2803.34
TopFieldCollector average is 2981.47

Below is part of the code I'm using with TotalHitCountCollector; with
TopFieldCollector I'm not wrapping the query in ConstantScoreQuery.

Do these numbers look right to you?



nb.



---------
// Prepare the Lucene query.
final org.apache.lucene.search.Query luceneQuery =
    new ConstantScoreQuery(getQuery(queryDomain, query.getQueryString()));

try {
  final TotalHitCountCollector hitCountCollector = new TotalHitCountCollector();

  long startTime = System.currentTimeMillis();
  searcher.search(luceneQuery, hitCountCollector);
  if (log.isTraceEnabled()) {
    log.trace(String.format(
        "getNumberOfResults query=%s, domainId=%s executed in: %s(ms)",
        query.getQueryString(), query.getDomainId(),
        System.currentTimeMillis() - startTime));
  }
  return hitCountCollector.getTotalHits();
} catch (final IOException e) {
  throw new SearchFailedException("I/O [" + this.domain + "] " + e.getMessage(), e);
}


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: TotalHitCountCollector performance

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi Uwe,

Thanks for the fast reply. I removed the CSQ and re-ran my test;
there is still essentially no difference compared with searching with
TopFieldCollector.

Just a clarification: the method getQuery(...) in my code returns a
BooleanQuery, in which the user input is expanded across some of the
fields present in the index.

Profiling the test, I see that IndexSearcher ends up calling
BooleanScorer, which takes most of the search execution time.
Is this normal? Can I configure the BooleanQuery so that the scorer is
not called, for example with something like disableCoord?



Nicola.


On Wed, 2013-09-18 at 18:15 +0200, Uwe Schindler wrote:
> Hi,
> 
> The ConstantScoreQuery part is just overhead. If scores are not requested, they should not be calculated - but CSQ cannot prevent this from happening at all. It just prevents the collector from seeing the scores. As the counting collector does not request any scores, you are just adding a useless additional wrapper around the query's scorer.
> 
> Uwe
> 


RE: TotalHitCountCollector performance

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

The ConstantScoreQuery part is just overhead. If scores are not requested, they should not be calculated - but CSQ cannot prevent this from happening at all. It just prevents the collector from seeing the scores. As the counting collector does not request any scores, you are just adding a useless additional wrapper around the query's scorer.
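
To illustrate the point, here is a minimal sketch that does not use Lucene at all. SimpleScorer, SimpleCollector, CountingCollector, ScoringCollector, ConstantScorer and ScoreCallDemo are hypothetical, heavily simplified stand-ins invented for this example; they only model who asks for a score and when. The sketch counts how often the underlying scorer is asked for a score, with and without a constant-score wrapper.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins, NOT the Lucene API.
interface SimpleScorer {
    float score();
}

interface SimpleCollector {
    void setScorer(SimpleScorer scorer);
    void collect(int doc);
}

// Counts hits and never calls score(), like a hit-counting collector.
final class CountingCollector implements SimpleCollector {
    private int totalHits;

    @Override
    public void setScorer(SimpleScorer scorer) {
        // Scores are not needed, so the scorer is ignored.
    }

    @Override
    public void collect(int doc) {
        totalHits++;
    }

    int getTotalHits() {
        return totalHits;
    }
}

// Asks for the score of every hit, like a collector that ranks by relevance.
final class ScoringCollector implements SimpleCollector {
    private SimpleScorer scorer;
    private float scoreSum;

    @Override
    public void setScorer(SimpleScorer scorer) {
        this.scorer = scorer;
    }

    @Override
    public void collect(int doc) {
        scoreSum += scorer.score();
    }

    float getScoreSum() {
        return scoreSum;
    }
}

// Wraps another scorer and always reports a fixed score.
final class ConstantScorer implements SimpleScorer {
    private final SimpleScorer inner; // held only to show the extra layer
    private final float constant;

    ConstantScorer(SimpleScorer inner, float constant) {
        this.inner = inner;
        this.constant = constant;
    }

    @Override
    public float score() {
        return constant; // the inner scorer is never consulted
    }
}

final class ScoreCallDemo {
    // Feeds every matching doc to the collector and returns how many times
    // the underlying (unwrapped) scorer was asked for a score.
    static int runSearch(int[] matchingDocs, SimpleCollector collector, boolean wrap) {
        AtomicInteger scoreCalls = new AtomicInteger();
        SimpleScorer real = () -> {
            scoreCalls.incrementAndGet();
            return 1.5f;
        };
        SimpleScorer scorer = wrap ? new ConstantScorer(real, 1.0f) : real;
        collector.setScorer(scorer);
        for (int doc : matchingDocs) {
            collector.collect(doc); // matching work is the same either way
        }
        return scoreCalls.get();
    }
}
```

In this model the counting collector triggers no score calls whether or not the wrapper is present, so the wrapper only adds a layer. The wrapper makes a difference only for a collector that actually asks for scores. In both cases every matching document is still visited, which is where the time goes.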

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

