You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by "Beard, Brian" <Br...@mybir.com> on 2008/03/05 21:08:27 UTC

Boolean Query search performance

I'm using lucene 2.2.0.

I'm in the process of re-writing some queries to build BooleanQueries
instead of using query parser.

Bypassing query parser provides almost an order of magnitude improvement
for very large queries, but then the search performance takes 20-30%
longer. I'm adding boost values as a way to sort (long story, no
workaround I can see for now).

If I do a query.toString(), both queries give different results, which
is probably a clue (additional paren's with the BooleanQuery)

Query.toString the old way using queryParser:
    +(id:1^2.0 id:2 ... ) +type:CORE

Query.toString the new way using BooleanQuery:
    +((id:1^2.0) (id:2) ... ) +type:CORE

Does anyone have any ideas as to why the performance difference in
searching. I would think I could get the search time to be the same,
since after the query is formed everything remains the same. The same
number of hits are going through the hitCollector that gets called after
this, so all variables seem constant. QueryParser must be doing some
optimization I'm not taking advantage of. Code snippet follows.....

The previous way using queryParser was:

  String queryStr = (id:1^2 OR id:2 .... id:n) AND type:CORE
  Query query = parser.parse(queryStr);


The new way using BooleanQuery is...

  BooleanQuery totalBooleanQuery = new BooleanQuery();
  BooleanClause docTypeClause = 
    new BooleanClause( new TermQuery( 
    new Term("type", "CORE")), BooleanClause.Occur.MUST);

  BooleanQuery idBooleanQuery = new BooleanQuery();
  for (String uid : uniqueIdMap.keySet()) {
	TermQuery termQuery = 
         new TermQuery(new Term("id", uid));
	termQuery.setBoost(loopBoostValue);
	idBooleanQuery.add(
         new BooleanClause(termQuery, BooleanClause.Occur.SHOULD));
  } 

  totalBooleanQuery.add(docTypeClause);
  totalBooleanQuery.add(idBooleanQuery, Occur.MUST); 




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Boolean Query search performance

Posted by Karl Wettin <ka...@gmail.com>.

Beard, Brian skrev:
> I'm using lucene 2.2.0.
> 
> I'm in the process of re-writing some queries to build BooleanQueries
> instead of using query parser.
> 
> Bypassing query parser provides almost an order of magnitude improvement
> for very large queries, but then the search performance takes 20-30%
> longer.

Have you benchmarked the time spent creating the query and the time 
spent searching using the query seperatly? QueryParser can be sort of 
expensive. It also seems to me that you are creating a very large query 
given you add one criteria per entity id.

     karl



> I'm adding boost values as a way to sort (long story, no
> workaround I can see for now).
> 
> If I do a query.toString(), both queries give different results, which
> is probably a clue (additional paren's with the BooleanQuery)
> 
> Query.toString the old way using queryParser:
>     +(id:1^2.0 id:2 ... ) +type:CORE
> 
> Query.toString the new way using BooleanQuery:
>     +((id:1^2.0) (id:2) ... ) +type:CORE
> 
> Does anyone have any ideas as to why the performance difference in
> searching. I would think I could get the search time to be the same,
> since after the query is formed everything remains the same. The same
> number of hits are going through the hitCollector that gets called after
> this, so all variables seem constant. QueryParser must be doing some
> optimization I'm not taking advantage of. Code snippet follows.....
> 
> The previous way using queryParser was:
> 
>   String queryStr = (id:1^2 OR id:2 .... id:n) AND type:CORE
>   Query query = parser.parse(queryStr);
> 
> 
> The new way using BooleanQuery is...
> 
>   BooleanQuery totalBooleanQuery = new BooleanQuery();
>   BooleanClause docTypeClause = 
>     new BooleanClause( new TermQuery( 
>     new Term("type", "CORE")), BooleanClause.Occur.MUST);
> 
>   BooleanQuery idBooleanQuery = new BooleanQuery();
>   for (String uid : uniqueIdMap.keySet()) {
> 	TermQuery termQuery = 
>          new TermQuery(new Term("id", uid));
> 	termQuery.setBoost(loopBoostValue);
> 	idBooleanQuery.add(
>          new BooleanClause(termQuery, BooleanClause.Occur.SHOULD));
>   } 
> 
>   totalBooleanQuery.add(docTypeClause);
>   totalBooleanQuery.add(idBooleanQuery, Occur.MUST); 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Boolean Query search performance

Posted by Chris Hostetter <ho...@fucit.org>.

: > additional parens normally indicates that you are actually creating an
: > extra layer of BooleanQueries (ie: a BooleanQuery with only one clause for

: look here,
: parens will also be add is each term has a boost value larger than 1.0.

i think you are missreading that code.  the "needParens" variable adds 
parens arroudn the *entire* "this" BooleanQuery if "this" has a boost or a 
non 0 minShouldMatch value ... BooleanQuery.toString only adds parens 
arround individual clauses if those clauses are themselves boolean 
queries ... which is the point i was making.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Boolean Query search performance

Posted by Eric Th <bl...@gmail.com>.

2008/3/6, Chris Hostetter <ho...@fucit.org>:
>
>
> : If I do a query.toString(), both queries give different results, which
>
> : is probably a clue (additional paren's with the BooleanQuery)
> :
> : Query.toString the old way using queryParser:
> :     +(id:1^2.0 id:2 ... ) +type:CORE
> :
> : Query.toString the new way using BooleanQuery:
> :     +((id:1^2.0) (id:2) ... ) +type:CORE
>
>
> i didn't look too closely at the psuedo code you posted, but the
> additional parens normally indicates that you are actually creating an
> extra layer of BooleanQueries (ie: a BooleanQuery with only one clause for
> each term) ... but the rewrite method should optimize those away (even
> back in lucene 2.2) ... if you look at query.rewrite(reader).toString()
> then the queries *really* should be the same, if they aren't, then that
> may be your culprit.


look here,
parens will also be add is each term has a boost value larger than 1.0.

public String toString(String field) {
    StringBuffer buffer = new StringBuffer();
    boolean needParens=(getBoost() != 1.0) ||
(getMinimumNumberShouldMatch()>0) ;
    if (needParens) {
      buffer.append("(");
    }

    for (int i = 0 ; i < clauses.size(); i++) {
      BooleanClause c = (BooleanClause)clauses.get(i);
      if (c.isProhibited())
        buffer.append("-");
      else if (c.isRequired())
        buffer.append("+");

      Query subQuery = c.getQuery();
      if (subQuery instanceof BooleanQuery) {      // wrap sub-bools in
parens
        buffer.append("(");
        buffer.append(c.getQuery().toString(field));
        buffer.append(")");
      } else
        buffer.append(c.getQuery().toString(field));

      if (i != clauses.size()-1)
        buffer.append(" ");
    }

    if (needParens) {
      buffer.append(")");
    }

    if (getMinimumNumberShouldMatch()>0) {
      buffer.append('~');
      buffer.append(getMinimumNumberShouldMatch());
    }

    if (getBoost() != 1.0f)
    {
      buffer.append(ToStringUtils.boost(getBoost()));
    }

    return buffer.toString();
  }




-Hoss




---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Boolean Query search performance

Posted by Chris Hostetter <ho...@fucit.org>.

: If I do a query.toString(), both queries give different results, which
: is probably a clue (additional paren's with the BooleanQuery)
: 
: Query.toString the old way using queryParser:
:     +(id:1^2.0 id:2 ... ) +type:CORE
: 
: Query.toString the new way using BooleanQuery:
:     +((id:1^2.0) (id:2) ... ) +type:CORE

i didn't look too closely at the psuedo code you posted, but the 
additional parens normally indicates that you are actually creating an 
extra layer of BooleanQueries (ie: a BooleanQuery with only one clause for 
each term) ... but the rewrite method should optimize those away (even 
back in lucene 2.2) ... if you look at query.rewrite(reader).toString() 
then the queries *really* should be the same, if they aren't, then that 
may be your culprit.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org