Posted to java-user@lucene.apache.org by "xing@mac.com" <xi...@mac.com> on 2005/07/11 07:45:35 UTC

BooleanQuery$TooManyClauses

I did a Google search on this problem, which occurs when I use the range
query "+datefield:[199801 TO 200512]" (dates stored as "YYYYMMDD"), a
query that returns 1 million hits.

error: org.apache.lucene.search.BooleanQuery$TooManyClauses

Adding "-Dorg.apache.lucene.maxClauseCount=2400" to the java options
allowed the query to run without error. The value actually needed lies
between 2300 and 2400; at 2300 the query fails.

My question is: how does Lucene perform a range query? As a bunch of
smaller boolean queries? How does one estimate the number of clauses
required for a general query, and more specifically for a range query?

Thanks.

Xing Li

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: BooleanQuery$TooManyClauses

Posted by "xing@mac.com" <xi...@mac.com>.
2500 vs 84. Wow. That's quite a few OR statements I would save by
following your advice to index only the parts of the datetime I plan
to search on. Every ms counts.

Now I have a clear picture of how range queries work. Great stuff. Thanks.

Btw, coming from a db background, I'm used to writing queries with the
most selective comparison (the one likely to return the fewest rows)
first in the WHERE clause. A db can still be pretty dumb with bad
statistics and choose the wrong execution plan, so I like to optimize
for it whenever possible and force the issue.

If I have a sample Lucene query:

"+a:abc +b:cde +d:bbd +date:[2001 TO 2005] -e:noway"

Does Lucene's execution engine try to figure out, via statistics or a
guesstimate, which path to take first? Or does it just go brute force
and follow the execution plan from left to right? Or does it run each
clause individually, without executing the next search on the results
of the prior one, and then OR the results together at the end?

Xing



Erik Hatcher wrote:
> 
> RangeQuery expands under the covers to a BooleanQuery with all  matching 
> terms OR'd together.
> 
> In your case, if you've indexed a term for every day in that range  
> using YYYYMMDD then you've got 2,524 terms roughly = 7 * 365 - 31  
> (minus 31 because you'd omit December '05 since you are only going to  
> 200512).  If all you need is YYYYMM range searching, then index it as  
> that (that'd be 7 years * 12 months/year = 84 terms).
> 
>     Erik
> 



Re: BooleanQuery$TooManyClauses

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 11, 2005, at 1:45 AM, xing@mac.com wrote:

> Did a Google search on the problem when using the range search
> phrase of  "+datefield:[199801 TO 200512]" (date stored as  
> "YYYYMMDD") which returns 1 million hits.
>
> error: org.apache.lucene.search.BooleanQuery$TooManyClauses
>
> Adding "-Dorg.apache.lucene.maxClauseCount=2400" to java option  
> allowed the search query to run without error. The actual value  
> needed is between 2300 and 2400. At 2300 the query fails.
>
> My question is how does Lucene perform range query?  As a bunch of  
> smaller boolean queries? How does one estimate the number of  
> clauses required for a general query and more specifically on a  
> range query?

RangeQuery expands under the covers to a BooleanQuery with all  
matching terms OR'd together.
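Erik's description can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not Lucene's implementation: the class `RangeExpansionSketch`, its `expand` method and `TooManyClausesException` are hypothetical names, and a `TreeSet<String>` stands in for the index's sorted term dictionary. Each indexed term that sorts between the two bounds (plain string order, both bounds inclusive) becomes one OR'd clause, and the expansion fails as soon as one more clause would exceed the limit, roughly the condition under which Lucene throws `BooleanQuery$TooManyClauses`. The limit of 1024 used below is assumed to be Lucene's default `maxClauseCount`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeExpansionSketch {

    /** Hypothetical stand-in for BooleanQuery$TooManyClauses. */
    public static class TooManyClausesException extends RuntimeException {
        public TooManyClausesException(int limit) {
            super("more than " + limit + " clauses");
        }
    }

    /**
     * Models a range query rewrite: every term in the sorted dictionary with
     * lower <= term <= upper (String.compareTo order) becomes one OR'd clause.
     * Throws once adding another clause would exceed maxClauseCount.
     */
    public static List<String> expand(NavigableSet<String> termDictionary,
                                      String lower, String upper, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        for (String term : termDictionary.subSet(lower, true, upper, true)) {
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClausesException(maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of(
                "19971231", "19980101", "19980102", "20051130", "20051201"));
        // "19980101" sorts after "199801", but "20051201" also sorts after
        // "200512", so December 2005 days fall outside the range.
        // 1024 is the assumed default clause limit.
        System.out.println(expand(terms, "199801", "200512", 1024));
        // prints [19980101, 19980102, 20051130]
    }
}
```

Because the comparison is plain string order, the 6-digit bounds in the original query still select 8-digit YYYYMMDD terms: every January 1998 term sorts after "199801", while every December 2005 term sorts after "200512" and is left out. Raising the limit only moves the failure point; reducing the number of distinct terms in the range avoids it.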

In your case, if you've indexed a term for every day in that range  
using YYYYMMDD then you've got 2,524 terms roughly = 7 * 365 - 31  
(minus 31 because you'd omit December '05 since you are only going to  
200512).  If all you need is YYYYMM range searching, then index it as  
that (that'd be 7 years * 12 months/year = 84 terms).
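The rough estimate above can be replaced with an exact count. The sketch below assumes, hypothetically, that the index holds exactly one term per calendar day (or per month) and again uses a `TreeSet` as a stand-in for the term dictionary; `DateTermCount` and its methods are illustrative names, not Lucene API. Because 1998 through 2005 spans eight calendar years, the exact figures are 2,891 day terms (1998-01-01 through 2005-11-30) and 96 month terms (199801 through 200512, both bounds inclusive). Since the original query ran with a limit of 2400, this suggests that index held fewer distinct day terms in the range than there are calendar days: the clause count is the number of matching terms actually present in the index, not the number of days.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.TreeSet;

public class DateTermCount {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM");

    /** Counts YYYYMMDD terms (one per day in [from, to]) that fall inside [lower, upper]. */
    public static int countDayTerms(LocalDate from, LocalDate to, String lower, String upper) {
        TreeSet<String> terms = new TreeSet<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            terms.add(d.format(DAY));
        }
        return terms.subSet(lower, true, upper, true).size();
    }

    /** Counts YYYYMM terms (one per month in [from, to]) that fall inside [lower, upper]. */
    public static int countMonthTerms(YearMonth from, YearMonth to, String lower, String upper) {
        TreeSet<String> terms = new TreeSet<>();
        for (YearMonth m = from; !m.isAfter(to); m = m.plusMonths(1)) {
            terms.add(m.format(MONTH));
        }
        return terms.subSet(lower, true, upper, true).size();
    }

    public static void main(String[] args) {
        int days = countDayTerms(LocalDate.of(1998, 1, 1), LocalDate.of(2005, 12, 31),
                "199801", "200512");
        int months = countMonthTerms(YearMonth.of(1998, 1), YearMonth.of(2005, 12),
                "199801", "200512");
        System.out.println(days + " day terms vs " + months + " month terms");
        // prints "2891 day terms vs 96 month terms"
    }
}
```

Either way, month granularity cuts the clause count by roughly a factor of 30, which is the point of Erik's advice.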

     Erik

