Posted to java-user@lucene.apache.org by "xing@mac.com" <xi...@mac.com> on 2005/07/11 07:45:35 UTC
BooleanQuery$TooManyClauses
I did a Google search on this problem, which occurs when using the range
query "+datefield:[199801 TO 200512]" (date stored as "YYYYMMDD"), which
returns 1 million hits.
error: org.apache.lucene.search.BooleanQuery$TooManyClauses
Adding "-Dorg.apache.lucene.maxClauseCount=2400" to the java options allowed
the search query to run without error. The actual value needed is
between 2300 and 2400. At 2300 the query fails.
My question is: how does Lucene perform a range query? As a bunch of
smaller boolean queries? How does one estimate the number of clauses
required for a general query, and more specifically for a range query?
Thanks.
Xing Li
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: BooleanQuery$TooManyClauses
Posted by "xing@mac.com" <xi...@mac.com>.
2500 vs 84. Wow. That's quite a few OR statements I would be saving by
following your guide of just indexing the parts of the datetime I plan
to search on. Every ms counts.
Now I have a clear picture of how range query works. Great stuff. Thanks.
Btw, coming from a db background I'm so used to writing queries in the
fashion where I put the most distinct comparison statement, the one
likely to return the least number of rows, first, in the where
statement. A db can still be pretty dumb with bad statistics and choose
the wrong execution plan, so I like to optimize for it whenever possible
and force the issue.
Say I have a sample Lucene query:
"+a:abc +b:cde +d:bbd +date:[2001 TO 2005] -e:noway"
Does Lucene's execution engine try to figure out, via statistics or a
guesstimate, which path to take first? Or does it just go brute force
and follow the execution plan from left to right? Or does it run all
of them individually, without executing the next search on the results of
the prior one, and then combine them at the end?
Xing
Erik Hatcher wrote:
>
> On Jul 11, 2005, at 1:45 AM, xing@mac.com wrote:
>
>> Did a google search on the problem when using the range search phrase
>> of "+datefield:[199801 TO 200512]" (date stored as "YYYYMMDD") which
>> returns 1 million hits.
>>
>> error: org.apache.lucene.search.BooleanQuery$TooManyClauses
>>
>> Adding "-Dorg.apache.lucene.maxClauseCount=2400" to java option
>> allowed the search query to run without error. The actual value
>> needed is between 2300 and 2400. At 2300 the query fails.
>>
>> My question is how does Lucene perform range query? As a bunch of
>> smaller boolean queries? How does one estimate the number of clauses
>> required for a general query and more specifically on a range query?
>
>
> RangeQuery expands under the covers to a BooleanQuery with all matching
> terms OR'd together.
>
> In your case, if you've indexed a term for every day in that range
> using YYYYMMDD then you've got 2,524 terms roughly = 7 * 365 - 31
> (minus 31 because you'd omit December '05 since you are only going to
> 200512). If all you need is YYYYMM range searching, then index it as
> that (that'd be 7 years * 12 months/year = 84 terms).
>
> Erik
Re: BooleanQuery$TooManyClauses
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 11, 2005, at 1:45 AM, xing@mac.com wrote:
> Did a google search on the problem when using the range search
> phrase of "+datefield:[199801 TO 200512]" (date stored as
> "YYYYMMDD") which returns 1 million hits.
>
> error: org.apache.lucene.search.BooleanQuery$TooManyClauses
>
> Adding "-Dorg.apache.lucene.maxClauseCount=2400" to java option
> allowed the search query to run without error. The actual value
> needed is between 2300 and 2400. At 2300 the query fails.
>
> My question is how does Lucene perform range query? As a bunch of
> smaller boolean queries? How does one estimate the number of
> clauses required for a general query and more specifically on a
> range query?
RangeQuery expands under the covers to a BooleanQuery with all
matching terms OR'd together.
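
To make the expansion concrete, here is a minimal stand-alone sketch of the
idea. It is not Lucene's code: the class RangeExpansionSketch, the method
expand, and the exception type are hypothetical names used only for
illustration, and a TreeSet stands in for the index's sorted term dictionary.
The limit of 1024 in main mirrors Lucene's default maxClauseCount.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class RangeExpansionSketch {
    // Hypothetical stand-in for BooleanQuery.TooManyClauses.
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(String msg) { super(msg); }
    }

    // Turns every indexed term in [lower, upper] (inclusive, compared as
    // plain strings) into one OR clause, failing once the limit is passed.
    static List<String> expand(NavigableSet<String> indexedTerms,
                               String lower, String upper, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        for (String term : indexedTerms.subSet(lower, true, upper, true)) {
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClausesException(
                        "more than " + maxClauseCount + " clauses");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        // A tiny made-up term dictionary for a YYYYMMDD field.
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "19971231", "19980101", "19980102", "20051130", "20051201"));
        // Prints [19980101, 19980102, 20051130]
        System.out.println(expand(terms, "199801", "200512", 1024));
    }
}
```

Because terms are compared as strings, "20051201" sorts after the upper
bound "200512" and is left out, while every distinct term inside the range
costs one clause.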
In your case, if you've indexed a term for every day in that range
using YYYYMMDD then you've got 2,524 terms roughly = 7 * 365 - 31
(minus 31 because you'd omit December '05 since you are only going to
200512). If all you need is YYYYMM range searching, then index it as
that (that'd be 7 years * 12 months/year = 84 terms).
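
The estimate above can be reproduced with a few lines of java.time
arithmetic. This is only a sketch: RangeTermCount, dayTerms and monthTerms
are hypothetical names, and it assumes one term for every calendar day (or
month) in the range, so the result is an upper bound. Counted this way,
1998 through 2005 inclusive spans eight calendar years, so the totals come
out somewhat higher than the rough seven-year figures; the real clause count
depends on which dates actually occur in the index, which would explain a
failure threshold between 2300 and 2400.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.ChronoUnit;

class RangeTermCount {
    // Upper bound on distinct YYYYMMDD terms: one per day, ends inclusive.
    static long dayTerms(LocalDate first, LocalDate last) {
        return ChronoUnit.DAYS.between(first, last) + 1;
    }

    // Upper bound on distinct YYYYMM terms: one per month, ends inclusive.
    static long monthTerms(YearMonth first, YearMonth last) {
        return ChronoUnit.MONTHS.between(first, last) + 1;
    }

    public static void main(String[] args) {
        // With YYYYMMDD terms, the bound "200512" sorts before "20051201",
        // so the last matching day is 2005-11-30.
        long days = dayTerms(LocalDate.of(1998, 1, 1), LocalDate.of(2005, 11, 30));
        // With YYYYMM terms, "200512" itself is a term and is included.
        long months = monthTerms(YearMonth.of(1998, 1), YearMonth.of(2005, 12));
        System.out.println(days);   // 2891
        System.out.println(months); // 96
    }
}
```

Either way the conclusion is the same: indexing only the YYYYMM granularity
cuts the clause count by more than an order of magnitude.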
Erik