Posted to user@nutch.apache.org by Benny <be...@gmail.com> on 2005/09/03 02:18:56 UTC

RangeQuery problem.

Hi,

I hit a problem when using RangeQuery.

I indexed a price field. It works well when I query a narrow price range.

For example:

The query works fine for the price range 100000 - 200000.
But if I change the price range to 1000 - 200000, an exception is thrown.
I have pasted the log below. If someone knows what happened, please help.

Thanks a lot.

Benny

----- Root Cause -----
org.apache.lucene.search.BooleanQuery$TooManyClauses
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:147)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:138)
at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:92)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:352)
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:175)
at org.apache.lucene.search.Query.weight(Query.java:92)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:165)
at org.apache.lucene.search.Searcher.search(Searcher.java:126)
at org.apache.nutch.searcher.LuceneQueryOptimizer.optimize(LuceneQueryOptimizer.java:128)
at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:86)
at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:208)
at org.apache.jsp.search_jsp._jspService(search_jsp.java:244)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:92)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:809)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:162)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:240)

Re: RangeQuery problem.

Posted by Benny <be...@gmail.com>.
Also, I checked the Tomcat log "catalina.out". The query with the narrower
range works fine, but the log stops after "searching for 20 raw hits" for
the wider one.


050903 142317 17 query request from 68.237.38.140
050903 142317 17 query: price:f10000t20000
050903 142317 17 searching for 20 raw hits
050903 142317 17 re-searching for 40 raw hits, query: price:"f10000t20000" -site:"realestate.nytimes.com"
050903 142317 17 found 11 raw hits
050903 142317 17 total hits: 2043
050903 142426 18 query request from 68.237.38.140
050903 142426 18 query: price:f1000t20000
050903 142426 18 searching for 20 raw hits



Re: RangeQuery problem.

Posted by Benny <be...@gmail.com>.
Thanks a lot.


Re: RangeQuery problem.

Posted by Andrzej Bialecki <ab...@getopt.org>.
Benny wrote:

> Query: price:f100000t200000
> Parsed: price:"f100000t200000"
> Translated: +price:[0000100000 TO 0000200000]
> Query: price:f10000t200000
> Parsed: price:"f10000t200000"
> Translated: +price:[0000010000 TO 0000200000]
> 
> I think the problem is somewhere else. Is it possible that the price string
> is too long? Or maybe range queries are not suited to this many pages (>50K)?
> 
> Any more hints about what may be happening?

No, the problem is explained in the stacktrace:


>>>----- Root Cause -----
>>>org.apache.lucene.search.BooleanQuery$TooManyClauses
>>>at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:147)
>>>at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:138)
>>>at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:92)
>>>at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:352)
>>>at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:175)
>>>at org.apache.lucene.search.Query.weight(Query.java:92)
>>>at org.apache.lucene.search.Searcher.createWeight(Searcher.java:165)
>>>at org.apache.lucene.search.Searcher.search(Searcher.java:126)
>>>at org.apache.nutch.searcher.LuceneQueryOptimizer.optimize(LuceneQueryOptimizer.java:128)
>>>at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:86)
>>>at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:208)

This exception is thrown from RangeQuery.rewrite.
You see, a range query needs to be expanded (rewritten) into a boolean
query composed of all the terms found in the index that fall within the
range. In your first case, there were fewer than 1024 such terms. In your
second case, there were more terms than that threshold. The threshold is
checked in order to prevent range queries from exploding into all index
terms and thus destroying search performance (or crashing the application).

If you are confident that your setup can handle more terms in a query, 
then you can use BooleanQuery.setMaxClauseCount(xxx) to increase this limit.
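
The expansion and the clause limit described above can be illustrated with a
small stand-alone simulation. This is only a sketch: it does not use Lucene,
and the class RangeExpansionSketch, its TooManyClauses stand-in, the
maxClauseCount field and the sample index (one price term per multiple of
100) are all hypothetical names and data made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

/** Stand-alone simulation; it does not use or reproduce Lucene's classes. */
public class RangeExpansionSketch {

    /** Hypothetical stand-in for BooleanQuery.TooManyClauses. */
    static class TooManyClauses extends RuntimeException {}

    /** Mirrors the 1024 threshold mentioned above; adjustable like setMaxClauseCount. */
    static int maxClauseCount = 1024;

    /** Collects every indexed term in [lower, upper], one clause per term. */
    static List<String> expand(SortedSet<String> indexedTerms, String lower, String upper) {
        List<String> clauses = new ArrayList<>();
        // Appending "\0" yields the next string after upper, so upper itself is included.
        for (String term : indexedTerms.subSet(lower, upper + "\0")) {
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClauses();
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Builds a made-up index with one zero-padded price term per multiple of 100. */
    static SortedSet<String> sampleIndex() {
        SortedSet<String> terms = new TreeSet<>();
        for (int price = 0; price <= 300_000; price += 100) {
            terms.add(String.format(Locale.ROOT, "%010d", price));
        }
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = sampleIndex();

        // Narrow range: 1001 distinct terms, which is under the limit.
        System.out.println(expand(terms, "0000100000", "0000200000").size()); // 1001

        // Wide range: 1991 distinct terms, which exceeds the limit.
        try {
            expand(terms, "0000001000", "0000200000");
        } catch (TooManyClauses e) {
            System.out.println("TooManyClauses");
        }

        // Raising the limit, analogous to BooleanQuery.setMaxClauseCount(4096).
        maxClauseCount = 4096;
        System.out.println(expand(terms, "0000001000", "0000200000").size()); // 1991
    }
}
```

Note that what counts is the number of distinct terms in the range, not the
number of matching documents, so the real figures depend entirely on the
index contents.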

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: RangeQuery problem.

Posted by Benny <be...@gmail.com>.
Hi Matthias,

When I run

nutch org.apache.nutch.searcher.Query

Query: price:f100000t200000
Parsed: price:"f100000t200000"
Translated: +price:[0000100000 TO 0000200000]
Query: price:f10000t200000
Parsed: price:"f10000t200000"
Translated: +price:[0000010000 TO 0000200000]
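
A minimal sketch of the translation shown in this output, assuming the
f<from>t<to> syntax is parsed with a regular expression and both bounds are
zero-padded to 10 digits. The class PriceRangeTranslationSketch and its
translate method are hypothetical; the actual query filter used in this setup
is not shown in the thread.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; not the query filter actually used in the thread. */
public class PriceRangeTranslationSketch {

    /** Matches values such as "f10000t200000": 'f', lower bound, 't', upper bound. */
    private static final Pattern RANGE = Pattern.compile("f(\\d+)t(\\d+)");

    /** Turns a value like "f10000t200000" into a padded range expression. */
    static String translate(String field, String value) {
        Matcher m = RANGE.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a range: " + value);
        }
        long from = Long.parseLong(m.group(1));
        long to = Long.parseLong(m.group(2));
        // Pad both bounds to 10 digits so that string order equals numeric order.
        return String.format(Locale.ROOT, "+%s:[%010d TO %010d]", field, from, to);
    }

    public static void main(String[] args) {
        // prints: +price:[0000010000 TO 0000200000]
        System.out.println(translate("price", "f10000t200000"));
    }
}
```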

I think the problem is somewhere else. Is it possible that the price string
is too long? Or maybe range queries are not suited to this many pages (>50K)?

Any more hints about what may be happening?

Thanks.

Benny







Re: RangeQuery problem.

Posted by Matthias Jaekle <ja...@eventax.de>.
Hi Benny,

I cannot tell you anything about your failure, but there may be another
problem. Did you consider that Lucene uses text comparisons?
So you should probably always compare 001000 with 200000, i.e. strings of
the same length.
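
The point about text comparison can be checked with plain Java strings. This
is a sketch only; the class PaddedCompareSketch and its pad helper are
hypothetical, and 9000 is a made-up price chosen because its unpadded form
sorts on the wrong side of 200000.

```java
import java.util.Locale;

/** Hypothetical illustration of lexicographic vs. numeric ordering. */
public class PaddedCompareSketch {

    /** Left-pads a non-negative price with zeros to a fixed width. */
    static String pad(long price, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", price);
    }

    public static void main(String[] args) {
        // Unpadded strings compare character by character: '9' > '2',
        // so "9000" sorts after "200000" although 9000 < 200000.
        System.out.println("9000".compareTo("200000") > 0); // true

        // With equal-length, zero-padded strings the text order matches
        // the numeric order: "009000" sorts before "200000".
        System.out.println(pad(9000, 6).compareTo(pad(200000, 6)) < 0); // true
    }
}
```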

Matthias



-- 
http://www.eventax.com - eventax GmbH
http://www.umkreisfinder.de - The search engine for local information and events