Posted to java-user@lucene.apache.org by Bill Chesky <Bi...@learninga-z.com> on 2013/08/15 19:43:05 UTC

Question on wildcard queries, filters, scoring and TooManyClauses exception

Hello,

I know this is a perennial question here because I've spent a lot of time searching for an answer.  I've seen the discussions about the TooManyClauses exception and I understand generally why you get it.  I see lots of discussion about using filters to avoid it but I still can't get it to work.  I think I'm just missing something fundamental.

I'm using Lucene 3.0.

I'm trying to do prefix queries on an index.  I figured there might be times where I might run into the TooManyClauses exception so from reading discussions on the issue I figured I should use a filter.  I found the PrefixFilter class and began experimenting with it.  E.g. this works:

QueryParser queryParser = new QueryParser(Version.LUCENE_30, "my_field", new StandardAnalyzer(Version.LUCENE_30));
Query prefixQuery = queryParser.parse("t*");
PrefixFilter prefixFilter = new PrefixFilter(new Term("my_field", "t"));
indexSearcher.search(prefixQuery, prefixFilter, collector);

This returns about 5000 hits on my index.

But then I discovered that it works just as well without the filter:

QueryParser queryParser = new QueryParser(Version.LUCENE_30, "my_field", new StandardAnalyzer(Version.LUCENE_30));
Query prefixQuery = queryParser.parse("t*");
indexSearcher.search(prefixQuery, collector);

Why, I don't know.  Seems like this would get expanded out into 5000 BooleanQueries and since my max clause count is still set to the default 1024 I should get the exception.  But I didn't.  So maybe I don't need the filter after all?

Next, I need scoring to work.  I read that with wildcard queries all scores are set to 1.0 by default.  But I read you can use the QueryParser.setMultiTermRewriteMethod() method to take scoring into account again.  So I tried:

QueryParser queryParser = new QueryParser(Version.LUCENE_30, "my_field", new StandardAnalyzer(Version.LUCENE_30));
queryParser.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Query prefixQuery = queryParser.parse("t*");
indexSearcher.search(prefixQuery, collector);

Now, I get the TooManyClauses exception.

I tried adding the PrefixFilter back in but with no luck.  Still get the exception.

Again, sorry if this has been discussed before.  Just not seeing an answer to this after much searching and I just don't understand what is going on here.  Any help appreciated.  Links welcome.

Bill


Re: Question on wildcard queries, filters, scoring and TooManyClauses exception

Posted by Duke DAI <du...@gmail.com>.
A few notes on this topic.

QueryParser queryParser = new QueryParser(Version.LUCENE_30, "my_field", new StandardAnalyzer(Version.LUCENE_30));
Query prefixQuery = queryParser.parse("t*");
indexSearcher.search(prefixQuery, collector);

Here the default rewrite method is used (MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT, if I remember the name correctly). If the number of expanded terms exceeds a cutoff (350, I believe), the query is converted to a Filter automatically, which is why you don't get TooManyClauses in this case.
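To make that switch concrete, here is a rough stand-alone model in plain Java. It does not call Lucene at all: the class name, the enum and the method are made up for illustration, the cutoff of 350 is simply the figure quoted above, and the real rewrite method also looks at how many documents the terms match, which this sketch ignores. How the exact boundary value is treated is an assumption as well.

```java
public class AutoRewriteSketch {
    /** Hypothetical stand-in for the term-count cutoff (350 is the figure quoted above). */
    static final int TERM_COUNT_CUTOFF = 350;

    /** The two strategies the default rewrite is described as choosing between. */
    enum Strategy { BOOLEAN_QUERY, FILTER }

    /**
     * Picks a strategy from the number of terms the prefix expands to.
     * Simplification: only the term count is considered here.
     */
    static Strategy choose(int expandedTermCount) {
        return expandedTermCount >= TERM_COUNT_CUTOFF
                ? Strategy.FILTER
                : Strategy.BOOLEAN_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(choose(40));    // few terms: a BooleanQuery is cheap enough
        System.out.println(choose(5000));  // many terms: fall back to a filter
    }
}
```

With roughly 5000 matching terms, as in the original question, this model lands on the filter branch, so no BooleanQuery with 5000 clauses is ever built.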


QueryParser queryParser = new QueryParser(Version.LUCENE_30, "my_field", new StandardAnalyzer(Version.LUCENE_30));
queryParser.setMultiTermRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
Query prefixQuery = queryParser.parse("t*");
indexSearcher.search(prefixQuery, collector);

MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE expands every matching term into a clause of a BooleanQuery, so it throws TooManyClauses once the clause limit (1024 by default) is exceeded. You can avoid the exception by calling BooleanQuery.setMaxClauseCount() with a large enough value at startup, but watch memory usage if you set it too high.
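The clause limit can be pictured with another small plain-Java model. Again, nothing here is Lucene code: the class, the exception type and the expand method are hypothetical stand-ins, and the only thing modelled is "one clause per matching term, fail once the limit is reached".

```java
public class ClauseLimitSketch {
    /** Hypothetical stand-in for Lucene's TooManyClauses exception. */
    static class TooManyClausesSketch extends RuntimeException {
        TooManyClausesSketch(String message) {
            super(message);
        }
    }

    /**
     * Adds one clause per matching term and fails as soon as another
     * clause would push the count past maxClauseCount.
     */
    static int expand(int matchingTerms, int maxClauseCount) {
        int clauses = 0;
        for (int i = 0; i < matchingTerms; i++) {
            if (clauses >= maxClauseCount) {
                throw new TooManyClausesSketch("limit " + maxClauseCount + " exceeded");
            }
            clauses++;
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(expand(1000, 1024));   // fits under the default limit
        try {
            expand(5000, 1024);                   // about 5000 terms, default limit
        } catch (TooManyClausesSketch e) {
            System.out.println("too many clauses: " + e.getMessage());
        }
        System.out.println(expand(5000, 8192));   // same terms, raised limit
    }
}
```

The last call is the analogue of raising the limit before searching: the same 5000 terms go through, at the cost of holding 5000 clauses in memory.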



Best regards,
Duke
If not now, when? If not me, who?


On Fri, Aug 16, 2013 at 10:24 PM, Bill Chesky
<Bi...@learninga-z.com> wrote:

> Thanks for the reply Ian.
>
> > I can't explain all of it and 3.0 is way old ... you might like to
> > think about upgrading.
>
> Yes, I agree but since there's a significant code base in place, it's a
> bigger project than I can take on at the moment.
>
> > However in your first snippet you don't need the query AND the filter.
> > Either one will suffice.  In some circumstances, as you say, filters
> > are preferable but queries and filters are often interchangeable.
>
> Yeah, that occurred to me too.  But how do I use only the filter?  The
> IndexSearcher class (in the version of Lucene I'm using anyway) does not
> have a search method that takes only a Filter.  The closest it has is:
>
> IndexSearcher.search(Weight, Filter, Collector)
>
> The documentation on the Weight parameter is kind of sparse.  In the
> javadoc for that method it just says:
>
> weight - to match documents
>
> Looking at the Weight class, it says to instantiate a Weight instance with
> Query.createWeight(Searcher).  So now I'm back to having to have a query
> again.
>
> > On the rest of it, I don't know what is going on.  What java classes
> > are you getting back from QueryParser?  Giving it a variable name of
> > prefixQuery doesn't make it so - what does
> > prefixQuery.getClass().getName() say?
>
> It's definitely returning a PrefixQuery.  I checked that early on.
>
> thanks again,
>
> Bill

RE: Question on wildcard queries, filters, scoring and TooManyClauses exception

Posted by Bill Chesky <Bi...@learninga-z.com>.
Thanks for the reply Ian.

> I can't explain all of it and 3.0 is way old ... you might like to
> think about upgrading.

Yes, I agree but since there's a significant code base in place, it's a bigger project than I can take on at the moment.

> However in your first snippet you don't need the query AND the filter.
> Either one will suffice.  In some circumstances, as you say, filters
> are preferable but queries and filters are often interchangeable.

Yeah, that occurred to me too.  But how do I use only the filter?  The IndexSearcher class (in the version of Lucene I'm using anyway) does not have a search method that takes only a Filter.  The closest it has is:

IndexSearcher.search(Weight, Filter, Collector)

The documentation on the Weight parameter is kind of sparse.  In the javadoc for that method it just says:

weight - to match documents

Looking at the Weight class, it says to instantiate a Weight instance with Query.createWeight(Searcher).  So now I'm back to having to have a query again.  

> On the rest of it, I don't know what is going on.  What java classes
> are you getting back from QueryParser?  Giving it a variable name of
> prefixQuery doesn't make it so - what does
> prefixQuery.getClass().getName() say?

It's definitely returning a PrefixQuery.  I checked that early on.

thanks again,

Bill

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org






Re: Question on wildcard queries, filters, scoring and TooManyClauses exception

Posted by Ian Lea <ia...@gmail.com>.
I can't explain all of it and 3.0 is way old ... you might like to
think about upgrading.

However in your first snippet you don't need the query AND the filter.
Either one will suffice.  In some circumstances, as you say, filters
are preferable but queries and filters are often interchangeable.

On the rest of it, I don't know what is going on.  What java classes
are you getting back from QueryParser?  Giving it a variable name of
prefixQuery doesn't make it so - what does
prefixQuery.getClass().getName() say?


--
Ian.

