Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2012/01/01 16:48:18 UTC

Re: Highlighting with prefix queries and maxBooleanClause

This may be the impetus for Hoss creating SOLR-2996.

I suspect this will go away if you use the correct
match-all-docs syntax, i.e. q=*:* rather than q=*

Hoss's suggestion in SOLR-2996 is to "do the right thing" with
q=*, but for now you need to use the right syntax.
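A minimal client-side sketch of that workaround: a bare * (which, per this thread, ends up as a very broad prefix query on the default field) is mapped to the match-all-docs syntax *:* before the request is built. The select URL and the hl.fl=text parameter are copied from Michael's example below; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the select handler URL from the example in this thread.
SELECT_URL = "http://localhost:8983/solr/select/"


def normalize_query(q: str) -> str:
    """Map a bare '*' to the match-all-docs syntax '*:*'; leave anything else alone."""
    stripped = q.strip()
    return "*:*" if stripped == "*" else stripped


def build_select_url(q: str, start: int = 0, rows: int = 20, highlight: bool = True) -> str:
    """Build a select URL shaped like the one in this thread (hypothetical helper)."""
    params = {
        "q": normalize_query(q),
        "start": start,
        "rows": rows,
        "hl": "true" if highlight else "false",
        "hl.fl": "text",
    }
    # urlencode percent-encodes '*' and ':'; Solr decodes them on its side.
    return SELECT_URL + "?" + urlencode(params)
```

Only the exact query * is rewritten, so prefix queries such as a* are passed through unchanged.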

But I'm not sure what highlighting will do when there's
nothing to highlight on (i.e., no query terms to match
against your text field).

FWIW
Erick

On Fri, Dec 30, 2011 at 6:00 PM, Michael Lissner
<ml...@michaeljaylissner.com> wrote:
> This question has come up a few times, but I've yet to see a good solution.
>
> Basically, if I have highlighting turned on and do a query for q=*, I get an
> error that maxBooleanClauses has been exceeded. Granted, this is a silly
> query, but a user might do something similar. My expectation is that queries
> that work when highlighting is OFF should continue working when it is ON.
>
> What's the best solution for queries like this? Is it simply to catch the
> error and then up maxBooleanClauses? Or to turn off highlighting when this
> error occurs?
>
> Or am I doing something altogether wrong?
>
> This is the query I'm using to cause the error:
>    http://localhost:8983/solr/select/?q=*&start=0&rows=20&hl=true&hl.fl=text
>
> Changing hl to false makes the query go through.
>
> I'm using Solr 4.0.0-dev
>
> The traceback is:
>
> SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
>    at org.apache.lucene.search.ScoringRewrite$1.checkMaxClauseCount(ScoringRewrite.java:68)
>    at org.apache.lucene.search.ScoringRewrite$ParallelArraysTermCollector.collect(ScoringRewrite.java:159)
>    at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:81)
>    at org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:114)
>    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:312)
>    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:155)
>    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:144)
>    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:384)
>    at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
>    at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
>    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
>    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:511)
>    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:402)
>    at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:121)
>    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
>    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>    at org.mortbay.jetty.Server.handle(Server.java:326)
>    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> Thanks,
>
> Mike

Re: Highlighting with prefix queries and maxBooleanClause

Posted by Michael Lissner <ml...@michaeljaylissner.com>.
I switched over to using the FastVectorHighlighter, and the problem with
maxBooleanClauses is resolved. I guess this is at the expense of having
a larger index (since you have to enable termVectors, termPositions and
termOffsets), but at least it's working.
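For reference, a request like the one earlier in this thread can ask for the FastVectorHighlighter explicitly through the hl.useFastVectorHighlighter parameter. The sketch below only builds and decodes the query parameters; it assumes the text field has already been reindexed with termVectors, termPositions and termOffsets enabled, and the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

SELECT_URL = "http://localhost:8983/solr/select/"  # from the example in this thread


def fvh_select_url(q: str, field: str = "text", rows: int = 20) -> str:
    """Hypothetical helper: request highlighting on `field` via the FastVectorHighlighter.

    Assumes `field` is indexed with termVectors, termPositions and termOffsets,
    which the FastVectorHighlighter needs.
    """
    params = {
        "q": q,
        "start": 0,
        "rows": rows,
        "hl": "true",
        "hl.fl": field,
        "hl.useFastVectorHighlighter": "true",
    }
    return SELECT_URL + "?" + urlencode(params)


# Decode the query string again to check which parameters would be sent.
sent = parse_qs(urlparse(fvh_select_url("a*")).query)
```

The index-side change (the three term vector attributes on the field) is what costs the extra space Michael mentions; the request-side change is a single parameter.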

Thanks for the help.

Mike

On Tue 03 Jan 2012 11:53:35 AM PST, Chris Hostetter wrote:
>
> : About bumping maxBooleanClauses. You can certainly
> : bump it up, but it's a legitimate question whether the
> : user is well served by allowing that pattern as opposed
> : to requiring 2 or 3 leading characters. The assumption
>
> I think the root of the issue here is that when executing queries, really
> broad prefix queries like "q=*" generate constant score queries, so really
> broad prefix queries are "safe" to execute. But (based on his error) it
> seems like the highlighter fails loudly and painfully on these otherwise
> "safe" queries.
>
> Understandably, part of the reason this happens is that the highlighter
> needs to know all the terms that that prefix expands to in order to know
> what to highlight, but the fact that it generates an error when
> maxBooleanClauses is hit seems unfortunate. Maybe there is no way around
> it, but I *thought* there were highlighting options that could mitigate
> these issues; I just couldn't remember what they are (does the
> FastVectorHighlighter have these problems? Is it only if you use
> WeightedSpanTermExtractor?), hence my suggestion to Michael to start a
> thread here in the hopes that the highlighting experts (Yeah Koji!
> ... better you than me!) would chime in.
>
>
> -Hoss

Re: Highlighting with prefix queries and maxBooleanClause

Posted by Chris Hostetter <ho...@fucit.org>.
: About bumping maxBooleanClauses. You can certainly
: bump it up, but it's a legitimate question whether the
: user is well served by allowing that pattern as opposed
: to requiring 2 or 3 leading characters. The assumption

I think the root of the issue here is that when executing queries, really
broad prefix queries like "q=*" generate constant score queries, so really
broad prefix queries are "safe" to execute. But (based on his error) it
seems like the highlighter fails loudly and painfully on these otherwise
"safe" queries.

Understandably, part of the reason this happens is that the highlighter
needs to know all the terms that that prefix expands to in order to know
what to highlight, but the fact that it generates an error when
maxBooleanClauses is hit seems unfortunate. Maybe there is no way around
it, but I *thought* there were highlighting options that could mitigate
these issues; I just couldn't remember what they are (does the
FastVectorHighlighter have these problems? Is it only if you use
WeightedSpanTermExtractor?), hence my suggestion to Michael to start a
thread here in the hopes that the highlighting experts (Yeah Koji!
... better you than me!) would chime in.
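The point that the highlighter has to enumerate every term a prefix expands to can be illustrated with a toy model. This is plain Python, not Lucene's code: a sorted list stands in for the field's term dictionary, TooManyClauses is a hypothetical stand-in for the Lucene exception in the traceback, and 1024 is the limit that traceback reports. An empty prefix, roughly what q=* amounts to, expands to every term in the field.

```python
import bisect
from itertools import islice
from typing import List

# Value taken from the traceback earlier in this thread.
MAX_CLAUSE_COUNT = 1024


class TooManyClauses(Exception):
    """Illustrative stand-in for Lucene's BooleanQuery.TooManyClauses."""


def expand_prefix(sorted_terms: List[str], prefix: str,
                  max_clauses: int = MAX_CLAUSE_COUNT) -> List[str]:
    """Collect every term starting with `prefix`, as a scoring rewrite turns one
    prefix query into one clause per matching term.

    Raises TooManyClauses as soon as the expansion exceeds `max_clauses`.
    """
    # The terms are sorted, so all matches form one contiguous run.
    start = bisect.bisect_left(sorted_terms, prefix)
    expanded: List[str] = []
    for term in islice(sorted_terms, start, None):
        if not term.startswith(prefix):
            break  # past the end of the run of matching terms
        expanded.append(term)
        if len(expanded) > max_clauses:
            raise TooManyClauses(f"maxClauseCount is set to {max_clauses}")
    return expanded
```

With a toy dictionary of 2,000 terms named a0000 to a1999, expand_prefix(terms, "a00") returns 100 terms while expand_prefix(terms, "a") raises, which is the shape of the q=a* concern raised elsewhere in this thread.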


-Hoss

Re: Highlighting with prefix queries and maxBooleanClause

Posted by Erick Erickson <er...@gmail.com>.
About bumping maxBooleanClauses. You can certainly
bump it up, but it's a legitimate question whether the
user is well served by allowing that pattern as opposed
to requiring 2 or 3 leading characters. The assumption
behind the maxBooleanClauses restriction is that
when there get to be that many clauses, it's both
costly in terms of processing and ultimately not
much use to the search. YMMV of course, but
a* will probably match lots of terms in every document.
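If an application takes the second option and rejects overly broad wildcard terms instead of raising the limit, the check could look roughly like this. It is a hypothetical, deliberately naive sketch: it splits on whitespace, ignores quoting and escaping, and the minimum of 2 leading characters is just one of the values suggested above.

```python
import re

# Hypothetical policy value; the thread suggests requiring 2 or 3 leading characters.
MIN_PREFIX_CHARS = 2

WILDCARD = re.compile(r"[*?]")


def leading_chars(value: str) -> int:
    """Count the literal characters before the first '*' or '?'."""
    match = WILDCARD.search(value)
    return match.start() if match else len(value)


def wildcard_terms_ok(q: str, min_chars: int = MIN_PREFIX_CHARS) -> bool:
    """Return False if any wildcard term in q has fewer than min_chars
    literal leading characters.

    Naive by design: splits on whitespace and ignores quoting, escaping
    and the rest of the query-parser syntax.
    """
    for term in q.split():
        if term == "*:*":  # the match-all-docs syntax is always allowed
            continue
        # Drop an optional "field:" qualifier so "text:a*" is judged on "a*".
        value = term.split(":", 1)[1] if ":" in term else term
        if WILDCARD.search(value) and leading_chars(value) < min_chars:
            return False
    return True
```

A rejected query can then be answered with a request for a longer prefix before it ever reaches Solr.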

Best
Erick

On Sun, Jan 1, 2012 at 4:12 PM, Michael Lissner
<ml...@michaeljaylissner.com> wrote:
> On 01/01/2012 07:48 AM, Erick Erickson wrote:
>>
>> This may be the impetus for Hoss creating SOLR-2996.
>
> Yep, it is indeed, though I believe this problem can also happen when a user
> searches for something like q=a* in a big index. I need a bigger index to
> know for sure about that, but from what I've read so far, I'm fairly certain
> that this problem is bigger than just the q=* search.
>
> I think my solution when this error is thrown is going to be to bump the
> value of maxBooleanClauses and retry the query. Failing that, I'll have to
> retry the query with highlighting off.
>
>> I suspect this will go away if you use the correct
>> match-all-docs syntax, i.e. q=*:* rather than q=*
>
> It does, yes.
>
>> But I'm not sure what highlighting will do when there's
>> nothing to highlight on (i.e., no query terms to match
>> against your text field).
>
> I believe it does nothing, thankfully.
>
> Mike

Re: Highlighting with prefix queries and maxBooleanClause

Posted by Michael Lissner <ml...@michaeljaylissner.com>.
On 01/01/2012 07:48 AM, Erick Erickson wrote:
> This may be the impetus for Hoss creating SOLR-2996.
Yep, it is indeed, though I believe this problem can also happen when a 
user searches for something like q=a* in a big index. I need a bigger 
index to know for sure about that, but from what I've read so far, I'm 
fairly certain that this problem is bigger than just the q=* search.

I think my solution when this error is thrown is going to be to bump the
value of maxBooleanClauses and retry the query. Failing that, I'll
have to retry the query with highlighting off.
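The second half of that plan, retrying without highlighting, can live entirely in the client; maxBooleanClauses itself is configured in solrconfig.xml, not per request. A minimal sketch, assuming a hypothetical send callable and SolrError exception supplied by whatever client library is in use, and assuming the server's error text mentions TooManyClauses or maxClauseCount as in the traceback at the top of this thread.

```python
from typing import Callable, Dict


class SolrError(Exception):
    """Hypothetical error raised by the client when Solr returns a failure."""


def search_with_fallback(send: Callable[[Dict[str, str]], dict],
                         params: Dict[str, str]) -> dict:
    """Send `params`; if Solr reports a too-many-clauses error, retry once
    with highlighting turned off.

    `send` is a hypothetical function that performs the HTTP request and
    returns the parsed response, raising SolrError on failure.
    """
    try:
        return send(params)
    except SolrError as err:
        message = str(err)
        if "TooManyClauses" not in message and "maxClauseCount" not in message:
            raise  # unrelated failure: let the caller see it
        # Same request with highlighting disabled; everything else unchanged.
        return send(dict(params, hl="false"))
```

Only this specific error triggers the retry, so other failures (a bad field name, a syntax error) still surface instead of silently losing highlighting.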
> I suspect this will go away if you use the correct
> match-all-docs syntax, i.e. q=*:* rather than q=*
It does, yes.
> But I'm not sure what highlighting will do when there's
> nothing to highlight on (i.e., no query terms to match
> against your text field).
I believe it does nothing, thankfully.

Mike