Posted to solr-user@lucene.apache.org by Ken Stanley <do...@gmail.com> on 2010/11/02 15:14:22 UTC

Highlighting and maxBooleanClauses limit

By default, solrconfig.xml sets maxBooleanClauses to 1024, which in my
opinion should be more than enough clauses in general. Recently, we have
been noticing errors in our Catalina log: SEVERE:
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set
to 2048. As a temporary (and quick) workaround, we tried increasing
maxBooleanClauses to 2048, but we are still hitting the limit. The full
error (including the query that was run before the error) is:

INFO: [bizjournals] webapp=/solr path=/select/
params={facet=true&sort=df_date_published+asc&hl=true&version=2.2&facet.field=facet_type&facet.field=facet_author&facet.field=facet_arr_industries&fq=df_date_published:[*+TO+NOW]&hl.requireFieldMatch=true&hl.fragsize=75&facet.mincount=1&indent=on&hl.fl=df_text_content&wt=xml&rows=25&hl.snippets=2&hl.maxAlternateFieldLength=150&start=0&q=(df_text_blog_name:"farm+bill")+OR+((df_text_headline:[*+TO+*]+AND+df_date_published:[*+TO+NOW])+AND+((df_text_author:"farm+bill")+OR+(df_text_content:"farm+bill")+OR+(df_text_headline:"farm+bill")+OR+(df_text_blog_name:"farm+bill")))&hl.alternateField=df_text_content&hl.usePhraseHighlighter=true}
hits=269 status=500 QTime=729
Nov 2, 2010 4:10:09 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount
is set to 2048
        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153)
        at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:144)
        at
org.apache.lucene.search.MultiTermQuery$ScoringBooleanQueryRewrite.rewrite(MultiTermQuery.java:110)
        at
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:382)
        at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:178)
        at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
        at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
        at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
        at
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
        at
org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
        at
org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
        at
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
        at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
        at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
        at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
        at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
        at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:619)

I've noticed in the stack trace that this exception occurs when trying to
build the query for the highlighting; I've confirmed this by copying the
params and changing hl=true to hl=false. Unfortunately, when using
debugQuery=on, I do not see any details on what is going on with the
highlighting portion of the query (after artificially increasing the
maxBooleanClauses so the query will run).

With all of that said, my question(s) to the list are: Is there a way to
determine how exactly the highlighter is building its query (i.e., some sort
of highlighting debug setting)? Is the behavior of highlighting in SOLR
intended to be held to the same restrictions (maxBooleanClauses) as the
query parser (even though the highlighting query is built internally)?

I am not a SOLR expert by any measure of the word, and as such, I just don't
understand how two words on one field (as noted by the use of
hl.fl=df_text_content + hl.requireFieldMatch=true +
hl.usePhraseHighlighter=true) could somehow exceed the limits of both 1024
and 2048. I am concerned that even if I continue increasing
maxBooleanClauses, I am not actually solving anything; in fact, my concern
is that if I were to keep increasing this limit, I am in fact begging for
problems later on down the road.

For the sake of completeness, here are the definitions of the field I'm
highlighting on (schema.xml):

        <fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.SnowballPorterFilterFactory"
language="English" protected="protwords.txt" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory"
synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true" />
                <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
                <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.SnowballPorterFilterFactory"
language="English" protected="protwords.txt" />
            </analyzer>
        </fieldType>

        <dynamicField name="df_text_*" type="text" indexed="true"
stored="true" />

    <solrQueryParser defaultOperator="OR" />

And here is my highlighter definition (solrconfig.xml):

    <highlighting>
        <!-- Configure the standard fragmenter -->
        <!-- This could most likely be commented out in the "default" case
-->
        <fragmenter name="gap"
class="org.apache.solr.highlight.GapFragmenter" default="true">
            <lst name="defaults">
                <int name="hl.fragsize">255</int>
            </lst>
        </fragmenter>

        <!-- A regular-expression-based fragmenter (f.i., for sentence
extraction) -->
        <fragmenter name="regex"
class="org.apache.solr.highlight.RegexFragmenter">
            <lst name="defaults">
                <!-- slightly smaller fragsizes work better because of slop
-->
                <int name="hl.fragsize">70</int>
                <!-- allow 50% slop on fragment sizes -->
                <float name="hl.regex.slop">0.5</float>
                <!-- a basic sentence pattern -->
                  <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
            </lst>
        </fragmenter>

        <!-- Configure the standard formatter -->
        <formatter name="html"
class="org.apache.solr.highlight.HtmlFormatter" default="true">
            <lst name="defaults">
                <str name="hl.simple.pre"><![CDATA[<em>]]></str>
                <str name="hl.simple.post"><![CDATA[</em>]]></str>
            </lst>
        </formatter>
    </highlighting>

It is worth noting that I have not done anything (except formatting) to the
highlighting configuration in solrconfig.xml. Any help, assistance, and/or
guidance that can be provided would be greatly appreciated.

Thank you,

Ken Stanley

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
                -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

Re: Highlighting and maxBooleanClauses limit

Posted by Markus Jelsma <ma...@openindex.io>.

Hmm, I'm not sure it's the highlighter alone. Depending on the query, it can
also be triggered by the spellcheck component. See below what happens with
maxBooleanClauses set to 16.

HTTP ERROR: 500

maxClauseCount is set to 16

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 
16
	at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:153)
	at org.apache.lucene.search.spell.SpellChecker.add(SpellChecker.java:329)
	at 
org.apache.lucene.search.spell.SpellChecker.suggestSimilar(SpellChecker.java:260)
	at 
org.apache.solr.spelling.AbstractLuceneSpellChecker.getSuggestions(AbstractLuceneSpellChecker.java:140)
	at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:140)
	at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)



On Tuesday 02 November 2010 16:26:00 Koji Sekiguchi wrote:
> (10/11/02 23:14), Ken Stanley wrote:
> > I've noticed in the stack trace that this exception occurs when trying to
> > build the query for the highlighting; I've confirmed this by copying the
> > params and changing hl=true to hl=false. Unfortunately, when using
> > debugQuery=on, I do not see any details on what is going on with the
> > highlighting portion of the query (after artificially increasing the
> > maxBooleanClauses so the query will run).
> > 
> > With all of that said, my question(s) to the list are: Is there a way to
> > determine how exactly the highlighter is building its query (i.e., some
> > sort of highlighting debug setting)?
> 
> Basically, I think the highlighter uses the main query, but tries to
> rewrite it before highlighting.
> 
> > Is the behavior of highlighting in SOLR
> > intended to be held to the same restrictions (maxBooleanClauses) as the
> > query parser (even though the highlighting query is built internally)?
> 
> I think so, because maxBooleanClauses is a static variable.
> 
> I looked at your stack trace and glanced at the highlighter source. My
> assumption is that the highlighter tried to rewrite (expand) your range
> queries into a boolean query, even though you set requireFieldMatch to
> true.
> 
> Can you try the query without the range queries? If the problem goes
> away, I think it is a highlighter bug. The highlighter should skip the
> range query when the user sets requireFieldMatch to true, because your
> range query is for another field. If so, please open a JIRA issue.
> 
> Koji

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350

Re: Highlighting and maxBooleanClauses limit

Posted by Ken Stanley <do...@gmail.com>.

On Tue, Nov 2, 2010 at 11:26 AM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> (10/11/02 23:14), Ken Stanley wrote:
>
>> I've noticed in the stack trace that this exception occurs when trying to
>> build the query for the highlighting; I've confirmed this by copying the
>> params and changing hl=true to hl=false. Unfortunately, when using
>> debugQuery=on, I do not see any details on what is going on with the
>> highlighting portion of the query (after artificially increasing the
>> maxBooleanClauses so the query will run).
>>
>> With all of that said, my question(s) to the list are: Is there a way to
>> determine how exactly the highlighter is building its query (i.e., some
>> sort
>> of highlighting debug setting)?
>>
>
> Basically, I think the highlighter uses the main query, but tries to
> rewrite it before highlighting.
>
>
>> Is the behavior of highlighting in SOLR
>> intended to be held to the same restrictions (maxBooleanClauses) as the
>> query parser (even though the highlighting query is built internally)?
>>
>
> I think so, because maxBooleanClauses is a static variable.
>
> I looked at your stack trace and glanced at the highlighter source. My
> assumption is that the highlighter tried to rewrite (expand) your range
> queries into a boolean query, even though you set requireFieldMatch to
> true.
>
> Can you try the query without the range queries? If the problem goes
> away, I think it is a highlighter bug. The highlighter should skip the
> range query when the user sets requireFieldMatch to true, because your
> range query is for another field. If so, please open a JIRA issue.
>
> Koji
> --
> http://www.rondhuit.com/en/
>

Koji, that is most excellent. Thank you for pointing out that the range
queries were causing the highlighter to exceed the maxBooleanClauses. Once I
removed them from my main query (and moved them into separate filter
queries), SOLR and highlighting worked as I expected them to work.
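
A minimal sketch of that workaround follows. The thread does not show the final request, so the exact shape of q and the two fq values here are assumptions based on the original query, and FilterQuerySketch is a hypothetical helper that only assembles a query string (it is not part of Solr or SolrJ). The idea is that q keeps only the phrase clauses, while the range restrictions travel as separate fq parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ: it only assembles a query string.
class FilterQuerySketch {

    // Encodes one name=value pair for a URL query string.
    static String pair(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Builds request parameters with the phrase clauses in q and the ranges in fq.
    static String buildParams(String phrase) {
        String quoted = "\"" + phrase + "\"";
        // Main query: only the phrase clauses (assumed shape, see lead-in).
        String q = "df_text_author:" + quoted
                + " OR df_text_content:" + quoted
                + " OR df_text_headline:" + quoted
                + " OR df_text_blog_name:" + quoted;

        List<String> params = new ArrayList<>();
        params.add(pair("q", q));
        // Range restrictions moved out of q into separate filter queries.
        params.add(pair("fq", "df_text_headline:[* TO *]"));
        params.add(pair("fq", "df_date_published:[* TO NOW]"));
        // Highlighting parameters taken from the original request.
        params.add(pair("hl", "true"));
        params.add(pair("hl.fl", "df_text_content"));
        params.add(pair("hl.requireFieldMatch", "true"));
        params.add(pair("hl.usePhraseHighlighter", "true"));
        return String.join("&", params);
    }

    public static void main(String[] args) {
        System.out.println("/solr/select/?" + buildParams("farm bill"));
    }
}
```

Note that this is not strictly equivalent to the original q, where the df_text_blog_name clause was not constrained by the ranges; whether that difference matters depends on the data.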

Per your suggestion, I have opened a JIRA ticket (SOLR-2216) for this
problem. I am somewhat of a novice at Java, and I have not yet had the
pleasure of getting the SOLR sources into my working environment, but I
would be more than eager to assist in finding a solution, perhaps with some
mentoring from a more experienced developer.

Anyway, thank you again. I am very excited to have a suitable workaround
for the time being.

- Ken Stanley

Re: Highlighting and maxBooleanClauses limit

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

(10/11/02 23:14), Ken Stanley wrote:
> I've noticed in the stack trace that this exception occurs when trying to
> build the query for the highlighting; I've confirmed this by copying the
> params and changing hl=true to hl=false. Unfortunately, when using
> debugQuery=on, I do not see any details on what is going on with the
> highlighting portion of the query (after artificially increasing the
> maxBooleanClauses so the query will run).
>
> With all of that said, my question(s) to the list are: Is there a way to
> determine how exactly the highlighter is building its query (i.e., some sort
> of highlighting debug setting)?

Basically, I think the highlighter uses the main query, but tries to
rewrite it before highlighting.

> Is the behavior of highlighting in SOLR
> intended to be held to the same restrictions (maxBooleanClauses) as the
> query parser (even though the highlighting query is built internally)?

I think so, because maxBooleanClauses is a static variable.

I looked at your stack trace and glanced at the highlighter source. My
assumption is that the highlighter tried to rewrite (expand) your range
queries into a boolean query, even though you set requireFieldMatch to
true.

Can you try the query without the range queries? If the problem goes
away, I think it is a highlighter bug. The highlighter should skip the
range query when the user sets requireFieldMatch to true, because your
range query is for another field. If so, please open a JIRA issue.

Koji
-- 
http://www.rondhuit.com/en/
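
To illustrate the explanation above, here is a toy model of the suspected behavior. It is not Lucene's code: ClauseLimitSketch, rewriteRange and distinctTerms are hypothetical names, and the model assumes only what the thread and stack trace suggest, namely that a range query is expanded into one boolean clause per matching indexed term, and that the clause limit is a single static value shared by everything in the process. Under those assumptions, a two-word search can still fail if a range in the same query matches more distinct terms than the limit allows.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical stand-ins, not Lucene's BooleanQuery or highlighter.
class ClauseLimitSketch {

    // One process-wide limit, mirroring the "static variable" point in the thread.
    static int maxClauseCount = 1024;

    static class TooManyClauses extends RuntimeException {
        TooManyClauses() {
            super("maxClauseCount is set to " + maxClauseCount);
        }
    }

    // Expands a range into one clause per matching indexed term,
    // failing as soon as the shared limit would be exceeded.
    static List<String> rewriteRange(String field, List<String> matchingTerms) {
        List<String> clauses = new ArrayList<>();
        for (String term : matchingTerms) {
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClauses();
            }
            clauses.add(field + ":" + term);
        }
        return clauses;
    }

    // Generates n distinct placeholder terms, standing in for indexed values.
    static List<String> distinctTerms(int n) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            terms.add("t" + i);
        }
        return terms;
    }

    public static void main(String[] args) {
        // A date field with many distinct values easily exceeds the limit.
        List<String> dates = distinctTerms(3000);
        try {
            rewriteRange("df_date_published", dates);
        } catch (TooManyClauses e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, raising maxClauseCount only postpones the failure until the field accumulates more distinct values, which matches the concern raised in the original post.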