Posted to dev@lucene.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/04/15 18:41:00 UTC

[jira] [Commented] (SOLR-13336) solrconfig.xml maxBooleanClauses ignored by programmatic/rewritten queries; can result in exponential expansion of naive queries

    [ https://issues.apache.org/jira/browse/SOLR-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818267#comment-16818267 ] 

ASF subversion and git services commented on SOLR-13336:
--------------------------------------------------------

Commit 59a3c45d9cc1a338c3dffbe5e7bd996a8e0dd37a in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=59a3c45 ]

SOLR-13336: add maxBooleanClauses (default to 1024) setting to solr.xml, reverting previous effective value of Integer.MAX_VALUE-1, to restrict risk of pathological query expansion.

(cherry picked from commit d90034f0d61cd1525e10d07cf064a8647dc08cc9)
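The limit hierarchy this commit introduces can be sketched in Java. There is a solr.xml hard cap that applies to every query, including programmatically built or rewritten ones, and there is the solrconfig.xml limit, which applies only to clauses a user writes explicitly. This is a minimal, hypothetical illustration. The class ClauseLimits and its methods are not Solr or Lucene APIs. Reading the JVM system property solr.max.booleanClauses with a fallback of 1024 only approximates how the ${solr.max.booleanClauses:1024} substitution in solr.xml resolves. Rejecting a solrconfig.xml value larger than the hard cap is a choice made for this sketch, not documented Solr behavior.

```java
/**
 * Hypothetical sketch of the two-tier clause limit described in SOLR-13336.
 * ClauseLimits, checkUserQuery and checkExpandedQuery are illustrative names,
 * not Solr or Lucene APIs.
 */
public class ClauseLimits {
    private final int globalMax; // stands in for solr.xml maxBooleanClauses (hard cap)
    private final int coreMax;   // stands in for solrconfig.xml maxBooleanClauses

    public ClauseLimits(int globalMax, int coreMax) {
        // Enforce solr.xml limit >= solrconfig.xml limit (this sketch rejects violations).
        if (coreMax > globalMax) {
            throw new IllegalArgumentException(
                "solrconfig.xml limit " + coreMax + " exceeds solr.xml hard cap " + globalMax);
        }
        this.globalMax = globalMax;
        this.coreMax = coreMax;
    }

    /** Approximates ${solr.max.booleanClauses:1024}: the system property if set, else 1024. */
    public static int globalMaxFromSystemProperty() {
        return Integer.parseInt(System.getProperty("solr.max.booleanClauses", "1024"));
    }

    /** Clauses a user specifies explicitly in the query string must respect the per-core limit. */
    public void checkUserQuery(int clauses) {
        if (clauses > coreMax) {
            throw new IllegalStateException(
                "user query has " + clauses + " clauses; limit is " + coreMax);
        }
    }

    /** Every query, including expanded or rewritten ones, must respect the hard cap. */
    public void checkExpandedQuery(int clauses) {
        if (clauses > globalMax) {
            throw new IllegalStateException(
                "expanded query has " + clauses + " clauses; hard cap is " + globalMax);
        }
    }

    public static void main(String[] args) {
        ClauseLimits limits = new ClauseLimits(globalMaxFromSystemProperty(), 512);
        limits.checkUserQuery(100);     // within the per-core limit of 512
        limits.checkExpandedQuery(900); // within the hard cap when the property is unset (1024)
        try {
            limits.checkExpandedQuery(5000); // a rewrite that blew up past the hard cap
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the design is that the hard cap is checked independently of the query parser, so a query that never passed through user-level parsing still hits a ceiling.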


> solrconfig.xml maxBooleanClauses ignored by programmatic/rewritten queries; can result in exponential expansion of naive queries
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13336
>                 URL: https://issues.apache.org/jira/browse/SOLR-13336
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.0, 8.0
>            Reporter: Michael Gibney
>            Assignee: Hoss Man
>            Priority: Major
>             Fix For: 8.1
>
>         Attachments: SOLR-13336.patch, SOLR-13336.patch, SOLR-13336.patch
>
>
> Changes made in Solr 7.0 set the effective value of {{BooleanQuery.getMaxClauseCount}} to {{Integer.MAX_VALUE-1}} and only imposed a restriction based on the (existing) solrconfig.xml setting at the Solr query parser level, via a new utility helper method.
> But this means programmatically generated queries (whether built by low-level Lucene methods or produced by query rewriting) no longer had any safety valve to prevent (effectively) unbounded expansion. This issue fixes the problem by:
> * restoring a default upper bound of 1024 on {{BooleanQuery.getMaxClauseCount}}
> * introducing a new solr.xml level setting for configuring this upper bound:{noformat}
> <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
> {noformat}
> *NOTE* that this solr.xml limit is a hard upper bound that supersedes the existing solrconfig.xml setting; the latter has been left in place and still limits the size of user-specified boolean queries. I.e.: solr.xml maxBooleanClauses >= solrconfig.xml maxBooleanClauses >= number of clauses a user explicitly specifies in a query string; solr.xml maxBooleanClauses >= number of clauses in an expanded/rewritten query
> {panel:title=original bug report}
> Since SOLR-10921 it appears that Solr always sets {{BooleanQuery.maxClauseCount}} (at the Lucene level) to {{Integer.MAX_VALUE-1}}. I assume this is because Solr parses {{maxBooleanClauses}} out of the config and applies it externally.
> In any case, when used as part of {{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and possibly other places?), the Lucene code checks internally against only the static {{maxClauseCount}} variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).
> Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?), {{maxBooleanClauses}} is having no effect. I'm pretty sure this is what's underlying the [issue reported here as being related to Solr 7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].
> To summarize, users are definitely susceptible (to varying degrees of likely severity, assuming no actual _malicious_ attack) if:
>  # Running Solr >= 7.6.0
>  # Using edismax with "ps" param set to >0
>  # Query-time analysis chain is _at all_ capable of producing graphs (e.g., WordDelimiterGraphFilter, or SynonymGraphFilter with corresponding synonyms of varying token lengths).
> Users are _particularly_ vulnerable in practice if they have query-time {{WordDelimiterGraphFilter}} configured with {{preserveOriginal=true}}.
> To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only increased the likelihood of problems manifesting (as a result of LUCENE-8531). Notably, the "enumerated strings" approach to graph phrase queries (reintroduced by LUCENE-8531) was previously in place before 6.5, at which point it could rely on the default Lucene-level {{maxClauseCount}} failsafe (removed as of 7.0). This explains the odd "Affects Versions": maxBooleanClauses was disabled at the Lucene level (in Solr contexts) starting with version 7.0, but the change became more likely to cause problems for users as of 7.6.
> {panel}
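The exponential growth described in the bug report comes from enumerating every path through the query-time token graph: if each of n segments has k alternative forms, an enumerated-strings expansion produces k^n phrase clauses. The Java sketch below is a simplified, hypothetical model. Each segment's alternatives are treated as independent, and GraphPhraseExpansion, enumeratePhrases and TooManyClauses are illustrative names, not Lucene's QueryBuilder code or its exception types. It shows how a 1024-clause hard cap stops the expansion.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of "enumerated strings" graph phrase expansion with a clause cap.
 * Not Lucene code; segments and alternatives are plain strings for illustration.
 */
public class GraphPhraseExpansion {

    /** Illustrative stand-in for a "too many clauses" failure. */
    static class TooManyClauses extends RuntimeException {
        TooManyClauses(int max) {
            super("more than " + max + " phrase clauses");
        }
    }

    /**
     * Each inner list holds the alternative forms of one segment of the query.
     * Returns every combination (one phrase per path), failing once more than
     * maxClauses phrases would be produced.
     */
    static List<String> enumeratePhrases(List<List<String>> segments, int maxClauses) {
        List<String> phrases = new ArrayList<>();
        phrases.add(""); // a single empty path to extend
        for (List<String> alternatives : segments) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String form : alternatives) {
                    if (next.size() == maxClauses) {
                        throw new TooManyClauses(maxClauses); // safety valve
                    }
                    next.add(prefix.isEmpty() ? form : prefix + " " + form);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Three alternative forms per segment, e.g. variants of a hyphenated term.
        List<String> segment = List.of("wi-fi", "wi fi", "wifi");
        List<List<String>> query = new ArrayList<>();
        for (int n = 1; n <= 8; n++) {
            query.add(segment);
            try {
                int count = enumeratePhrases(query, 1024).size();
                System.out.println(n + " segments -> " + count + " phrases");
            } catch (TooManyClauses e) {
                System.out.println(n + " segments -> " + e.getMessage());
            }
        }
    }
}
```

With three alternatives per segment (roughly the kind of variants a WordDelimiterGraphFilter with preserveOriginal=true might emit for a hyphenated term; the exact tokens depend on the filter configuration), six segments yield 729 phrases, while the seventh (3^7 = 2187) trips the 1024 cap. With the limit effectively set to Integer.MAX_VALUE-1, as described above, nothing would stop the list from growing.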



