Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2019/04/08 19:24:00 UTC

[jira] [Assigned] (SOLR-13336) maxBooleanClauses ignored; can result in exponential expansion of naive queries

     [ https://issues.apache.org/jira/browse/SOLR-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reassigned SOLR-13336:
-------------------------------

      Assignee: Hoss Man
    Attachment: SOLR-13336.patch

As Cassandra mentioned above, I think the only viable way to "fix" this is to replace the current hardcoded...
{code:java}
BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE-1);
{code}
...introduced by SOLR-10921 with a new {{solr.xml}} setting...
{code:java}
if (null != this.cfg.getBooleanQueryMaxClauseCount()) {
  BooleanQuery.setMaxClauseCount(this.cfg.getBooleanQueryMaxClauseCount());
}
{code}
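To illustrate the "only override when configured" behavior outside of Solr, here is a minimal, self-contained sketch. {{GlobalClauseLimit}} is a hypothetical stand-in for the static limit on Lucene's {{BooleanQuery}}, not the real class, and {{applyConfigured}} is a name invented for this example; the 1024 value mirrors the Lucene default.

```java
/** Hypothetical stand-in for a static, JVM-wide clause limit (not Lucene's BooleanQuery). */
final class GlobalClauseLimit {
    // Assumed library default; Lucene's own default is 1024.
    private static int maxClauseCount = 1024;

    static int getMaxClauseCount() {
        return maxClauseCount;
    }

    static void setMaxClauseCount(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("maxClauseCount must be >= 1");
        }
        maxClauseCount = max;
    }

    /**
     * Applies the configured value only when one is present.
     * A null value means "no setting": the existing limit is left alone.
     */
    static int applyConfigured(Integer configured) {
        if (configured != null) {
            setMaxClauseCount(configured);
        }
        return getMaxClauseCount();
    }
}
```

Because the limit is static, a value applied once stays in effect for every later caller in the same JVM, which is why the setting is global rather than per-collection.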
The attached patch:
 * adds {{maxBooleanClauses}} as a new global optional {{solr.xml}} setting
 ** does *NOT* add any new hardcoded default – instead it simply defers to the {{BooleanQuery.getMaxClauseCount()}} default
 * preserves the existing use of {{solrconfig.xml}}'s {{<maxBooleanClauses>}} as a "per-collection" upper bound on the number of clauses in an _explicit/externally created_ BooleanQuery (as introduced in SOLR-10921)
 ** logs a warning if the {{solrconfig.xml}} value for {{<maxBooleanClauses>}} exceeds the global {{maxBooleanClauses}}
 * adds a new "softcoded" default to the {{solr.xml}} shipped with Solr, allowing the same sysprop already used in the {{_default}} configset's {{solrconfig.xml}} to control both limits at the same time...
{noformat}
<!-- solr.xml -->
  <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>

<!-- solrconfig.xml -->
    <maxBooleanClauses>${solr.max.booleanClauses:1024}</maxBooleanClauses>
{noformat}
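As a rough sketch of why one sysprop can drive both limits, the following resolves a {{${name:default}}} placeholder against a set of properties and flags the case where the per-collection value exceeds the global one. This is a simplified model written for illustration, not Solr's actual property substitution or the code in the patch; {{ClauseLimitConfig}}, {{resolve}} and {{exceedsGlobal}} are hypothetical names.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified model of "${name:default}" resolution; not Solr's real implementation. */
final class ClauseLimitConfig {
    // Matches a value that consists entirely of one ${name:default} placeholder.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+):([^}]*)\\}");

    /** Resolves the placeholder against props, falling back to its inline default. */
    static int resolve(String raw, Properties props) {
        String trimmed = raw.trim();
        Matcher m = PLACEHOLDER.matcher(trimmed);
        String value = m.matches()
            ? props.getProperty(m.group(1), m.group(2))
            : trimmed; // plain literal such as "512"
        return Integer.parseInt(value.trim());
    }

    /** True when the per-collection limit is above the global one (the case that gets a warning). */
    static boolean exceedsGlobal(int perCollection, int global) {
        return perCollection > global;
    }
}
```

With no property set, both files resolve to 1024; setting {{solr.max.booleanClauses}} changes both values together, so the warning condition only arises when one of the files is edited to use a different literal or property.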

The outstanding nocommits are related to:
 * updating the ref-guide to explain the two options and how they relate
 * updating the comments explaining {{<maxBooleanClauses>}} in {{solrconfig.xml}} to mention the solr.xml setting as an upper bound
 * whether anyone wants to bikeshed over the topic of Solr having a hardcoded default for global {{maxBooleanClauses}} instead of using the existing hardcoded Lucene default
 ** I'm not willing to add this – if someone else wants to they can update the patch themselves
 ** I only included these nocommits to draw attention to what should be changed if there is consensus on adding this.

----
If there are no objections to the approach in this patch, I'll move forward with updating the docs & config comments.

> maxBooleanClauses ignored; can result in exponential expansion of naive queries
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-13336
>                 URL: https://issues.apache.org/jira/browse/SOLR-13336
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.6, 7.0, master (9.0)
>            Reporter: Michael Gibney
>            Assignee: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13336.patch
>
>
> Since SOLR-10921 it appears that Solr always sets {{BooleanQuery.maxClauseCount}} (at the Lucene level) to {{Integer.MAX_VALUE-1}}. I assume this is because Solr parses {{maxBooleanClauses}} out of the config and applies it externally.
> In any case, when used as part of {{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and possibly other places?), the Lucene code checks internally against only the static {{maxClauseCount}} variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).
> Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?), {{maxBooleanClauses}} is having no effect. I'm pretty sure this is what's underlying the [issue reported here as being related to Solr 7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].
> To summarize, users are definitely susceptible (to varying degrees of likely severity, assuming no actual _malicious_ attack) if:
>  # Running Solr >= 7.6.0
>  # Using edismax with "ps" param set to >0
>  # Query-time analysis chain is _at all_ capable of producing graphs (e.g., WordDelimiterGraphFilter, SynonymGraphFilter that has corresponding synonyms with varying token lengths).
> Users are _particularly_ vulnerable in practice if they have query-time {{WordDelimiterGraphFilter}} configured with {{preserveOriginal=true}}.
> To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only increased the likelihood of problems manifesting (as a result of LUCENE-8531). Notably, the "enumerated strings" approach to graph phrase query (reintroduced by LUCENE-8531) was previously in place pre-6.5 – at which point it could rely on default Lucene-level {{maxClauseCount}} failsafe (removed as of 7.0). This explains the odd "Affects versions" => maxBooleanClauses was disabled at the Lucene level (in Solr contexts) starting with version 7.0, but the change became more likely to manifest problems for users as of 7.6.
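The growth described in the issue can be sketched with a simplified model: assume each query term independently yields a fixed number of alternative paths through the token graph (for example, two when an original token and its split form are both kept), and that every enumerated path becomes one clause. The number of paths is then the product of the alternatives per position. {{GraphPathCount}} and its methods are hypothetical names for this illustration and are not part of Lucene or Solr.

```java
/** Simplified model of path enumeration over a token graph; not Lucene code. */
final class GraphPathCount {
    /**
     * Returns the number of distinct paths when position i offers
     * alternativesPerPosition[i] independent choices.
     * Saturates at Long.MAX_VALUE instead of overflowing.
     */
    static long countPaths(int[] alternativesPerPosition) {
        long paths = 1;
        for (int alternatives : alternativesPerPosition) {
            if (alternatives < 1) {
                throw new IllegalArgumentException("each position needs >= 1 alternative");
            }
            try {
                paths = Math.multiplyExact(paths, (long) alternatives);
            } catch (ArithmeticException overflow) {
                return Long.MAX_VALUE;
            }
        }
        return paths;
    }

    /** True when the enumerated paths would exceed the given clause limit. */
    static boolean exceeds(long paths, int maxClauseCount) {
        return paths > maxClauseCount;
    }
}
```

Under these assumptions, an 11-term query with two alternatives per term already produces 2048 paths, which a limit of 1024 would reject but a limit of {{Integer.MAX_VALUE-1}} would not.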
