Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/11/12 23:19:11 UTC

[jira] [Updated] (LUCENE-6889) BooleanQuery.rewrite could easily optimize some simple cases

     [ https://issues.apache.org/jira/browse/LUCENE-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6889:
---------------------------------
    Attachment: LUCENE-6889.patch

Here is a patch that does the following rewrites:

Removal of FILTER clauses that are also MUST clauses
{noformat}
#a +a -> +a
{noformat}

FilteredQuery rewrite when the query is a MatchAllDocsQuery
{noformat}
+*:*^b #f -> ConstantScoreQuery(f)^b
{noformat}

Removal of MatchAllDocsQuery FILTER clauses when there is a MUST clause as well
{noformat}
+a #*:* -> +a
{noformat}

Deduplication of FILTER and MUST_NOT clauses
{noformat}
+a #f #f -f -f -> +a #f -f
{noformat}
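
To make the clause-level rules above concrete, here is a minimal sketch on a toy model. It is not the code from the patch and does not use Lucene's BooleanQuery API: clauses are plain strings in the notation used above ("+q" for MUST, "#q" for FILTER, "-q" for MUST_NOT, anything else for SHOULD), and the class and method names are hypothetical. The `+*:*^b #f -> ConstantScoreQuery(f)^b` rule is left out because it changes the query type rather than the clause list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the clause-level rewrite rules; NOT Lucene's BooleanQuery. */
class ClauseRewriteSketch {

  /**
   * Clauses use the notation from the post: "+q" is MUST, "#q" is FILTER,
   * "-q" is MUST_NOT, anything else is SHOULD. "*:*" matches all documents.
   */
  static List<String> rewrite(List<String> clauses) {
    // Collect the MUST queries first so FILTER clauses can be checked against them.
    Set<String> must = new LinkedHashSet<>();
    for (String c : clauses) {
      if (c.startsWith("+")) {
        must.add(c.substring(1));
      }
    }

    Set<String> seenFilters = new LinkedHashSet<>();
    Set<String> seenMustNot = new LinkedHashSet<>();
    List<String> out = new ArrayList<>();
    for (String c : clauses) {
      if (c.startsWith("#")) {
        String q = c.substring(1);
        boolean alsoMust = must.contains(q);                             // #a +a -> +a
        boolean redundantMatchAll = q.equals("*:*") && !must.isEmpty();  // +a #*:* -> +a
        boolean duplicate = !seenFilters.add(q);                         // #f #f -> #f
        if (alsoMust || redundantMatchAll || duplicate) {
          continue;
        }
      } else if (c.startsWith("-")) {
        if (!seenMustNot.add(c.substring(1))) {                          // -f -f -> -f
          continue;
        }
      }
      // MUST and SHOULD clauses are kept as-is: duplicates there contribute to the score.
      out.add(c);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(rewrite(Arrays.asList("+a", "#f", "#f", "-f", "-f"))); // [+a, #f, -f]
  }
}
```

Only FILTER and MUST_NOT clauses are dropped in this sketch because they do not participate in scoring, which is what makes these rewrites safe.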

These rewrites have the nice property that some queries we used to execute as a disjunction or a conjunction can now be executed as a simple term query.

I also wanted to rewrite queries to a MatchNoDocsQuery when there is an intersection between required and prohibited clauses (Terry's rule 3) or when minimumShouldMatch is greater than the number of SHOULD clauses, but this broke weight normalization. We can probably solve the MUST_NOT/MUST intersection at the Scorer level, but I propose to defer it to another issue.
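
Both deferred conditions describe a query that cannot match any document. As a rough illustration of how they could be detected, here is a sketch on the same toy string notation ("+q" MUST, "#q" FILTER, "-q" MUST_NOT, bare "q" SHOULD). The class and method names are hypothetical, and this is neither part of the patch nor Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy check for the two deferred cases; NOT part of the patch. */
class NeverMatchesSketch {

  /** "+q" = MUST, "#q" = FILTER, "-q" = MUST_NOT, bare "q" = SHOULD. */
  static boolean canNeverMatch(List<String> clauses, int minimumShouldMatch) {
    Set<String> required = new HashSet<>();
    Set<String> prohibited = new HashSet<>();
    int shouldCount = 0;
    for (String c : clauses) {
      if (c.startsWith("+") || c.startsWith("#")) {
        required.add(c.substring(1));
      } else if (c.startsWith("-")) {
        prohibited.add(c.substring(1));
      } else {
        shouldCount++;
      }
    }
    // Case 1: some query is both required and prohibited.
    required.retainAll(prohibited);
    boolean conflict = !required.isEmpty();
    // Case 2: more SHOULD matches are demanded than there are SHOULD clauses.
    boolean unsatisfiable = minimumShouldMatch > shouldCount;
    return conflict || unsatisfiable;
  }

  public static void main(String[] args) {
    System.out.println(canNeverMatch(Arrays.asList("+a", "-a"), 0));      // true
    System.out.println(canNeverMatch(Arrays.asList("+a", "b", "c"), 3));  // true
    System.out.println(canNeverMatch(Arrays.asList("+a", "-b", "c"), 1)); // false
  }
}
```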

The patch includes unit tests for the above rewrite rules as well as a randomized test that makes sure the rewritten query produces the same matches and scores as the query executed without any rewriting.
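
The idea behind such a randomized check can be sketched as follows. This is not the test from the patch: it uses a toy model in which a document is a set of terms and clauses are strings ("+t" MUST, "#t" FILTER, "-t" MUST_NOT, bare "t" SHOULD), it compares matches only and ignores scores, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Toy randomized equivalence check; NOT the test from the patch. */
class RewriteEquivalenceSketch {

  /** A document is a set of terms; each clause is a prefixed single term. */
  static boolean matches(List<String> clauses, Set<String> doc) {
    boolean hasRequired = false;
    boolean anyShould = false;
    for (String c : clauses) {
      char p = c.charAt(0);
      if (p == '+' || p == '#') {
        hasRequired = true;
        if (!doc.contains(c.substring(1))) return false;
      } else if (p == '-') {
        if (doc.contains(c.substring(1))) return false;
      } else if (doc.contains(c)) {
        anyShould = true;
      }
    }
    // With no required clause, at least one SHOULD clause has to match.
    return hasRequired || anyShould;
  }

  /** Returns true if the rewrite never changed the match result over random trials. */
  static boolean sameMatches(UnaryOperator<List<String>> rewrite, long seed, int iterations) {
    Random random = new Random(seed);
    String[] prefixes = {"+", "#", "-", ""};
    String[] terms = {"a", "b", "c"};
    for (int i = 0; i < iterations; i++) {
      // Random query: up to five clauses over a tiny vocabulary, so duplicates are common.
      List<String> clauses = new ArrayList<>();
      int numClauses = random.nextInt(6);
      for (int j = 0; j < numClauses; j++) {
        clauses.add(prefixes[random.nextInt(prefixes.length)] + terms[random.nextInt(terms.length)]);
      }
      // Random document: each term is present with probability 1/2.
      Set<String> doc = new HashSet<>();
      for (String t : terms) {
        if (random.nextBoolean()) doc.add(t);
      }
      if (matches(clauses, doc) != matches(rewrite.apply(clauses), doc)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Sample rewrite under test: drop exact duplicate FILTER and MUST_NOT clauses.
    UnaryOperator<List<String>> dedup = clauses -> {
      Set<String> seen = new LinkedHashSet<>();
      List<String> out = new ArrayList<>();
      for (String c : clauses) {
        boolean nonScoring = c.startsWith("#") || c.startsWith("-");
        if (nonScoring && !seen.add(c)) continue;
        out.add(c);
      }
      return out;
    };
    System.out.println(sameMatches(dedup, 42L, 10_000)); // true
  }
}
```

A fixed seed keeps a failure reproducible, and the small vocabulary makes duplicate and conflicting clauses likely, which are the cases the rewrite rules target.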

> BooleanQuery.rewrite could easily optimize some simple cases
> ------------------------------------------------------------
>
>                 Key: LUCENE-6889
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6889
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6889.patch
>
>
> Follow-up of SOLR-8251: APIs and user interfaces sometimes encourage users to write BooleanQuery instances that are not optimal. For instance, a case that happens often with Solr/Elasticsearch is a request that has a MatchAllDocsQuery as a query and some filter, which could be executed more efficiently by directly wrapping the filter into a ConstantScoreQuery.
> Here are some ideas of rewrite operations that BooleanQuery could perform:
>  - remove FILTER clauses when they are also a MUST clause
>  - rewrite queries of the form "+*:* #filter" to a ConstantScoreQuery(filter)
>  - rewrite to a MatchNoDocsQuery when a clause that is a MUST or FILTER clause is also a MUST_NOT clause


