Posted to oak-issues@jackrabbit.apache.org by "Nuno Santos (Jira)" <ji...@apache.org> on 2022/08/05 14:05:00 UTC

[jira] [Created] (OAK-9884) Optimization: use Elastic prefix queries instead of wildcard queries for 'like foo%' constraints

Nuno Santos created OAK-9884:
--------------------------------

             Summary: Optimization: use Elastic prefix queries instead of wildcard queries for 'like foo%' constraints
                 Key: OAK-9884
                 URL: https://issues.apache.org/jira/browse/OAK-9884
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
            Reporter: Nuno Santos


For a query like
{noformat}
select * from [nt:base] where [propa] like '12%'{noformat}
the Elastic module will generate a wildcard query of the form 
{noformat}
{"bool":{"filter":[{"wildcard":{"propa":{"value":"12*"}}}]}}{noformat}
But the same constraint can be expressed more efficiently as a prefix query:
{noformat}
{"bool":{"filter":[{"prefix":{"propa":{"value":"12"}}}]}}{noformat}
That is, any like condition whose only wildcard is a single % at the end can be rewritten as a prefix query.
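
As a rough illustration of the rewrite rule above, here is a minimal standalone sketch. It is not the code in ElasticRequestHandler; the class and method names (LikePrefixSketch, prefixOf, prefixQueryJson) are hypothetical. It assumes that % and _ are the only like wildcards and that backslash is the escape character, and it conservatively refuses to rewrite any pattern containing an escape. A bare '%' is also left alone, since an empty prefix would match everything.

```java
import java.util.Optional;

// Hypothetical sketch, not the Oak implementation.
final class LikePrefixSketch {

    private LikePrefixSketch() {
    }

    /**
     * Returns the literal prefix when the pattern is plain text followed by
     * exactly one trailing '%'. Returns empty for every other pattern.
     */
    static Optional<String> prefixOf(String like) {
        int n = like.length();
        // Need at least one literal character plus the trailing '%'.
        if (n < 2 || like.charAt(n - 1) != '%') {
            return Optional.empty();
        }
        // Everything before the last character (index n - 1) must be literal.
        String head = like.substring(0, n - 1);
        if (head.indexOf('%') >= 0 || head.indexOf('_') >= 0 || head.indexOf('\\') >= 0) {
            return Optional.empty();
        }
        return Optional.of(head);
    }

    /**
     * Builds the prefix filter in the shape shown above. Field and prefix are
     * not JSON-escaped here; a real implementation would use a query builder.
     */
    static String prefixQueryJson(String field, String prefix) {
        return "{\"bool\":{\"filter\":[{\"prefix\":{\"" + field
                + "\":{\"value\":\"" + prefix + "\"}}}]}}";
    }

    public static void main(String[] args) {
        for (String like : new String[] {"12%", "1%2%", "12_%", "12", "%"}) {
            String result = prefixOf(like)
                    .map(p -> prefixQueryJson("propa", p))
                    .orElse("no rewrite, keep the wildcard query");
            System.out.println(like + " -> " + result);
        }
    }
}
```

With this sketch, only '12%' out of the five sample patterns is rewritten; the others fall back to the existing wildcard path.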

The current code on trunk attempts to implement this optimization, but due to an off-by-one bug, the logic that would use prefix queries is never executed.  

[https://github.com/apache/jackrabbit-oak/blob/27a7a9ffa0e78e5b258626aef24066eb58efc559/oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/ElasticRequestHandler.java#L750-L771]

Also see OAK-9881 for a more detailed analysis. 
h3. Context

Elastic documentation suggests that wildcard queries are expensive:

[https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-wildcard-query.html]
h4. Allow expensive queries

Wildcard queries will not be executed if [{{search.allow_expensive_queries}}|https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html#query-dsl-allow-expensive-queries] is set to false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)