Posted to oak-issues@jackrabbit.apache.org by "Nuno Santos (Jira)" <ji...@apache.org> on 2022/08/05 14:05:00 UTC
[jira] [Created] (OAK-9884) Optimization: use Elastic prefix queries instead of wildcard queries for 'like foo%" constraints
Nuno Santos created OAK-9884:
--------------------------------
Summary: Optimization: use Elastic prefix queries instead of wildcard queries for 'like foo%" constraints
Key: OAK-9884
URL: https://issues.apache.org/jira/browse/OAK-9884
Project: Jackrabbit Oak
Issue Type: Improvement
Components: indexing
Reporter: Nuno Santos
For a query like
{noformat}
select * from [nt:base] where [propa] like '12%'{noformat}
the Elastic module will generate a wildcard query of the form
{noformat}
{"bool":{"filter":[{"wildcard":{"propa":{"value":"12*"}}}]}}{noformat}
But the same constraint can be expressed more efficiently in Elastic as a prefix query:
{noformat}
{"bool":{"filter":[{"prefix":{"propa":{"value":"12"}}}]}}{noformat}
That is, any like condition whose only wildcard is a single % at the end can be rewritten as a prefix query.
The current code on trunk attempts to implement this optimization, but due to an off-by-one bug, the logic that would use prefix queries is never executed.
[https://github.com/apache/jackrabbit-oak/blob/27a7a9ffa0e78e5b258626aef24066eb58efc559/oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/ElasticRequestHandler.java#L750-L771]
Also see OAK-9881 for a more detailed analysis.
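As a rough illustration of the rewrite condition described above (not the code in ElasticRequestHandler), the check could be sketched as follows. The class name LikePrefixSketch and the method prefixOf are hypothetical. The sketch also assumes that '_' is the single-character wildcard and '\' the escape character in like patterns, and it conservatively falls back to a wildcard query whenever either appears.

```java
import java.util.Optional;

// Hypothetical helper, not part of Oak: decides whether a LIKE pattern
// can be served by an Elastic prefix query instead of a wildcard query.
final class LikePrefixSketch {

    private LikePrefixSketch() {
    }

    /**
     * Returns the literal prefix when the pattern's only wildcard is a single
     * trailing '%', for example "12%" yields "12". Returns empty otherwise,
     * in which case the caller would keep using a wildcard query.
     */
    static Optional<String> prefixOf(String likePattern) {
        int n = likePattern.length();
        // A bare "%" matches everything, so there is no prefix to extract.
        if (n < 2 || likePattern.charAt(n - 1) != '%') {
            return Optional.empty();
        }
        String head = likePattern.substring(0, n - 1);
        for (int i = 0; i < head.length(); i++) {
            char c = head.charAt(i);
            // Assumption: '_' is a wildcard and '\' an escape character.
            // Any of these before the final '%' disqualifies the rewrite.
            if (c == '%' || c == '_' || c == '\\') {
                return Optional.empty();
            }
        }
        return Optional.of(head);
    }
}
```

With this sketch, prefixOf("12%") yields the prefix 12, while patterns such as "1%2%", "%12" or "12" yield an empty result and would still be handled as wildcard queries.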
h3. Context
Elastic documentation suggests that wildcard queries are expensive:
[https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-wildcard-query.html]
h4. Allow expensive queries
Wildcard queries will not be executed if [{{search.allow_expensive_queries}}|https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl.html#query-dsl-allow-expensive-queries] is set to false.