You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2021/11/25 09:12:46 UTC

[GitHub] [jackrabbit-oak] fabriziofortino commented on a change in pull request #424: Support allowLeadingWildcard for elastic

fabriziofortino commented on a change in pull request #424:
URL: https://github.com/apache/jackrabbit-oak/pull/424#discussion_r756667420



##########
File path: oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/ElasticRequestHandler.java
##########
@@ -787,7 +787,21 @@ private static QueryBuilder fullTextQuery(String text, String fieldName, PlanRes
             // and could contain other parts like renditions, node name, etc
             return multiMatchQuery.field(fieldName);
         } else {
-            return simpleQueryStringQuery(text).field(fieldName).defaultOperator(Operator.AND);
+            Boolean allowLeadingWildcard = false;
+            for (PropertyDefinition pd : pr.indexingRule.getProperties()) {
+                if (pd.name.equals(fieldName)) {
+                    allowLeadingWildcard = pd.allowLeadingWildcard;

Review comment:
       should we `break` out of the for here?

##########
File path: oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/ElasticRequestHandler.java
##########
@@ -787,7 +787,21 @@ private static QueryBuilder fullTextQuery(String text, String fieldName, PlanRes
             // and could contain other parts like renditions, node name, etc
             return multiMatchQuery.field(fieldName);
         } else {
-            return simpleQueryStringQuery(text).field(fieldName).defaultOperator(Operator.AND);
+            Boolean allowLeadingWildcard = false;

Review comment:
       this should be a primitive

##########
File path: oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/query/ElasticRequestHandler.java
##########
@@ -787,7 +787,21 @@ private static QueryBuilder fullTextQuery(String text, String fieldName, PlanRes
             // and could contain other parts like renditions, node name, etc
             return multiMatchQuery.field(fieldName);
         } else {
-            return simpleQueryStringQuery(text).field(fieldName).defaultOperator(Operator.AND);
+            Boolean allowLeadingWildcard = false;
+            for (PropertyDefinition pd : pr.indexingRule.getProperties()) {
+                if (pd.name.equals(fieldName)) {
+                    allowLeadingWildcard = pd.allowLeadingWildcard;
+                }
+            }
+            // https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
+            // simpleQueryStringQuery does not support leading wildcards whereas it's supported by default in queryStringQuery
+            // Setting this to true can have performance impact, hence it's false by default in oak implementation for elastic,
+            // However clients can set it to true from property definition based on case by case usage.
+            if (allowLeadingWildcard) {
+                return queryStringQuery(text).field(fieldName).defaultOperator(Operator.AND);
+            } else {
+                return simpleQueryStringQuery(text).analyzeWildcard(true).field(fieldName).defaultOperator(Operator.AND);
+            }

Review comment:
       I could not find any doc on `allowLeadingWildcard `. With these changes, we can allow/disallow it for specific fields.  Does it mean that for full-text queries (eg: `contains(. ,"*foo*")` this would never work?

##########
File path: oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/FulltextIndexConstants.java
##########
@@ -401,4 +401,11 @@ public static IndexingMode from(String indexingMode) {
      * cost). The value is: nodes, the path. For properties, the path of the node, then '@' property.
      */
     String USE_IF_EXISTS = "useIfExists";
+
+    /*
+    When set to true for any particular property in an index, queries would support leading wild cards in searches.
+    This is supported in lucene implementation by default, this configuration is primarily meant for elastic index
+    and would be NOOP in lucene index definitions.
+     */

Review comment:
       to make it consistent with the other comments
   ```suggestion
       /**
        * When set to true for any particular property in an index, queries would support leading wild cards in searches.
        * This is supported in lucene implementation by default, this configuration is primarily meant for elastic index
        * and would be NOOP in lucene index definitions.
        */
   ```

##########
File path: oak-search-elastic/src/test/java/org/apache/jackrabbit/oak/plugins/index/elastic/ElasticFullTextAsyncTest.java
##########
@@ -70,6 +71,62 @@ public void fullTextQuery() throws Exception {
         });
     }
 
+    @Test
+    public void fullTextQueryTestAllowLeadingWildcards() throws Exception {
+        IndexDefinitionBuilder builder = createIndex("propa");
+        builder.async("async");
+        builder.indexRule("nt:base").property("propa").analyzed().allowLeadingWildcard();
+
+        String indexId = UUID.randomUUID().toString();
+        setIndex(indexId, builder);
+        root.commit();
+
+        //add content
+        Tree test = root.getTree("/").addChild("test");
+
+        test.addChild("a").setProperty("propa", "ship_to_canada");
+        test.addChild("b").setProperty("propa", "steamship_to_canada");
+        test.addChild("c").setProperty("propa", "ship_to_can");
+        root.commit();
+        //Thread.sleep(5000);

Review comment:
       ```suggestion
   ```

##########
File path: oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/FulltextIndexConstants.java
##########
@@ -401,4 +401,11 @@ public static IndexingMode from(String indexingMode) {
      * cost). The value is: nodes, the path. For properties, the path of the node, then '@' property.
      */
     String USE_IF_EXISTS = "useIfExists";
+
+    /*
+    When set to true for any particular property in an index, queries would support leading wild cards in searches.
+    This is supported in lucene implementation by default, this configuration is primarily meant for elastic index
+    and would be NOOP in lucene index definitions.
+     */
+    String PROP_ALLOW_LEADING_WILDCARDS = "allowLeadingWildCards";

Review comment:
       Fix camelcase. Moreover, I see that here we use plural `allowLeadingWildcards` while in `PropertyDefinition` we use singular `allowLeadingWildcard`. Can we make it consistent?
   
   ```suggestion
       String PROP_ALLOW_LEADING_WILDCARDS = "allowLeadingWildcards";
   ```

##########
File path: oak-search-elastic/src/test/java/org/apache/jackrabbit/oak/plugins/index/elastic/ElasticFullTextAsyncTest.java
##########
@@ -70,6 +71,62 @@ public void fullTextQuery() throws Exception {
         });
     }
 
+    @Test
+    public void fullTextQueryTestAllowLeadingWildcards() throws Exception {
+        IndexDefinitionBuilder builder = createIndex("propa");
+        builder.async("async");
+        builder.indexRule("nt:base").property("propa").analyzed().allowLeadingWildcard();
+
+        String indexId = UUID.randomUUID().toString();
+        setIndex(indexId, builder);
+        root.commit();
+
+        //add content
+        Tree test = root.getTree("/").addChild("test");
+
+        test.addChild("a").setProperty("propa", "ship_to_canada");
+        test.addChild("b").setProperty("propa", "steamship_to_canada");
+        test.addChild("c").setProperty("propa", "ship_to_can");
+        root.commit();
+        //Thread.sleep(5000);
+
+        String query = "//*[jcr:contains(@propa, '*ship to can*')] ";
+
+        assertEventually(() -> {
+            assertThat(explain(query, XPATH), containsString("elasticsearch:" + indexId));
+            assertQuery(query, XPATH, Arrays.asList("/test/a", "/test/b", "/test/c"));
+        });
+    }
+
+    @Test
+    public void fullTextQueryTestAllowLeadingWildcardsDefault() throws Exception {
+        IndexDefinitionBuilder builder = createIndex("propa");
+        builder.async("async");
+        builder.indexRule("nt:base").property("propa").analyzed();
+
+        String indexId = UUID.randomUUID().toString();
+        setIndex(indexId, builder);
+        root.commit();
+
+        //add content
+        Tree test = root.getTree("/").addChild("test");
+
+        test.addChild("a").setProperty("propa", "ship_to_canada");
+        test.addChild("b").setProperty("propa", "steamship_to_canada");
+        test.addChild("c").setProperty("propa", "ship_to_can");
+        root.commit();
+        //Thread.sleep(5000);

Review comment:
       ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org