You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2016/09/02 13:33:20 UTC

[jira] [Comment Edited] (OAK-4705) Fulltext parser doesn't allow stand-alone hyphen in search expression

    [ https://issues.apache.org/jira/browse/OAK-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458535#comment-15458535 ] 

Thomas Mueller edited comment on OAK-4705 at 9/2/16 1:32 PM:
-------------------------------------------------------------

> that hyphen followed by quoted string wasn't really doing the right this

You are right! There were not enough tests.

Your patch looks fine, but I would add even more tests, see my patch below. I wonder whether it would be better to support " \-"  (space before term: currently illegal), and whether it's a good idea to sort entries ("hello \- world" becomes "- hello world") and make them unique ("test test" becomes "test").

{noformat}
Index: src/main/java/org/apache/jackrabbit/oak/query/fulltext/FullTextParser.java
===================================================================
--- src/main/java/org/apache/jackrabbit/oak/query/fulltext/FullTextParser.java	(revision 1755600)
+++ src/main/java/org/apache/jackrabbit/oak/query/fulltext/FullTextParser.java	(working copy)
@@ -85,10 +85,9 @@
         boolean not = false;
         StringBuilder buff = new StringBuilder();
         char c = text.charAt(parseIndex);
-        if (c == '-') {
-            if (++parseIndex >= text.length()) {
-                throw getSyntaxError("term");
-            }
+        if (c == '-' && parseIndex < text.length() - 1 &&
+                text.charAt(parseIndex + 1) != ' ') {
+            c = text.charAt(++parseIndex);
             not = true;
         }
         boolean escaped = false;
Index: src/test/java/org/apache/jackrabbit/oak/query/ast/FullTextTest.java
===================================================================
--- src/test/java/org/apache/jackrabbit/oak/query/ast/FullTextTest.java	(revision 1755600)
+++ src/test/java/org/apache/jackrabbit/oak/query/ast/FullTextTest.java	(working copy)
@@ -54,6 +54,7 @@
         assertFalse(test("hello world", "world"));
         assertTrue(test("hello world", "world hello"));
         assertTrue(test("hello world ", "hello world"));
+        assertEquals("\"hello\"", convertPattern("hello hello"));
     }
 
     @Test
@@ -63,9 +64,35 @@
         assertTrue(test("hello OR world", "world"));
         assertFalse(test("hello OR world", "hi"));
     }
+    
+    @Test
+    public void minusLiteral() throws ParseException {
+        assertEquals("\"-\"", convertPattern("-"));
+        assertEquals("\"-\"", convertPattern("- "));
+        assertEquals("\"-\"", convertPattern("- -"));
+        assertEquals("\"-\" \"hello\"", convertPattern("- hello"));
+        assertEquals("\"-\" \"hello\" \"world\"", convertPattern("hello - world"));
+        assertEquals("\"-\" \"hello\" \"world\"", convertPattern("hello  -  world"));
+        assertEquals("\"-\" \"hello\"", convertPattern("hello -"));
+             
+        assertTrue(test("-", "hello - world"));
+        assertTrue(test("- ", "hello - world"));
+        assertTrue(test("- -", "hello - world"));
+        assertTrue(test("- hello", "hello - world"));
+        assertTrue(test("hello - world", "hello - world"));
+        assertTrue(test("hello - \"wonderful world\"", "hello - wonderful world"));
+        assertTrue(test("hello -", "hello -"));
+    }
 
     @Test
     public void not() throws ParseException {
+        assertEquals("\"hello\" -\"wonderful world\"", convertPattern("hello -\"wonderful world\""));
+        assertTrue(test("hello -\"wonderful world\"", "hello"));
+        assertTrue(test("hello -\"wonderful world\"", "hello wonderful"));
+        assertTrue(test("hello -\"wonderful world\"", "hello world"));
+        assertFalse(test("hello -\"wonderful world\"", "hello wonderful world"));
+        assertFalse(test("hello -\"wonderful world\"", "wonderful world"));
+        assertTrue(test("hello \"-wonderful world\"", "hello this beautiful -wonderful world"));
         assertEquals("\"hello\" -\"world\"", convertPattern("hello -world"));
         assertTrue(test("hello -world", "hello"));
         assertFalse(test("hello -world", "hello world"));
@@ -105,10 +132,10 @@
     @Test
     public void invalid() throws ParseException {
         testInvalid("", "(*); expected: term");
+        testInvalid(" x", " (*)x; expected: term");
+        testInvalid(" -", " (*)-; expected: term");
         testInvalid("x OR ", "x OR(*); expected: term");
         testInvalid("\"", "(*)\"; expected: double quote");
-        testInvalid("-", "(*)-; expected: term");
-        testInvalid("- x", "- (*)x; expected: term");
         testInvalid("\\", "(*)\\; expected: escaped char");
         testInvalid("\"\\", "\"(*)\\; expected: escaped char");
         testInvalid("\"x\"y", "\"x\"(*)y; expected: space");
{noformat}


was (Author: tmueller):
> that hyphen followed by quoted string wasn't really doing the right this

You are right! There were not enough tests.

Your patch looks fine, but I would add even more tests, see my patch below. I wonder whether it would be better to support " -"  (space before term: currently illegal), and whether it's a good idea to sort entries ("hello - world" becomes "- hello world") and make them unique ("test test" becomes "test").

{noformat}
Index: src/main/java/org/apache/jackrabbit/oak/query/fulltext/FullTextParser.java
===================================================================
--- src/main/java/org/apache/jackrabbit/oak/query/fulltext/FullTextParser.java	(revision 1755600)
+++ src/main/java/org/apache/jackrabbit/oak/query/fulltext/FullTextParser.java	(working copy)
@@ -85,10 +85,9 @@
         boolean not = false;
         StringBuilder buff = new StringBuilder();
         char c = text.charAt(parseIndex);
-        if (c == '-') {
-            if (++parseIndex >= text.length()) {
-                throw getSyntaxError("term");
-            }
+        if (c == '-' && parseIndex < text.length() - 1 &&
+                text.charAt(parseIndex + 1) != ' ') {
+            c = text.charAt(++parseIndex);
             not = true;
         }
         boolean escaped = false;
Index: src/test/java/org/apache/jackrabbit/oak/query/ast/FullTextTest.java
===================================================================
--- src/test/java/org/apache/jackrabbit/oak/query/ast/FullTextTest.java	(revision 1755600)
+++ src/test/java/org/apache/jackrabbit/oak/query/ast/FullTextTest.java	(working copy)
@@ -54,6 +54,7 @@
         assertFalse(test("hello world", "world"));
         assertTrue(test("hello world", "world hello"));
         assertTrue(test("hello world ", "hello world"));
+        assertEquals("\"hello\"", convertPattern("hello hello"));
     }
 
     @Test
@@ -63,9 +64,35 @@
         assertTrue(test("hello OR world", "world"));
         assertFalse(test("hello OR world", "hi"));
     }
+    
+    @Test
+    public void minusLiteral() throws ParseException {
+        assertEquals("\"-\"", convertPattern("-"));
+        assertEquals("\"-\"", convertPattern("- "));
+        assertEquals("\"-\"", convertPattern("- -"));
+        assertEquals("\"-\" \"hello\"", convertPattern("- hello"));
+        assertEquals("\"-\" \"hello\" \"world\"", convertPattern("hello - world"));
+        assertEquals("\"-\" \"hello\" \"world\"", convertPattern("hello  -  world"));
+        assertEquals("\"-\" \"hello\"", convertPattern("hello -"));
+             
+        assertTrue(test("-", "hello - world"));
+        assertTrue(test("- ", "hello - world"));
+        assertTrue(test("- -", "hello - world"));
+        assertTrue(test("- hello", "hello - world"));
+        assertTrue(test("hello - world", "hello - world"));
+        assertTrue(test("hello - \"wonderful world\"", "hello - wonderful world"));
+        assertTrue(test("hello -", "hello -"));
+    }
 
     @Test
     public void not() throws ParseException {
+        assertEquals("\"hello\" -\"wonderful world\"", convertPattern("hello -\"wonderful world\""));
+        assertTrue(test("hello -\"wonderful world\"", "hello"));
+        assertTrue(test("hello -\"wonderful world\"", "hello wonderful"));
+        assertTrue(test("hello -\"wonderful world\"", "hello world"));
+        assertFalse(test("hello -\"wonderful world\"", "hello wonderful world"));
+        assertFalse(test("hello -\"wonderful world\"", "wonderful world"));
+        assertTrue(test("hello \"-wonderful world\"", "hello this beautiful -wonderful world"));
         assertEquals("\"hello\" -\"world\"", convertPattern("hello -world"));
         assertTrue(test("hello -world", "hello"));
         assertFalse(test("hello -world", "hello world"));
@@ -105,10 +132,10 @@
     @Test
     public void invalid() throws ParseException {
         testInvalid("", "(*); expected: term");
+        testInvalid(" x", " (*)x; expected: term");
+        testInvalid(" -", " (*)-; expected: term");
         testInvalid("x OR ", "x OR(*); expected: term");
         testInvalid("\"", "(*)\"; expected: double quote");
-        testInvalid("-", "(*)-; expected: term");
-        testInvalid("- x", "- (*)x; expected: term");
         testInvalid("\\", "(*)\\; expected: escaped char");
         testInvalid("\"\\", "\"(*)\\; expected: escaped char");
         testInvalid("\"x\"y", "\"x\"(*)y; expected: space");
{noformat}

> Fulltext parser doesn't allow stand-alone hyphen in search expression
> ---------------------------------------------------------------------
>
>                 Key: OAK-4705
>                 URL: https://issues.apache.org/jira/browse/OAK-4705
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Minor
>
> Afaics, the fulltext grammar [spec|https://docs.adobe.com/content/docs/en/spec/jcr/2.0/6_Query.html#6.7.19%20FullTextSearch] \[0] allows a stand-alone hyphen to be a simple word.
> So, it seems all of these are valid:
> # {{A - B}} //hyphen being a separate word
> # {{A -}} //hyphen being a separate word at the end
> # {{A -B}} //hyphen working as negation operator on {{B}}
> # {{A -"B C"}} //hyphen working as negation operator on {{"B C"}}
> Currently, case 1 and 2 are treated as invalid.
> \[0]: https://docs.adobe.com/content/docs/en/spec/jcr/2.0/6_Query.html#6.7.19%20FullTextSearch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)