Posted to dev@lucene.apache.org by "Dilip Nimkar (JIRA)" <ji...@apache.org> on 2007/02/13 20:51:05 UTC

[jira] Created: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)
----------------------------------------------------------------------------------------------------

                 Key: LUCENE-800
                 URL: https://issues.apache.org/jira/browse/LUCENE-800
             Project: Lucene - Java
          Issue Type: Bug
          Components: QueryParser
            Reporter: Dilip Nimkar


Test code and output follow. Tested with Lucene 1.9 only. Affects those who index or search for Lucene's reserved characters.

Description: When an input search string has a sequence of N (java-escaped) backslashes, where N >= 2, the QueryParser will produce a query in which that sequence has N-1 backslashes.

TEST CODE:
    Analyzer analyzer = new WhitespaceAnalyzer();
    String[] queryStrs = {"item:\\\\",
                          "item:\\\\*",
                          "(item:\\\\ item:ABCD\\\\))",
                          "(item:\\\\ item:ABCD\\\\)"};
    for (String queryStr : queryStrs) {
      System.out.println("--------------------------------------");
      System.out.println("String queryStr = " + queryStr);
      Query luceneQuery = null;
      try {
        luceneQuery = new QueryParser("_default_", analyzer).parse(queryStr);
        System.out.println("luceneQuery.toString() = " + luceneQuery.toString());
      } catch (Exception e) {
        System.out.println(e.getClass().toString());
      }
    }

OUTPUT (with remarks in comment notation:) 
--------------------------------------
String queryStr = item:\\
luceneQuery.toString() = item:\             //One backslash has disappeared. Searcher will fail on this query.
--------------------------------------
String queryStr = item:\\*
luceneQuery.toString() = item:\*           //One backslash has disappeared. This query will search for something unintended.
--------------------------------------
String queryStr = (item:\\ item:ABCD\\))
luceneQuery.toString() = item:\ item:ABCD\)     //This should have thrown a ParseException because of an unescaped ')'. It did not.
--------------------------------------
String queryStr = (item:\\ item:ABCD\\)
class org.apache.lucene.queryParser.ParseException        //...and this one should not have, but it did.
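
The expectations in the last two cases can be checked with a simple rule: a character is escaped only if an odd number of backslashes immediately precedes it. The following is a minimal sketch of that rule, not Lucene code; the class name EscapeCheck and the method isEscaped are hypothetical and exist only for illustration.

```java
// Hypothetical helper, not part of Lucene: decides whether the character at
// `index` is escaped by counting the backslashes immediately before it.
final class EscapeCheck {

    // An odd number of preceding backslashes means the last one escapes the
    // character at `index`; an even number means they pair off with each other.
    static boolean isEscaped(String query, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && query.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        // The Java literal "ABCD\\\\)" holds the seven characters ABCD\\)
        String term = "ABCD\\\\)";
        System.out.println(isEscaped(term, term.length() - 1));       // prints false

        // The Java literal "ABCD\\)" holds the six characters ABCD\)
        String escaped = "ABCD\\)";
        System.out.println(isEscaped(escaped, escaped.length() - 1)); // prints true
    }
}
```

Under this rule the ')' that follows ABCD\\ is unescaped, so it closes a group. That is why the third query has one closing parenthesis too many and the fourth is balanced.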



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475461 ] 

Michael Busch commented on LUCENE-800:
--------------------------------------

Doron,

the problem here is that a backslash is a valid TERM_CHAR and an ESCAPE_CHAR at the same time. The fix is to exclude \ from the TERM_CHAR list. I tried this fix and it works fine for me. I'm going to attach a patch today. Would be great if you could review it before I commit it, Doron!





[jira] Updated: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-800:
---------------------------------

    Attachment: Lucene-800.patch

With this patch a query like
(item:\\ item:ABCD\\)
does not throw a ParseException anymore. I excluded the backslash from the TERM_CHAR list, because a backslash should always be escaped. 

I also changed the ESCAPED_CHAR list. Every character that follows a backslash should be considered escaped. Until now, the query \a caused a ParseException while the query \+ worked fine, which is inconsistent. So now every character following a backslash is an ESCAPED_CHAR. Any objections?

All unit tests pass.
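
The rule "every character following a backslash is escaped" can be illustrated with a small standalone sketch. This is not the patch, which changes the TERM_CHAR and ESCAPED_CHAR lists; the class name UnescapeSketch, the method unescape, and the choice of IllegalArgumentException for a trailing backslash are assumptions made for the example.

```java
// Hypothetical illustration, not the Lucene patch: treats whatever character
// follows a backslash as a literal, regardless of which character it is.
final class UnescapeSketch {

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\') {
                // A backslash must be followed by the character it escapes.
                if (i + 1 >= input.length()) {
                    throw new IllegalArgumentException("dangling backslash at end of input");
                }
                out.append(input.charAt(++i)); // keep the escaped character, drop the backslash
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\a"));   // prints a
        System.out.println(unescape("\\+"));   // prints +
        System.out.println(unescape("\\\\"));  // prints \
    }
}
```

With a single rule for all characters, \a and \+ are handled the same way, and the only invalid input is a backslash with nothing after it.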



[jira] Assigned: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch reassigned LUCENE-800:
------------------------------------

    Assignee: Michael Busch



[jira] Commented: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475452 ] 

Doron Cohen commented on LUCENE-800:
------------------------------------

Michael, I've been looking into this and think I made some progress. Are you just starting, or do you have it solved already?



[jira] Commented: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Dilip Nimkar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475513 ] 

Dilip Nimkar commented on LUCENE-800:
-------------------------------------

In my test code, I took care of the difference between \ as the Java escape character and \ as the Lucene escape character.

    System.out.println(new QueryParser("_default_", analyzer).parse("item:\\\\"));    // note the 4 backslashes
                 should print item:\\ on the console,
                 but it prints item:\
    The same is true of the second string in the test code.

    In general, the boolean test
       str.equals(new QueryParser("_default_", analyzer).parse(str).toString())
    should always evaluate to true if the analyzer does not change the string. In our case it evaluates to false.

The behavior I have consistently found is this: "Whenever and wherever a Java String contains an unbroken sequence of N escaped backslashes (that is, N pairs of unescaped backslashes, totalling 2N backslashes) where N >= 2, the parse() method creates a Query that has only N-1 escaped backslashes in the corresponding place." If you have 20 escaped backslashes in a Java string, the Lucene query will end up with 19.
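
To make the two levels of escaping concrete, here is a minimal sketch that counts the backslashes actually present in a Java string at runtime. It does not use Lucene at all; the class name BackslashLevels and the method countBackslashes are hypothetical names chosen for the example.

```java
// Hypothetical illustration: separates the Java source-level escaping from
// the characters the query parser actually receives.
final class BackslashLevels {

    // Counts the backslash characters in the runtime string.
    static int countBackslashes(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\\') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Four backslashes in the Java source...
        String queryStr = "item:\\\\";
        // ...become two backslash characters in the string handed to parse().
        System.out.println(queryStr);                   // prints item:\\
        System.out.println(countBackslashes(queryStr)); // prints 2
    }
}
```

So the 2N backslashes typed in the source are N characters by the time the parser sees them; what the parser then does with those N characters is the behavior in question here.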

Thank you much for your time, attention and efforts.
Thanks.
   



[jira] Commented: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475529 ] 

Michael Busch commented on LUCENE-800:
--------------------------------------

Dilip,

are you using Lucene 1.9? The problem you are referring to (a sequence of N escaped backslashes) has been fixed in Lucene 2.1:
http://issues.apache.org/jira/browse/LUCENE-573

Could you test your code with the new version, please?

However, the two other problems you pointed out and which I talked about in my previous comment are still there (but I'm working on it ;))

Thanks,
Michael




[jira] Updated: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-800:
---------------------------------

    Priority: Minor  (was: Major)

just lowering the severity to minor



[jira] Updated: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-800:
-------------------------------

    Attachment: Lucene-800-more-tests.patch

Hi Michael, I reviewed this fix and it looks good and correct. 
All tests are passing, including the new ones. (well, a few backwards compatibility tests fail - I would check that later - but it is unrelated to this fix).
While reviewing I added a few test cases just to make sure - attached Lucene-800-more-tests.patch in case you find that worthy to add.
Regards,
Doron




[jira] Resolved: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-800.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 2.2

Thanks Doron for reviewing and for the additional tests!

I just committed this and LUCENE-372. Together these patches fix the two problems described in this issue.

> Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-800
>                 URL: https://issues.apache.org/jira/browse/LUCENE-800
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: QueryParser
>            Reporter: Dilip Nimkar
>         Assigned To: Michael Busch
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: Lucene-800-more-tests.patch, Lucene-800.patch



[jira] Commented: (LUCENE-800) Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475457 ] 

Michael Busch commented on LUCENE-800:
--------------------------------------

Hi Dilip,

The backslash is the escape character in Lucene's query parser syntax, so to search for a literal backslash you have to escape it. That means the first two examples you provided are working as expected:

item:\\ -> item:\ is correct
item:\\* -> item:\* is correct too

If you want to search for two backslashes you have to escape both, meaning you have to put four backslashes in the query string:
item:\\\\* -> item:\\*
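
To make the two layers of escaping concrete, here is a minimal Java sketch. It is not Lucene's code: the class EscapeLayers and its unescape method are hypothetical and only model the rule stated above, that a backslash escapes the character that follows it. Keep in mind that the query strings in this thread are shown as the parser sees them; in Java source every backslash has to be doubled once more.

```java
// Hypothetical sketch: NOT Lucene's implementation, only the escape rule described above.
class EscapeLayers {

    /** Drops each escaping backslash and keeps the character it escapes. */
    static String unescape(String query) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                out.append(query.charAt(++i)); // keep the escaped character literally
            } else {
                out.append(c); // ordinary character (a trailing lone backslash is kept as-is)
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String oneBackslash = "item:\\\\";        // Java literal for the query text item:\\
        String twoBackslashes = "item:\\\\\\\\*"; // Java literal for the query text item:\\\\*
        System.out.println(oneBackslash + " -> " + unescape(oneBackslash));
        System.out.println(twoBackslashes + " -> " + unescape(twoBackslashes));
    }
}
```

So a term containing two literal backslashes needs four backslashes in the query text and eight in a Java string literal.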


But you did find two other problems. You are right that the last example should not throw a ParseException.
In (item:\\ item:ABCD\\) the query parser wrongly treats the closing parenthesis as escaped, when it is actually the second backslash that is escaped by the first. I will provide a patch for this problem soon.
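
One way to picture the intended behaviour is to count the backslashes directly in front of a character: an odd count means the character is escaped, an even count means the backslashes escape each other. The Java sketch below illustrates only that idea. The class EscapeCheck and the method isEscaped are hypothetical names, and this is not the code from the patch.

```java
// Hypothetical sketch of the "odd number of preceding backslashes" rule; not the actual patch.
class EscapeCheck {

    /** True if the character at index is preceded by an odd number of consecutive backslashes. */
    static boolean isEscaped(String query, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && query.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        String fourth = "(item:\\\\ item:ABCD\\\\)"; // query text: (item:\\ item:ABCD\\)
        String single = "(item:ABCD\\)";             // query text: (item:ABCD\)
        System.out.println(isEscaped(fourth, fourth.length() - 1));
        System.out.println(isEscaped(single, single.length() - 1));
    }
}
```

Under this rule the closing parenthesis of the fourth example has two backslashes in front of it and is therefore not escaped, so it closes the group.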

And as you said, the third example should throw a ParseException because it has one closing parenthesis too many. There is already a patch for this problem in JIRA:
http://issues.apache.org/jira/browse/LUCENE-372
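
For illustration, the check that the third example fails can be sketched as a depth counter over unescaped parentheses. This is a simplified, hypothetical model (the class ParenBalance and the method isBalanced are made-up names). It ignores quoted phrases and the rest of the query syntax, and it is not how the LUCENE-372 patch is implemented.

```java
// Hypothetical sketch: counts unescaped parentheses only; quoted phrases are not handled.
class ParenBalance {

    /** True if every unescaped ')' closes an earlier unescaped '(' and none is left open. */
    static boolean isBalanced(String query) {
        int depth = 0;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, whatever it is
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (--depth < 0) {
                    return false; // closing parenthesis without a partner
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        String third = "(item:\\\\ item:ABCD\\\\))"; // query text: (item:\\ item:ABCD\\))
        String fourth = "(item:\\\\ item:ABCD\\\\)"; // query text: (item:\\ item:ABCD\\)
        System.out.println(isBalanced(third));
        System.out.println(isBalanced(fourth));
    }
}
```

With this model the third example is rejected because of its extra closing parenthesis, while the fourth is accepted.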

I will commit fixes for both problems soon. 

Thanks again, Dilip! Good catches :-)


> Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-800
>                 URL: https://issues.apache.org/jira/browse/LUCENE-800
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: QueryParser
>            Reporter: Dilip Nimkar
>         Assigned To: Michael Busch
