Posted to dev@lucene.apache.org by "Dilip Nimkar (JIRA)" <ji...@apache.org> on 2007/02/13 20:51:05 UTC
[jira] Created: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Incorrect parsing by QueryParser.parse() when it encounters backslashes (always eats one backslash.)
----------------------------------------------------------------------------------------------------
Key: LUCENE-800
URL: https://issues.apache.org/jira/browse/LUCENE-800
Project: Lucene - Java
Issue Type: Bug
Components: QueryParser
Reporter: Dilip Nimkar
Test code and output follow. Tested Lucene 1.9 version only. Affects those who would index/search for Lucene's reserved characters.
Description: When an input search string has a sequence of N (java-escaped) backslashes, where N >= 2, the QueryParser will produce a query in which that sequence has N-1 backslashes.
TEST CODE:
Analyzer analyzer = new WhitespaceAnalyzer();
String[] queryStrs = {"item:\\\\",
                      "item:\\\\*",
                      "(item:\\\\ item:ABCD\\\\))",
                      "(item:\\\\ item:ABCD\\\\)"};
for (String queryStr : queryStrs) {
    System.out.println("--------------------------------------");
    System.out.println("String queryStr = " + queryStr);
    Query luceneQuery = null;
    try {
        luceneQuery = new QueryParser("_default_", analyzer).parse(queryStr);
        System.out.println("luceneQuery.toString() = " + luceneQuery.toString());
    } catch (Exception e) {
        System.out.println(e.getClass().toString());
    }
}
OUTPUT (with remarks in comment notation):
--------------------------------------
String queryStr = item:\\
luceneQuery.toString() = item:\ //One backslash has disappeared. Searcher will fail on this query.
--------------------------------------
String queryStr = item:\\*
luceneQuery.toString() = item:\* //One backslash has disappeared. This query will search for something unintended.
--------------------------------------
String queryStr = (item:\\ item:ABCD\\))
luceneQuery.toString() = item:\ item:ABCD\) //This should have thrown a ParseException because of an unescaped ')'. It did not.
--------------------------------------
String queryStr = (item:\\ item:ABCD\\)
class org.apache.lucene.queryParser.ParseException //...and this one should not have, but it did.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
[jira] Commented: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475461 ]
Michael Busch commented on LUCENE-800:
--------------------------------------
Doron,
the problem here is that a backslash is a valid TERM_CHAR and an ESCAPE_CHAR at the same time. The fix is to exclude \ from the TERM_CHAR list. I tried this fix and it works fine for me. I'm going to attach a patch today. Would be great if you could review it before I commit it, Doron!
[jira] Updated: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch updated LUCENE-800:
---------------------------------
Attachment: Lucene-800.patch
With this patch a query like
(item:\\ item:ABCD\\)
does not throw a ParseException anymore. I excluded the backslash from the TERM_CHAR list, because a backslash should always be escaped.
I also changed the list ESCAPED_CHAR. Every character that follows a backslash should be considered escaped. Until now, the query \a would cause a ParseException while the query \+ would work fine, which is inconsistent. So now every character following a backslash is an ESCAPED_CHAR. Any objections?
All unit tests pass.
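The rule proposed above (any character following a backslash counts as escaped) can be illustrated with a small standalone sketch. This is not the code from Lucene-800.patch and does not use Lucene at all; the class name UnescapeSketch, the method unescape, and the choice to reject a trailing lone backslash are assumptions made only for illustration.

```java
public class UnescapeSketch {
    // Hypothetical helper, not Lucene code: strips one level of escaping.
    // A backslash is dropped and the character after it is kept literally,
    // whatever that character is.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (escaped) {
                out.append(c);
                escaped = false;
            } else if (c == '\\') {
                escaped = true;
            } else {
                out.append(c);
            }
        }
        if (escaped) {
            // Assumption: a trailing lone backslash is treated as an error.
            throw new IllegalArgumentException("dangling backslash at end of input");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\a"));      // prints a
        System.out.println(unescape("\\+"));      // prints +
        System.out.println(unescape("ABCD\\\\")); // prints ABCD\
    }
}
```

Under this rule \a and \+ are handled the same way, which is the consistency the comment above asks for.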
[jira] Assigned: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch reassigned LUCENE-800:
------------------------------------
Assignee: Michael Busch
[jira] Commented: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475452 ]
Doron Cohen commented on LUCENE-800:
------------------------------------
Michael, I've been looking into this and think I made some progress. Are you just starting, or do you have it solved already?
[jira] Commented: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Dilip Nimkar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475513 ]
Dilip Nimkar commented on LUCENE-800:
-------------------------------------
In my test code, I took care of the difference between \ as the Java escape character and \ as the Lucene escape character.
System.out.println(new QueryParser("_default_", analyzer).parse("item:\\\\")); // note the 4 backslashes
should print on the console item:\\
But it is printing item:\
Same is the case with the second string in the test code.
In general, the boolean test
str.equals(new QueryParser("_default_", analyzer).parse(str).toString())
should always evaluate to true if the analyzer is not changing the string. But in our case it is evaluating to false.
The behavior I have consistently found is this: "Whenever and wherever a Java String contains an unbroken sequence of N escaped backslashes (that is, N pairs of unescaped backslashes, totalling 2N backslashes) where N >= 2, the parse() method creates a Query that has only N-1 escaped backslashes in the corresponding place." If you have 20 escaped backslashes in a Java string, the Lucene query will end up with 19.
Thank you much for your time, attention and efforts.
Thanks.
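The Java-level escaping mentioned in this comment can be checked without Lucene. The following is a minimal sketch in plain Java; the class name BackslashLevels and the helper countBackslashes are hypothetical names introduced only for illustration. It shows how many backslash characters the query parser actually receives when the source code contains four.

```java
public class BackslashLevels {
    // Counts the backslash characters actually present in a string at run time.
    static int countBackslashes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\\') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String queryStr = "item:\\\\"; // four backslashes in the source code
        System.out.println(queryStr);                   // prints item:\\
        System.out.println(countBackslashes(queryStr)); // prints 2
    }
}
```

So the string handed to parse() holds two backslash characters. Whether the parser should then keep both or treat the first as an escape for the second is the question discussed in the rest of this thread.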
[jira] Commented: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475529 ]
Michael Busch commented on LUCENE-800:
--------------------------------------
Dilip,
are you using Lucene 1.9? The problem you are referring to (a sequence of N escaped backslashes) has been fixed in Lucene 2.1:
http://issues.apache.org/jira/browse/LUCENE-573
Could you test your code with the new version, please?
However, the two other problems you pointed out and which I talked about in my previous comment are still there (but I'm working on it ;))
Thanks,
Michael
[jira] Updated: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch updated LUCENE-800:
---------------------------------
Priority: Minor (was: Major)
Just lowering the severity to minor.
[jira] Updated: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-800:
-------------------------------
Attachment: Lucene-800-more-tests.patch
Hi Michael, I reviewed this fix and it looks good and correct.
All tests are passing, including the new ones. (well, a few backwards compatibility tests fail - I would check that later - but it is unrelated to this fix).
While reviewing I added a few test cases just to make sure - attached Lucene-800-more-tests.patch in case you find that worthy to add.
Regards,
Doron
[jira] Resolved: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Busch resolved LUCENE-800.
----------------------------------
Resolution: Fixed
Fix Version/s: 2.2
Thanks Doron for reviewing and for the additional tests!
I just committed this and LUCENE-372. Together these patches fix the two problems described in this issue.
[jira] Commented: (LUCENE-800) Incorrect parsing by
QueryParser.parse() when it encounters backslashes (always eats one
backslash.)
Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475457 ]
Michael Busch commented on LUCENE-800:
--------------------------------------
Hi Dilip,
the backslash is the escape character in Lucene's queryparser syntax. So if you want to search for a backslash you have to escape it. That means that the first two examples you provided are working as expected:
item:\\ -> item:\ is correct
item:\\* -> item:\* is correct too
If you want to search for two backslashes you have to escape both, meaning you have to put four backslashes in the query string:
item:\\\\* -> item:\\*
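As a rough illustration of the doubling described above, here is a plain-Java sketch that escapes only backslashes before a term is placed into a query string. The class EscapeSketch and the method escapeBackslashes are hypothetical names, not part of Lucene, and other reserved characters are deliberately left untouched.

```java
public class EscapeSketch {
    // Hypothetical helper: doubles every backslash so that the query syntax
    // reads each one as a literal backslash. Other reserved characters are
    // not handled here.
    static String escapeBackslashes(String raw) {
        return raw.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String term = "\\\\"; // Java literal for a term of two backslash characters
        String query = "item:" + escapeBackslashes(term);
        System.out.println(query); // prints item:\\\\
    }
}
```

Note the two layers: the term holds two backslash characters, the query string holds four, and a Java literal for that query string would need eight.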
But you indeed found two other problems. You are right, the last example should not throw a ParseException.
In (item:\\ item:ABCD\\) the queryparser falsely thinks that the closing parenthesis is escaped, but actually the backslash is the escaped character. I will provide a patch for this problem soon.
And as you said, the third example should throw a ParseException because there are too many closing parentheses. There is already a patch for this problem in JIRA:
http://issues.apache.org/jira/browse/LUCENE-372
I will commit fixes for both problems soon.
Thanks again, Dilip! Good catches :-)
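The expected handling of the last two examples can be sketched with a left-to-right scan in which a backslash always consumes the character after it. This is only an illustration in plain Java, not the query parser's grammar; the class ParenScanSketch and the method unescapedParenBalance are hypothetical names.

```java
public class ParenScanSketch {
    // Returns the number of unescaped '(' minus the number of unescaped ')'.
    // A backslash escapes exactly the next character, so in the sequence \\)
    // the second backslash is the escaped character and ')' is not escaped.
    static int unescapedParenBalance(String query) {
        int balance = 0;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, whatever it is
            } else if (c == '(') {
                balance++;
            } else if (c == ')') {
                balance--;
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        // Java literals for (item:\\ item:ABCD\\) and (item:\\ item:ABCD\\))
        System.out.println(unescapedParenBalance("(item:\\\\ item:ABCD\\\\)"));  // prints 0
        System.out.println(unescapedParenBalance("(item:\\\\ item:ABCD\\\\))")); // prints -1
    }
}
```

A balance of 0 corresponds to the fourth example, which should parse, and -1 to the third, which has one closing parenthesis too many.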