Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2012/11/28 17:52:53 UTC
Solr4 - new characters requiring backslash escape?
I've been putting a new Solr 4.1 deployment through extensive testing
before we upgrade from 3.5.
My testing has turned up two characters that used to work fine with no
escaping that now give syntax errors without a preceding backslash.
Those characters are forward slash and apostrophe (single quote).
Is there a canonical list of characters that require backslashes to work
properly? I know about some already, such as parens and brackets.
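For reference, here is a minimal sketch of client-side escaping. It assumes the special-character list from the Lucene 4.0 classic query parser Javadoc (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /), and it handles "&&" and "||" by escaping every single & and |. The class name QueryEscaper is hypothetical. Lucene ships a static QueryParser.escape(String), and SolrJ ships ClientUtils.escapeQueryChars(String), which is somewhat more aggressive. The exact character set has changed between releases, so check the version you actually deploy. Note that the apostrophe is not on the documented list.

```java
/**
 * Hypothetical helper (not part of Solr or Lucene): backslash-escapes the
 * characters the Lucene 4.0 classic query parser Javadoc lists as special.
 * "&&" and "||" are covered by escaping every '&' and '|' individually.
 */
final class QueryEscaper {

    // Query-syntax characters; '/' became special in 4.x (regex terms).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    private QueryEscaper() {}

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints AC\/DC \(live\)
        System.out.println(escape("AC/DC (live)"));
    }
}
```

This escapes the whole string, so use it on user-supplied text, not on a query where you want operators and parentheses to keep their meaning.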
On the same subject, but far less important because with SolrJ I don't
think I have to worry about it, is there a list of characters that
require URL encoding (%XX)?
Thanks,
Shawn
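On the URL-encoding question: if you build request URLs yourself rather than going through SolrJ, a minimal sketch is to backslash-escape the query text first and then percent-encode the whole parameter value with java.net.URLEncoder. URLEncoder follows application/x-www-form-urlencoded rules: letters, digits and . - * _ pass through, a space becomes '+', and everything else (including + & # % = and the backslash itself) becomes %XX. The class and method names below are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for hand-built Solr request URLs. */
final class SolrUrlParams {

    private SolrUrlParams() {}

    /** Returns a "q=..." fragment whose value is form-URL-encoded as UTF-8. */
    static String qParam(String query) {
        try {
            // '+' -> %2B, '&' -> %26, ':' -> %3A, '\' -> %5C, space -> '+'
            return "q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is unreachable.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints q=title%3A%28a%2Bb+%26+c%29
        System.out.println(qParam("title:(a+b & c)"));
    }
}
```

The order matters: query-parser escaping is applied to the raw text, and URL encoding is applied last to the finished parameter value.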
Re: Solr4 - new characters requiring backslash escape?
Posted by Chris Hostetter <ho...@fucit.org>.
: Here's the query showing the apostrophe problem, pulled from our search logs:
: q=( (MEXICO DAY OF THE DEAD CELEBRATION 'TRADITIONS OF LIFE AND DAY'))
:
: This is the error msg I can see in my browser when I send that to Solr
: 4.1-SNAPSHOT from 2012/11/26. I am not doing any testing with the 4.0.0
: release:
Definitely a bug, but it doesn't affect 4.0 -- it's a result of the
changes made in SOLR-4093...
https://issues.apache.org/jira/browse/SOLR-4121
Thanks for reporting this.
-Hoss
Re: Solr4 - new characters requiring backslash escape?
Posted by Shawn Heisey <so...@elyograg.org>.
On 11/28/2012 10:16 AM, Jack Krupansky wrote:
> Forward slash is now reserved for regular expression terms.
>
> For the full list, see the Javadoc, here:
> http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters
>
> I don't know of any change related to apostrophe. That may be token
> filter-specific. Maybe related to possessive removal. In what context
> is it a problem?
Thanks, that javadoc will be very helpful.
Here's the query showing the apostrophe problem, pulled from our search
logs:
q=( (MEXICO DAY OF THE DEAD CELEBRATION 'TRADITIONS OF LIFE AND DAY'))
This is the error msg I can see in my browser when I send that to Solr
4.1-SNAPSHOT from 2012/11/26. I am not doing any testing with the 4.0.0
release:
org.apache.solr.search.SyntaxError: Cannot parse '( (MEXICO DAY OF THE DEAD CELEBRATION 'TRADITIONS OF LIFE AND DAY'))': Encountered " <SQUOTED> "\'TRADITIONS OF LIFE AND DAY\' "" at line 1, column 41.
Was expecting one of:
<AND> ...
<OR> ...
<NOT> ...
"+" ...
"-" ...
<BAREOPER> ...
"(" ...
")" ...
"*" ...
"^" ...
<QUOTED> ...
<TERM> ...
<FUZZY_SLOP> ...
<PREFIXTERM> ...
<WILDTERM> ...
<REGEXPTERM> ...
"[" ...
"{" ...
<LPARAMS> ...
<NUMBER> ...
The same query works against 3.5.0. If I escape the single quotes with
backslashes, the query works against both versions, and the escaped query
returns the same numFound on 3.5 and 4.1 as the unescaped query does on
3.5. Escaped query:
q=( (MEXICO DAY OF THE DEAD CELEBRATION \'TRADITIONS OF LIFE AND DAY\'))
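Until the parser behavior is settled, the escaped form above can be produced mechanically. This is a minimal workaround sketch, assuming you only need to escape apostrophes (which are not on the documented special-character list) for the affected 4.1 snapshot. The class and method names are hypothetical, and the helper does not check for quotes that are already escaped.

```java
/** Hypothetical workaround: backslash-escapes every single quote. */
final class ApostropheWorkaround {

    private ApostropheWorkaround() {}

    static String escapeSingleQuotes(String q) {
        // String.replace(CharSequence, CharSequence) is a literal, non-regex replacement.
        return q.replace("'", "\\'");
    }

    public static void main(String[] args) {
        String raw = "( (MEXICO DAY OF THE DEAD CELEBRATION 'TRADITIONS OF LIFE AND DAY'))";
        // prints ( (MEXICO DAY OF THE DEAD CELEBRATION \'TRADITIONS OF LIFE AND DAY\'))
        System.out.println(escapeSingleQuotes(raw));
    }
}
```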
Is this a change that's new in 4.1, or is it perhaps a bug?
Thanks,
Shawn
Re: Solr4 - new characters requiring backslash escape?
Posted by Jack Krupansky <ja...@basetechnology.com>.
Forward slash is now reserved for regular expression terms.
For the full list, see the Javadoc, here:
http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters
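Because a pair of slashes now delimits a regular expression term (for example /[mb]oat/), a literal slash such as the one in 1/2 has to be escaped. A minimal upgrade sketch, assuming you want to keep 3.5-style literal slashes and leave everything else alone, is to escape only the unescaped slashes. The class and method names are hypothetical.

```java
/** Hypothetical helper: makes every unescaped '/' a literal for the 4.x parser. */
final class SlashEscape {

    private SlashEscape() {}

    static String escapeSlashes(String q) {
        StringBuilder sb = new StringBuilder(q.length() + 8);
        boolean escaped = false; // true when the current char follows an unpaired backslash
        for (char c : q.toCharArray()) {
            if (c == '/' && !escaped) {
                sb.append('\\'); // turn a regex delimiter into a literal slash
            }
            sb.append(c);
            // A backslash escapes the next char unless it was itself escaped.
            escaped = (c == '\\') && !escaped;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 1\/2 cup
        System.out.println(escapeSlashes("1/2 cup"));
    }
}
```

Tracking the escape state means an already-escaped slash (\/) is not escaped twice, while a slash after an escaped backslash (\\/) still gets its own backslash.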
I don't know of any change related to apostrophe. That may be token
filter-specific. Maybe related to possessive removal. In what context is it
a problem?
-- Jack Krupansky