Posted to solr-user@lucene.apache.org by Shawn Heisey <so...@elyograg.org> on 2012/11/28 17:52:53 UTC

Solr4 - new characters requiring backslash escape?

I've been putting a new Solr 4.1 deployment through extensive testing 
before we upgrade from 3.5.

My testing has turned up two characters that used to work fine with no 
escaping that now give syntax errors without a preceding backslash.  
Those characters are forward slash and apostrophe (single quote).

Is there a canonical list of characters that require backslashes to work 
properly?  I know about some already, such as parens and brackets.

On the same subject, but far less important because with SolrJ I don't 
think I have to worry about it, is there a list of characters that 
require URL encoding (%XX)?
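
On the URL encoding side, the rule is general to URLs rather than specific to Solr: in a query string, anything other than letters, digits, and the unreserved characters "-", ".", "_" and "~" is safest percent-encoded. A minimal Python sketch, assuming the query is sent as the q parameter; build_query_string is a hypothetical helper name, and a client library such as SolrJ would normally handle this step itself:

```python
from urllib.parse import urlencode

def build_query_string(q: str) -> str:
    """Percent-encode a Solr query for use as the q parameter of a URL.

    Hypothetical helper for illustration only. urlencode() turns spaces
    into '+' and percent-encodes every character outside the unreserved
    set (letters, digits, '-', '.', '_', '~').
    """
    return urlencode({"q": q})

# The backslash here is a query-parser escape; it gets URL-encoded too.
print(build_query_string("title:AC\\/DC AND year:[1990 TO 2000]"))
```

Note that the two layers are independent: backslash escaping is for the query parser, percent-encoding is for HTTP transport, and the backslashes themselves are percent-encoded on the way out.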

Thanks,
Shawn


Re: Solr4 - new characters requiring backslash escape?

Posted by Chris Hostetter <ho...@fucit.org>.
: Here's the query showing the apostrophe problem, pulled from our search logs:
: q=(  (MEXICO DAY OF THE DEAD   CELEBRATION 'TRADITIONS OF LIFE AND DAY'))
: 
: This is the error msg I can see in my browser when I send that to Solr
: 4.1-SNAPSHOT from 2012/11/26.  I am not doing any testing with the 4.0.0
: release:

Definitely a bug, but it doesn't affect 4.0 -- it's a result of the 
changes made in SOLR-4093...

https://issues.apache.org/jira/browse/SOLR-4121

Thanks for reporting this.


-Hoss

Re: Solr4 - new characters requiring backslash escape?

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/28/2012 10:16 AM, Jack Krupansky wrote:
> Forward slash is now reserved for regular expression terms.
>
> For the full list, see the Javadoc, here:
> http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters 
>
>
> I don't know of any change related to apostrophe. That may be token 
> filter-specific. Maybe related to possessive removal. In what context 
> is it a problem?

Thanks, that javadoc will be very helpful.

Here's the query showing the apostrophe problem, pulled from our search 
logs:
q=(  (MEXICO DAY OF THE DEAD   CELEBRATION 'TRADITIONS OF LIFE AND DAY'))

This is the error msg I can see in my browser when I send that to Solr 
4.1-SNAPSHOT from 2012/11/26.  I am not doing any testing with the 4.0.0 
release:

org.apache.solr.search.SyntaxError: Cannot parse '( (MEXICO DAY OF THE 
DEAD   CELEBRATION 'TRADITIONS OF LIFE AND DAY'))': Encountered " 
<SQUOTED> "\'TRADITIONS OF LIFE AND DAY\' "" at line 1, column 41.
Was expecting one of:
     <AND> ...
     <OR> ...
     <NOT> ...
     "+" ...
     "-" ...
     <BAREOPER> ...
     "(" ...
     ")" ...
     "*" ...
     "^" ...
     <QUOTED> ...
     <TERM> ...
     <FUZZY_SLOP> ...
     <PREFIXTERM> ...
     <WILDTERM> ...
     <REGEXPTERM> ...
     "[" ...
     "{" ...
     <LPARAMS> ...
     <NUMBER> ...

The same query against 3.5.0 works. When I add backslashes to single 
quotes for the query against either version, it works. The escaped query 
has the same numFound on both 3.5 and 4.1 as the unescaped query does 
against 3.5.  Escaped query:

q=(  (MEXICO DAY OF THE DEAD   CELEBRATION \'TRADITIONS OF LIFE AND DAY\'))
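
For anyone needing the same workaround in client code, here is a minimal Python sketch of the escaping described above. escape_apostrophes is a hypothetical helper name, not part of Solr or SolrJ, and the sketch assumes that every apostrophe in the user input should be treated as a literal character:

```python
import re

def escape_apostrophes(query: str) -> str:
    """Put a backslash in front of each apostrophe in a query string.

    Hypothetical workaround helper. The negative lookbehind skips
    apostrophes that already follow a backslash, so running the
    function twice does not double-escape.
    """
    return re.sub(r"(?<!\\)'", r"\\'", query)

raw = "(  (MEXICO DAY OF THE DEAD   CELEBRATION 'TRADITIONS OF LIFE AND DAY'))"
print(escape_apostrophes(raw))
```

The sketch does not try to handle an escaped backslash directly before an apostrophe; input like that would need a real tokenizer.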

Is this a change that's new in 4.1, or is it perhaps a bug?

Thanks,
Shawn


Re: Solr4 - new characters requiring backslash escape?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Forward slash is now reserved for regular expression terms.

For the full list, see the Javadoc, here:
http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters
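
To illustrate how that list might be applied to user-entered terms, here is a rough Python sketch. The character set is taken from the 4.0 Javadoc section linked above (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /); escape_query_chars and its extra parameter are hypothetical names for this example, and escaping every single & and | (rather than only the && and || pairs) is a simplifying assumption. SolrJ users can look at ClientUtils.escapeQueryChars, which serves a similar purpose.

```python
# Special characters per the linked Lucene 4.0 classic query parser Javadoc.
# Assumption: '&' and '|' are escaped individually, although the Javadoc
# lists them only as the two-character operators '&&' and '||'.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')

def escape_query_chars(text: str, extra: str = "") -> str:
    """Backslash-escape query parser special characters in a term.

    Hypothetical helper for illustration. 'extra' lets the caller add
    characters that are not on the Javadoc list, such as an apostrophe.
    """
    special = SPECIAL_CHARS | set(extra)
    return "".join("\\" + ch if ch in special else ch for ch in text)

print(escape_query_chars("AC/DC"))           # forward slash gets escaped
print(escape_query_chars("it's", extra="'")) # apostrophe only when requested
```

This is meant for individual terms or phrases typed by a user. Running it over a whole query would also escape the parentheses, colons and operators that are there on purpose.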

I don't know of any change related to apostrophe. That may be token 
filter-specific. Maybe related to possessive removal. In what context is it 
a problem?

-- Jack Krupansky
