Posted to solr-user@lucene.apache.org by Stephen Powis <st...@pardot.com> on 2010/11/04 06:44:11 UTC

Problem escaping question marks

I'm having difficulty properly escaping ? in my search queries. It seems as
though it matches any character.

Here is some info: a simplified schema and a query that illustrate the issue.
I'm currently running Solr 1.4.1.

Schema:

<field name="id" type="sint" indexed="true" stored="true" required="true" />
<field name="first_name" type="string" indexed="true" stored="true" required="false" />

I want to return any first name with a question mark in it.
Query: first_name: *\?*

It returns every document whose first name contains any character at all.

Can anyone lend a hand?
Thanks!
Stephen

Re: Problem escaping question marks

Posted by Jean-Sebastien Vachon <js...@videotron.ca>.
Have you tried encoding it as %3F?

first_name:*%3F*
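A minimal sketch, using Python's standard urllib.parse as a stand-in for the HTTP client and the servlet container (an assumption about how the query reaches Solr as a URL parameter): %3F is just the percent-encoding of ?, and it is decoded back to ? when the request parameter is read, before any query parsing happens. So the parser ends up with the same pattern either way.

```python
from urllib.parse import parse_qs, urlencode

# Client side: percent-encoding the query turns ? into %3F (and : into %3A, * into %2A).
encoded = urlencode({"q": "first_name:*?*"})
print(encoded)  # q=first_name%3A%2A%3F%2A

# Server side: the parameter is decoded before any query parser sees it,
# so a hand-encoded %3F arrives as a plain ? again.
decoded = parse_qs("q=first_name:*%3F*")["q"][0]
print(decoded)  # first_name:*?*
```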



Re: Problem escaping question marks

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Wildcard queries, especially a wildcard query with a wildcard both
_before_ and _after_, are going to be fairly slow for Solr to process
anyhow. (In fact, for some reason I thought wildcards weren't even
supported both before and after, only one or the other.)
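To see why a leading wildcard is expensive, here is a minimal Python model of a sorted term dictionary (an illustration only, not Lucene's code; the terms are made up). A query with a fixed prefix such as Al* can binary-search to its first candidate and stop as soon as the prefix no longer matches, while a pattern such as *ve* has no prefix to seek on and has to test every term.

```python
import bisect
from fnmatch import fnmatchcase

# Made-up, sorted term dictionary for a "string" field (one term per stored value).
TERMS = sorted(["Al?ce", "Alice", "Bob", "Carol", "Dave?", "Eve"])


def prefix_query(prefix):
    """Seek to the first term >= prefix, then stop at the first term without it."""
    examined, matched = 0, []
    for term in TERMS[bisect.bisect_left(TERMS, prefix):]:
        examined += 1
        if not term.startswith(prefix):
            break
        matched.append(term)
    return matched, examined


def leading_wildcard_query(pattern):
    """No literal prefix to seek on, so every term in the dictionary is tested."""
    matched = [t for t in TERMS if fnmatchcase(t, pattern)]
    return matched, len(TERMS)


print(prefix_query("Al"))              # (['Al?ce', 'Alice'], 3)
print(leading_wildcard_query("*ve*"))  # (['Dave?', 'Eve'], 6)
```

With six terms the difference is tiny; the gap grows with the number of distinct terms in the field.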

Still, it's true that this is a bug in Lucene; it ought not to behave that way.

But there may be a better design for your actual use case, with much
better performance, based on doing some work at indexing time:

- If you frequently want to search on arbitrary individual characters,
  tokenize the value into individual characters in a separate field.
- If you specifically want to search on question marks all the time and
  don't care about other characters, simply index a "1" or "0" in a field
  depending on whether the value includes a question mark.
- If you want to search efficiently on all sorts of substrings, use some
  kind of more complex n-gramming.

The trade-off will be disk space for performance... but once you start to
have a lot of records, that wildcard-on-both-sides approach will have
unacceptable performance, I predict.
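A minimal Python sketch of the flag and n-gram ideas, computed by hand before indexing. The field names has_question_mark and first_name_ngrams, the n-gram range of 1 to 3, and the sample documents are all hypothetical; in a real deployment the n-gramming would more likely happen in Solr's own analysis chain. The point is that finding names containing a question mark becomes an exact term lookup instead of a scan over every term.

```python
from collections import defaultdict


def ngrams(value, min_n=1, max_n=3):
    """All substrings of value with length min_n..max_n (n=1 gives individual characters)."""
    return {
        value[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(value) - n + 1)
    }


def extra_fields(first_name):
    """Hypothetical extra fields computed at index time, alongside first_name."""
    return {
        "has_question_mark": "1" if "?" in first_name else "0",
        "first_name_ngrams": ngrams(first_name),
    }


# Made-up documents, keyed by id.
DOCS = {1: "Alice", 2: "Al?ce", 3: "Bob?", 4: "Carol"}

# Inverted index over the n-gram field: gram -> set of document ids.
NGRAM_INDEX = defaultdict(set)
for doc_id, name in DOCS.items():
    for gram in extra_fields(name)["first_name_ngrams"]:
        NGRAM_INDEX[gram].add(doc_id)

# Flag approach: the equivalent of an exact match on has_question_mark:1.
flagged = sorted(d for d, name in DOCS.items()
                 if extra_fields(name)["has_question_mark"] == "1")

print(flagged)                     # [2, 3]
print(sorted(NGRAM_INDEX["?"]))    # [2, 3]
print(sorted(NGRAM_INDEX["l?c"]))  # [2]
```

Substrings longer than max_n are not covered by a single lookup and would need extra handling not shown here.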

Jonathan


Re: Problem escaping question marks

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Nov 4, 2010 at 4:58 PM, Stephen Powis <st...@pardot.com> wrote:
> What is the likelihood of this being included in the next release/bug fix
> version of Solr?

In this case, not likely. It will have to wait for Solr 4.0.

> Are there docs available online with basic information
> about rolling our own build of Solr that includes this patch?

You can check out trunk with:

    svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk

and apply the patch with:

    patch -p0 < foo.patch

Re: Problem escaping question marks

Posted by Stephen Powis <st...@pardot.com>.
Looking at the JIRA issue, it looks like a new patch related to this has been
posted. This is good news! We've rewritten a portion of our web app to use
Solr instead of MySQL. This part of our app lets clients construct rules
that match data within their account and automatically apply actions to the
matched data points. Our testing and rollout had been smooth until we
encountered the rule/query above. I had assumed that, since these
metacharacters were escaped, they would be parsed correctly in any type of
query.

What is the likelihood of this being included in the next release or bug-fix
version of Solr? Are there docs available online with basic information
about rolling our own build of Solr that includes this patch?

I appreciate your help!
Thanks!
Stephen


On Thu, Nov 4, 2010 at 9:26 AM, Robert Muir <rc...@gmail.com> wrote:

> On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis <st...@pardot.com>
> wrote:
> > I want to return any first name with a Question Mark in it
> > Query: first_name: *\?*
> >
>
> There is no way to escape the metacharacters * or ? for a wildcard
> query (regardless of queryparser, even if you write your own).
> See https://issues.apache.org/jira/browse/LUCENE-588
>
> It's something we could fix, but in all honesty, one reason it hasn't been
> fixed seems to be that, although the bug is so old, there hasn't really
> been any indication of demand for such a thing...
>

Re: Problem escaping question marks

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis <st...@pardot.com> wrote:
> I want to return any first name with a Question Mark in it
> Query: first_name: *\?*
>

There is no way to escape the metacharacters * or ? for a wildcard
query (regardless of queryparser, even if you write your own).
See https://issues.apache.org/jira/browse/LUCENE-588

It's something we could fix, but in all honesty, one reason it hasn't been
fixed seems to be that, although the bug is so old, there hasn't really
been any indication of demand for such a thing...
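The behavior Stephen saw can be modeled with a small Python wildcard matcher in which * matches any sequence and ? matches any single character. This is a simplified illustration of the symptom described in this thread, not Lucene's implementation, and the function and sample names are made up. With honor_escapes=False the backslash is dropped and \? keeps its wildcard meaning, so *\?* matches every non-empty value; with honor_escapes=True (the behavior Stephen expected) it matches only values containing a literal question mark.

```python
import re


def wildcard_to_regex(pattern, honor_escapes):
    """Translate a wildcard pattern (* = any sequence, ? = any one char) into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            ch = pattern[i + 1]
            i += 2
            if honor_escapes:
                parts.append(re.escape(ch))  # escaped character is taken literally
                continue
            # Otherwise the backslash is simply dropped and ch is handled below,
            # so an escaped ? or * keeps its wildcard meaning.
        else:
            i += 1
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, values, honor_escapes):
    regex = wildcard_to_regex(pattern, honor_escapes)
    return [v for v in values if regex.fullmatch(v)]


NAMES = ["Alice", "Al?ce", "Bob", "?"]

print(matches(r"*\?*", NAMES, honor_escapes=False))  # ['Alice', 'Al?ce', 'Bob', '?']
print(matches(r"*\?*", NAMES, honor_escapes=True))   # ['Al?ce', '?']
```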