Posted to solr-user@lucene.apache.org by Peter 4U <pe...@hotmail.com> on 2009/12/03 14:29:45 UTC

Facet query with special characters

Hello,
 
I've encountered some strange behaviour in Solr facet querying, and I've not been able to find anything on this on the web.
Perhaps someone can shed some light on this?
 
The problem:
When performing a facet query where part of the value portion has a special character (a minus sign in this case), the query returns zero results unless I put a wildcard (*) at the end.


 
Here is my query:
 
This produces zero 'numFound':
http://localhost:8983/solr/select/?wt=xml&indent=on&rows=20&q=((signature:3083 AND host:pds-comp.domain)) AND _time:[091119124039 TO 091203124039]&facet=true&facet.field=host&facet.field=sourcetype&facet.field=user&facet.field=signature
 
This produces 28 'numFound':
http://localhost:8983/solr/select/?wt=xml&indent=on&rows=20&q=((signature:3083 AND host:pds-comp.domain*)) AND _time:[091119124039 TO 091203124039]&facet=true&facet.field=host&facet.field=sourcetype&facet.field=user&facet.field=signature


(Note: all hit results are for <host>pds-comp.domain</host> - there are no other characters in the resulting field values)


I've tried escaping the minus sign in various ways, encoding etc., but nothing seems to work.
Can anyone help?
 
Many thanks,
Peter
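
The two URLs above are shown with raw spaces, colons and brackets. The following is a minimal Python sketch of building the same request with every parameter percent-encoded. `build_select_url` is a hypothetical helper name; the base URL and parameters are copied from the post. Note that this only affects how the request travels over HTTP. It does not change how Solr parses or analyzes the query once it arrives.

```python
from urllib.parse import urlencode

# Base URL taken from the post; adjust host and port for your own install.
BASE_URL = "http://localhost:8983/solr/select/"

def build_select_url(query, facet_fields, rows=20):
    """Return a select URL with every parameter percent-encoded."""
    params = [
        ("wt", "xml"),
        ("indent", "on"),
        ("rows", str(rows)),
        ("q", query),
        ("facet", "true"),
    ]
    # facet.field is repeated once per field, so use a list of pairs
    # rather than a dict (a dict would keep only one value per key).
    params.extend(("facet.field", name) for name in facet_fields)
    return BASE_URL + "?" + urlencode(params)

url = build_select_url(
    "((signature:3083 AND host:pds-comp.domain)) AND _time:[091119124039 TO 091203124039]",
    ["host", "sourcetype", "user", "signature"],
)
print(url)
```

The hyphen and the period are left untouched by `urlencode`, since both are unreserved characters in a URL.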

 		 	   		  

Re: Facet query with special characters

Posted by Lance Norskog <go...@gmail.com>.
Backslash is the escape character for colon, hyphen, parentheses,
brackets, curly braces, tilde, caret. (Did I miss any?)
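
As a rough illustration of backslash escaping, here is a small Python sketch. The character set is the one listed in the classic Lucene query parser syntax documentation, which also includes plus, exclamation mark, double quote, asterisk, question mark, `&&`, `||` and the backslash itself. `escape_query_value` is a hypothetical helper written for this example, not a Solr or Lucene API.

```python
# Characters the classic Lucene query parser treats as syntax.
# "&&" and "||" are two-character operators; escaping each "&" and "|"
# individually covers them.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')

def escape_query_value(value):
    """Backslash-escape every query-syntax character in `value`."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)

print(escape_query_value("pds-comp.domain"))  # pds\-comp.domain
print(escape_query_value("localhost"))        # localhost
```

The period is not a query-syntax character, so only the hyphen is escaped in the host name from this thread.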

On Thu, Dec 3, 2009 at 7:59 AM, Peter 4U <pe...@hotmail.com> wrote:
>
> Hello Solr Forum,
>
>
>
> I believe I have found a solution (workaround?) for performing an explicit (non-wildcarded) field query with values that contain special (escaped) characters.
>
>
>
> Instead of:
>
>  field:"value-with-escape-chars"
>
> change this to:
>
>  field:["value-with-escape-chars" TO "value-with-escape-chars"]
>
>
>
> (Note that for SolrJ, use QueryParser.escape(), to ultimately turn this into:  field:[\"value\-with\-escape\-chars\" TO \"value\-with\-escape\-chars\"])
>
>
>
> If the value being queried has no special characters (e.g. host:localhost), the above is not necessary, which leads me to believe this is more of a workaround than the 'supported way'. Please do correct me/clarify if you know differently, or know of a better/more efficient method.
>
>
>
> In early tests with 200,000+ hits, there appears to be no performance hit for using the range form. Not sure if this affects performance for millions+ hits.
>
>
>
> Thanks,
>
> Peter



-- 
Lance Norskog
goksron@gmail.com

RE: Facet query with special characters

Posted by Peter 4U <pe...@hotmail.com>.
Hello Solr Forum,

 

I believe I have found a solution (workaround?) for performing an explicit (non-wildcarded) field query with values that contain special (escaped) characters.

 

Instead of:

  field:"value-with-escape-chars"

change this to:

  field:["value-with-escape-chars" TO "value-with-escape-chars"]

 

(Note that for SolrJ, use QueryParser.escape(), to ultimately turn this into:  field:[\"value\-with\-escape\-chars\" TO \"value\-with\-escape\-chars\"])

 

If the value being queried has no special characters (e.g. host:localhost), the above is not necessary, which leads me to believe this is more of a workaround than the 'supported way'. Please do correct me/clarify if you know differently, or know of a better/more efficient method.

 

In early tests with 200,000+ hits, there appears to be no performance hit for using the range form. Not sure if this affects performance for millions+ hits.

 

Thanks,

Peter
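
The two query forms in this message can be sketched as plain string construction in Python. The helper names (`quote_value`, `exact_query`, `range_query`) are hypothetical and exist only for this illustration. The sketch covers building the query text and says nothing about how Solr will analyze or match it.

```python
def quote_value(value):
    """Wrap value in double quotes, escaping embedded backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def exact_query(field, value):
    """Build the plain quoted form: field:"value"."""
    return f"{field}:{quote_value(value)}"

def range_query(field, value):
    """Build the single-value range form: field:["value" TO "value"]."""
    quoted = quote_value(value)
    return f"{field}:[{quoted} TO {quoted}]"

print(exact_query("host", "pds-comp.domain"))
print(range_query("host", "pds-comp.domain"))
```

Using the same quoted value for both range bounds is what makes the range behave like a request for one exact value.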

 


 

RE: Facet query with special characters

Posted by Peter 4U <pe...@hotmail.com>.
Hi,

 

Thanks for your help and answers. I believe I have isolated the issue, and yes, it was 'schema/write'-related.

 

Basically, the issue was this:

All indexing is performed via SolrJ objects (to an EmbeddedSolrServer instance), and this was ported over from 'raw' Lucene Java indexing code. When I moved over to SolrJ, I hadn't realized that the schema.xml file would then affect all writes for the given type. Once I sorted out my schema properly and reindexed, queries started behaving as expected.

 

Thank you very much for your excellent insight - I'm quite new to Solr, so it's really great to have an expert show me the error of my ways. I had only recently discovered the power of debugQuery=true - awesomely good!

 

Many thanks again,

Peter
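
For anyone wanting to try debugQuery=true on an existing request, the sketch below adds the parameter to a URL in Python. `with_debug` is a hypothetical helper, and the sample URL is a shortened variant of the one in the original question, used here only as an assumption for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_debug(url):
    """Return `url` with debugQuery=true added, replacing any existing value."""
    parts = urlsplit(url)
    # Keep every existing parameter (including repeated ones) except debugQuery.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debugQuery"
    ]
    params.append(("debugQuery", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))

debug_url = with_debug(
    "http://localhost:8983/solr/select/?wt=xml&q=host%3Apds-comp.domain"
    "&facet=true&facet.field=host"
)
print(debug_url)
```

The response to such a request should then include a debug section showing how the query string was parsed, which is the output asked for in the quoted message below.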

 

 
> Date: Tue, 8 Dec 2009 09:35:31 -0800
> From: hossman_lucene@fucit.org
> To: solr-user@lucene.apache.org
> Subject: RE: Facet query with special characters
> 
> 
> : Note that I am (supposed to be) indexing/searching without analysis 
> : tokenization (if that's the correct term) - i.e. field values like 
> : 'pds-comp.domain' shouldn't be (and I believe aren't) broken up as in 
> : 'pds', 'comp' 'domain' etc. (e.g. using the 'text_ws' fieldtype).
> ...
> : What would be your opinion on the best way to index/analyze/not-analyze such fields?
> 
> a whitespace tokenizer is probably the best bet, but in order to be 
> certain what's going on, you would need to look at a few things (and if 
> you wanted help from other people, you would need to post those things) 
> that i mentioned before....
> 
> : > check your analysis configuration for this fieldtype, in particular look 
> : > at what debugQuery produces for your parsed query, and look at what 
> : > analysis.jsp says it will do at query time with the input string 
> : > "pds-comp.domain" ... because it sounds like you have a disconnect between 
> : > how the text is indexed and how it is searched. adding a * to your 
> 
> ...so what does your schema look like, what is the output from debugQuery, 
> what is the output from analysis.jsp, etc...
> 
> -Hoss
> 
 		 	   		  

RE: Facet query with special characters

Posted by Chris Hostetter <ho...@fucit.org>.
: Note that I am (supposed to be) indexing/searching without analysis 
: tokenization (if that's the correct term) - i.e. field values like 
: 'pds-comp.domain' shouldn't be (and I believe aren't) broken up as in 
: 'pds', 'comp' 'domain' etc. (e.g. using the 'text_ws' fieldtype).
	...
: What would be your opinion on the best way to index/analyze/not-analyze such fields?

a whitespace tokenizer is probably the best bet, but in order to be 
certain what's going on, you would need to look at a few things (and if 
you wanted help from other people, you would need to post those things) 
that i mentioned before....

: > check your analysis configuration for this fieldtype, in particular look 
: > at what debugQuery produces for your parsed query, and look at what 
: > analysis.jsp says it will do at query time with the input string 
: > "pds-comp.domain" ... because it sounds like you have a disconnect between 
: > how the text is indexed and how it is searched. adding a * to your 

...so what does your schema look like, what is the output from debugQuery, 
what is the output from analysis.jsp, etc...

-Hoss
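
To make the tokenizer suggestion concrete, here is a toy Python comparison of two splitting strategies applied to the value from this thread. Both functions are simplified stand-ins written for this example (they are not Solr's analyzers): one splits on whitespace only, the other on any non-alphanumeric character.

```python
import re

def whitespace_tokens(text):
    """Split on whitespace only, roughly what a whitespace tokenizer does."""
    return text.split()

def word_tokens(text):
    """Split on any non-alphanumeric character.

    A stand-in for analyzers that break on punctuation; it does not
    reproduce any specific Solr analyzer.
    """
    return [token for token in re.split(r"[^A-Za-z0-9]+", text) if token]

value = "pds-comp.domain"
print(whitespace_tokens(value))  # ['pds-comp.domain']
print(word_tokens(value))        # ['pds', 'comp', 'domain']
```

With whitespace-only splitting the host name survives as a single token, which is the behaviour Peter says he wants for this field.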


RE: Facet query with special characters

Posted by Peter 4U <pe...@hotmail.com>.
Hello Hoss,

 

Many thanks for your answer.

That's very interesting.

So, are you saying this is an issue on the index side, rather than the query side?

Note that I am (supposed to be) indexing/searching without analysis tokenization (if that's the correct term) - i.e. field values like 'pds-comp.domain' shouldn't be (and I believe aren't) broken up as in 'pds', 'comp' 'domain' etc. (e.g. using the 'text_ws' fieldtype).

 

What would be your opinion on the best way to index/analyze/not-analyze such fields?

 

Thanks!

Peter


 
> Date: Mon, 7 Dec 2009 15:30:47 -0800
> From: hossman_lucene@fucit.org
> To: solr-user@lucene.apache.org
> Subject: Re: Facet query with special characters
> 
> 
> 
> : When performing a facet query where part of the value portion has a 
> : special character (a minus sign in this case), the query returns zero 
> : results unless I put a wildcard (*) at the end.
> 
> check your analysis configuration for this fieldtype, in particular look 
> at what debugQuery produces for your parsed query, and look at what 
> analysis.jsp says it will do at query time with the input string 
> "pds-comp.domain" ... because it sounds like you have a disconnect between 
> how the text is indexed and how it is searched. adding a * to your 
> input query forces it to make a WildcardQuery which doesn't use analysis, 
> so you get a match on the literal token.
> 
> in short: i suspect your problem has nothing to do with query string 
> escaping, and everything to do with field tokenization.
> 
> 
> -Hoss
> 
 		 	   		  

Re: Facet query with special characters

Posted by Chris Hostetter <ho...@fucit.org>.

: When performing a facet query where part of the value portion has a 
: special character (a minus sign in this case), the query returns zero 
: results unless I put a wildcard (*) at the end.

check your analysis configuration for this fieldtype, in particular look 
at what debugQuery produces for your parsed query, and look at what 
analysis.jsp says it will do at query time with the input string 
"pds-comp.domain" ... because it sounds like you have a disconnect between 
how the text is indexed and how it is searched.  adding a * to your 
input query forces it to make a WildcardQuery which doesn't use analysis, 
so you get a match on the literal token.

in short: i suspect your problem has nothing to do with query string 
escaping, and everything to do with field tokenization.


-Hoss
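
The disconnect described above can be modelled with a toy index in Python. This is a sketch under loud assumptions: `ToyField` and both analyzer functions are invented for the illustration, and the thread does not say which side of Peter's setup was actually splitting the value. Here the value is indexed whole while the query side splits on punctuation, which is one possible shape of the mismatch.

```python
import re
from fnmatch import fnmatchcase

def keep_whole(text):
    """Index-time stand-in: store the value as one literal term."""
    return [text]

def split_on_punctuation(text):
    """Query-time stand-in: break the text on non-alphanumeric characters."""
    return [token for token in re.split(r"[^A-Za-z0-9]+", text) if token]

class ToyField:
    """A one-field toy index with separate index-time and query-time analysis."""

    def __init__(self, index_analyzer, query_analyzer):
        self.index_analyzer = index_analyzer
        self.query_analyzer = query_analyzer
        self.terms = set()

    def add(self, value):
        self.terms.update(self.index_analyzer(value))

    def term_match(self, text):
        """Analyzed query: every query-time token must be an indexed term."""
        return all(token in self.terms for token in self.query_analyzer(text))

    def wildcard_match(self, pattern):
        """Wildcard query: the pattern skips analysis and is compared to raw terms."""
        return any(fnmatchcase(term, pattern) for term in self.terms)

field = ToyField(index_analyzer=keep_whole, query_analyzer=split_on_punctuation)
field.add("pds-comp.domain")
print(field.term_match("pds-comp.domain"))       # False
print(field.wildcard_match("pds-comp.domain*"))  # True
```

The analyzed query looks for the terms pds, comp and domain, none of which were indexed, while the wildcard pattern is compared directly against the stored literal term and matches.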