Posted to solr-user@lucene.apache.org by "Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]" <ti...@nasa.gov> on 2009/07/29 20:46:38 UTC

query and analyzers

Hi,
What analyzer, tokenizer, filter factory would I need to use to get wildcard matching to match where:
Value:
XYZ123
Query:
XYZ1*

I have been experimenting with the splitOnNumerics and preserveOriginal options of solr.WordDelimiterFilterFactory in both the index and the query analyzer.  I also noticed that the behavior differs when I use quotes in the query (a phrase search).  Unfortunately, I'm missing something, as I can't get it to work.

Tim



RE: query and analyzers

Posted by "Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]" <ti...@nasa.gov>.
That did it, thanks!

I thought that was how it should work, but I must have gotten out of sync at some point, which led me to dig deeper into it than I needed to.

-----Original Message-----
From: AHMET ARSLAN [mailto:iorixxx@yahoo.com] 
Sent: Wednesday, July 29, 2009 12:52 PM
To: solr-user@lucene.apache.org
Subject: RE: query and analyzers


To match the query XYZ1* against the document value XYZ123, you do not need WordDelimiterFilterFactory. You need a tokenizer that keeps XYZ123 as a single token, and WhitespaceTokenizer is one of them.

As I see from the fieldType named text_ws, you want to use WhitespaceTokenizerFactory,
and there is no LowerCaseFilter in it, so case is not a problem.
Just remove the WordDelimiterFilterFactory (from both the query and index analyzers) and it should work.
 
Ahmet


      

RE: query and analyzers

Posted by AHMET ARSLAN <io...@yahoo.com>.
To match the query XYZ1* against the document value XYZ123, you do not need WordDelimiterFilterFactory. You need a tokenizer that keeps XYZ123 as a single token, and WhitespaceTokenizer is one of them.

As I see from the fieldType named text_ws, you want to use WhitespaceTokenizerFactory,
and there is no LowerCaseFilter in it, so case is not a problem.
Just remove the WordDelimiterFilterFactory (from both the query and index analyzers) and it should work.
 
Ahmet
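
A minimal sketch of what that suggestion could look like, assuming the text_ws definition posted earlier in the thread with the WordDelimiterFilterFactory removed. A single analyzer element without a type attribute is used here on the assumption that the index and query chains should be identical; this is an illustration, not the poster's final configuration.

```xml
<!-- Sketch only: text_ws with WordDelimiterFilterFactory removed. -->
<!-- An <analyzer> with no type attribute applies to both indexing and querying. -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so XYZ123 stays a single term -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With this chain the value XYZ123 is indexed as the single term XYZ123, which the prefix query XYZ1* can match. Documents indexed under the old definition would need to be reindexed for the change to take effect.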


      

RE: query and analyzers

Posted by "Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]" <ti...@nasa.gov>.
This was the definition I was last working with (I've been playing with setting the various parameters).

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
  </analyzer>
</fieldType>

-----Original Message-----
From: AHMET ARSLAN [mailto:iorixxx@yahoo.com] 
Sent: Wednesday, July 29, 2009 11:55 AM
To: solr-user@lucene.apache.org
Subject: Re: query and analyzers


> What analyzer, tokenizer, filter factory would I need to
> use to get wildcard matching to match where:
> Value:
> XYZ123
> Query:
> XYZ1*

StandardAnalyzer, WhitespaceAnalyzer.
 
> I have been messing with solr.WordDelimiterFilterFactory
> splitOnNumerics and preserveOriginal in both the analyzer
> and the query.  I also noticed it is different when I
> use quotes in the query - phrase search. 
> Unfortunately, I'm missing something as I can't get it to
> work.

But I think your problem is not the analyzer. I guess there is a lowercase filter in your analyzer, and wildcard queries are not analyzed.
Try querying xyz1*


      

Re: query and analyzers

Posted by AHMET ARSLAN <io...@yahoo.com>.
> What analyzer, tokenizer, filter factory would I need to
> use to get wildcard matching to match where:
> Value:
> XYZ123
> Query:
> XYZ1*

StandardAnalyzer, WhitespaceAnalyzer.
 
> I have been messing with solr.WordDelimiterFilterFactory
> splitOnNumerics and preserveOriginal in both the analyzer
> and the query.  I also noticed it is different when I
> use quotes in the query - phrase search. 
> Unfortunately, I'm missing something as I can't get it to
> work.

But I think your problem is not the analyzer. I guess there is a lowercase filter in your analyzer, and wildcard queries are not analyzed.
Try querying xyz1*
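
To illustrate the lowercase guess above, here is a hypothetical field type (the name text_ws_lower is invented for this sketch and does not appear in the thread) whose analyzer includes a lowercase filter. It assumes, as the message states, that wildcard terms bypass the analyzer; newer Solr releases may handle this differently.

```xml
<!-- Hypothetical field type, for illustration only. -->
<fieldType name="text_ws_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- At index time the value XYZ123 becomes the term xyz123 -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<!-- If wildcard queries are not analyzed, XYZ1* is compared as-is against
     the lowercased term xyz123 and finds nothing, while xyz1* matches. -->
```

This is why lowercasing the wildcard query by hand is a quick way to test whether a lowercase filter is the cause.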