Posted to solr-user@lucene.apache.org by Robert Brown <ro...@intelcompute.com> on 2012/01/30 15:02:24 UTC

"sage 200" not matching "... sage 200."

The trailing full-stop above is not being matched when searching for 
"sage 200" for the below field type...

Do I need the WordDelimiterFilterFactory for this to work as expected? 
I don't see any mention of periods being discussed in the docs.


<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.SynonymFilterFactory" synonyms="textgen-synonyms.txt" ignoreCase="true" expand="true"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.WhitespaceTokenizerFactory"/>
		<filter class="solr.SynonymFilterFactory" synonyms="textgen-synonyms.txt" ignoreCase="true" expand="true"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
</fieldType>

Thanks,
Rob


--

IntelCompute
Web Design & Local Online Marketing

http://www.intelcompute.com

Re: "sage 200" not matching "... sage 200."

Posted by Ahmet Arslan <io...@yahoo.com>.

> The trailing full-stop above is not
> being matched when searching for "sage 200" for the below
> field type...
> 
> Do I need the WordDelimiterFilterFactory for this to work as
> expected? I don't see any mention of periods being discussed
> in the docs.

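WhitespaceTokenizer splits only on whitespace, so the trailing period stays attached to the token: the indexed term is "200.", which does not match the query term "200". Either use StandardTokenizer or add WordDelimiterFilter to the analyzer chain.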
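
As a rough sketch of the second option, the field type from the question could look something like the following. The WordDelimiterFilterFactory attribute values and its position in the chain (after the synonym filter, before lowercasing) are assumptions chosen for illustration, not settings taken from this thread, so adjust them to your data. For the first option, the only change would be swapping solr.WhitespaceTokenizerFactory for solr.StandardTokenizerFactory in both analyzers.

```xml
<!-- Sketch only: the "textgen" type from the question with WordDelimiterFilterFactory added. -->
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="textgen-synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Assumed settings: strip delimiters such as the trailing "." without catenating parts. -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="textgen-synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- Same assumed settings on the query side so both sides produce matching terms. -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Changing the index-time analyzer only affects documents indexed afterwards, so existing documents need to be reindexed. Note that WordDelimiterFilter also splits on delimiters inside a token and, by default, on letter/digit boundaries (for example "sage200" would become "sage" and "200"), so check that this suits your other content.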

The Analysis page in the Solr admin interface shows the tokens produced at each stage of the chain; it is useful for testing and understanding tokenizer and filter behavior.