Posted to solr-user@lucene.apache.org by j <jt...@gmail.com> on 2010/08/05 15:50:16 UTC

word delimiter

I have UPPER12-lower and would like to be able to find it with queries
"UPPER" or "lower". What should break this up for the index? A
tokenizer or a filter such as WordDelimiterFilterFactory?

I have tried various combinations of parameters to
WordDelimiterFilterFactory and can't get it to split properly. Here are
the results from using the standard tokenizer followed directly by the
WordDelimiterFilterFactory markup below (from analysis.jsp):

1             | 2
--------------+------
UPPER12-lower | lower
UPPER         |
12            |


<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="0" catenateWords="0" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>

Re: word delimiter

Posted by Ahmet Arslan <io...@yahoo.com>.

> I have UPPER12-lower and would like to be able to find it with queries
> "UPPER" or "lower". What should break this up for the index? A
> tokenizer or a filter such as WordDelimiterFilterFactory?

If that's all you want, LowerCaseTokenizer alone will be enough.
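
A minimal sketch of how that might look in schema.xml, assuming a field
type named "text_letters" (a hypothetical name, not something from your
schema). LowerCaseTokenizer splits at non-letter characters and
lowercases each token, so UPPER12-lower should be indexed as "upper" and
"lower". The digits and the hyphen are discarded, so a query for "12"
would not match with this setup.

```xml
<!-- "text_letters" is a hypothetical field type name used for illustration -->
<fieldType name="text_letters" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> element applies the same chain at index and query time,
       so the query "UPPER" is lowercased to "upper" before matching -->
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
```

After changing the field type you would need to reindex, and the result
can be checked in analysis.jsp the same way as above.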