Posted to solr-user@lucene.apache.org by stockii <st...@shopgate.com> on 2010/06/24 16:04:55 UTC
underscore, comma in terms.prefix
Hello,
This is my filter chain for suggestions with the TermsComponent:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([,_])" replacement=" " replace="all"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="0" splitOnCaseChange="1"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Singular and plural, ü == ue and ue == ü -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateAll="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
         outputUnigrams="false"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
So my questions are:
- When I index with these settings, I get an underscore ("_") in my index. Is
the comma being replaced with an underscore?
- Solr turns the string "Eiseimer COOL mit Greifer" into "cool mit mit" when I
search with terms.prefix=cool.
Why does "mit" appear twice? Sometimes "cool" also appears twice in my suggestions.
Any ideas? Thanks!
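
For readers who want to reproduce the lookup, here is a minimal sketch of how such a terms.prefix request could be built. The host, the "/terms" handler path, and the field name "suggest" are assumptions for illustration only; none of them appear in this thread. The parameters terms, terms.fl, terms.prefix and terms.limit are TermsComponent request parameters.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL for prefix-based suggestions."""
    params = {
        "terms": "true",         # enable the TermsComponent
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only return terms starting with this prefix
        "terms.limit": limit,    # maximum number of terms to return
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host, handler path and field name.
    print(terms_url("http://localhost:8983/solr/terms", "suggest", "cool"))
```

Because the TermsComponent lists indexed terms as they are stored, the prefix has to match the output of the index analyzer (lowercased here), not the original text.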
--
View this message in context: http://lucene.472066.n3.nabble.com/underscore-comma-in-terms-prefix-tp919565p919565.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: underscore, comma in terms.prefix
Posted by stockii <st...@shopgate.com>.
Okay, thanks.
The WordDelimiterFilterFactory with the option generateNumberParts="0" was
causing the trouble ;-)
Re: underscore, comma in terms.prefix
Posted by Otis Gospodnetic <ot...@yahoo.com>.
stockii,
Solr's Analysis page will show you what each filter in the chain produces. I can't tell just by looking, but I would first try removing the CommonGramsFilterFactory and see whether the repetition still happens.
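
To see why removing the common-grams step is a reasonable first experiment, here is a toy model of part of the index chain in plain Python. It is not Solr's implementation and not a diagnosis of this case (the original poster traced the problem to generateNumberParts="0"). It ignores token positions and most filter options, and it assumes that "mit" is listed in stopwords.txt, which the thread never shows. It relies on one real behavior: CommonGramsFilter joins word pairs with an underscore, which a later delimiter-splitting step can break apart again.

```python
import re

# Assumption: "mit" is listed in stopwords.txt (the thread never shows that file).
STOPWORDS = {"mit"}


def common_grams(tokens, stopwords):
    """Keep every token and add an underscore-joined pair around each stopword."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok.lower() in stopwords or nxt.lower() in stopwords:
                out.append(tok + "_" + nxt)
    return out


def split_on_delimiters(tokens):
    """Split each token on non-alphanumeric characters, including "_"."""
    out = []
    for tok in tokens:
        out.extend(part for part in re.split(r"[\W_]+", tok) if part)
    return out


def shingles(tokens, max_size=3):
    """Join every run of 1..max_size consecutive tokens with a space."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def index_terms(text, use_common_grams=True):
    """Toy index chain: whitespace split, grams, delimiter split, lowercase, shingles."""
    tokens = text.split()
    if use_common_grams:
        tokens = common_grams(tokens, STOPWORDS)
    tokens = split_on_delimiters(tokens)
    tokens = [tok.lower() for tok in tokens]
    # Simplification: a set stands in for duplicate removal.
    return sorted(set(shingles(tokens)))


def suggest(text, prefix, use_common_grams=True):
    """Return the toy index terms that start with the given prefix."""
    return [t for t in index_terms(text, use_common_grams) if t.startswith(prefix)]


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, suggest("Eiseimer COOL mit Greifer", "cool", flag))
```

In this model the underscore comes from the gram step ("COOL_mit"), and the repeated words show up only when that step is enabled. The Analysis page remains the way to check what the real filters emit.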
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/