Posted to solr-user@lucene.apache.org by stockii <st...@shopgate.com> on 2010/06/24 16:04:55 UTC

underscore, comma in terms.prefix

Hello.

This is my filter chain for suggestions with the TermsComponent:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([,_])" replacement=" " replace="all"/>

    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="0" splitOnCaseChange="1"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <!-- Singular and plural, ü == ue and ue == ü -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>

    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
         outputUnigrams="false"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
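To be used for suggestions, a field must be declared with this type. A minimal sketch for schema.xml follows. The source field "name" and the suggestion field "suggest" are hypothetical; neither appears in the original configuration:

```xml
<!-- Hypothetical field names: "suggest" and "name" are assumptions. -->
<!-- Indexed with the textgen chain above. Not stored, because the
     TermsComponent only reads indexed terms. -->
<field name="suggest" type="textgen" indexed="true" stored="false"
       multiValued="true"/>

<!-- Copy the raw product name into the suggestion field at index time. -->
<copyField source="name" dest="suggest"/>
```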


So my questions are:

- When I index with these settings, I get an underscore ("_") in my index. Is the
  comma being replaced with an underscore?
- For the string "Eiseimer COOL mit Greifer", a search with terms.prefix=cool
  returns the term "cool mit mit". Why does "mit" appear twice? Sometimes "cool"
  appears twice in my suggestions as well.
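For reference, a request such as terms.prefix=cool is answered by a TermsComponent handler. The sketch below shows such a handler in solrconfig.xml, modelled on the one in the Solr example configuration. The handler path "/terms" is an assumption, and so is the field name "suggest" in the sample request:

```xml
<!-- TermsComponent: enumerates the indexed terms of a field. -->
<searchComponent name="termsComponent"
                 class="org.apache.solr.handler.component.TermsComponent"/>

<!-- Assumed handler path /terms; the "terms" switch is on by default. -->
<requestHandler name="/terms"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>
```

A suggestion request against it could then look like /terms?terms.fl=suggest&terms.prefix=cool&terms.limit=10, where "suggest" is the hypothetical field. The returned terms are exactly what the index-time chain produced, so a shingle such as "cool mit mit" appears verbatim.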

Any ideas? =) Thanks!



-- 
View this message in context: http://lucene.472066.n3.nabble.com/underscore-comma-in-terms-prefix-tp919565p919565.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: underscore, comma in terms.prefix

Posted by stockii <st...@shopgate.com>.

Okay, thanks.

The WordDelimiterFilterFactory option generateNumberParts="0" was causing the
trouble ;-)
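The follow-up does not say what the option was changed to. Assuming it was switched to generateNumberParts="1", the index-time line would look like this sketch. All other attributes are kept as in the original post:

```xml
<!-- Assumption: only generateNumberParts changed, from "0" to "1";
     every other attribute is unchanged from the original index analyzer. -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" splitOnCaseChange="1"
        splitOnNumerics="0"/>
```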

Re: underscore, comma in terms.prefix

Posted by Otis Gospodnetic <ot...@yahoo.com>.

stockii,

Solr's Analysis page will show you what's happening. I can't tell just by looking, though I would first try removing the CommonGramsFilterFactory and checking whether the repetition still happens.
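CommonGramsFilter builds its bigrams by joining a word from the common-words list to its neighbour with an underscore. For example, it produces "cool_mit" if "mit" is listed in stopwords.txt. That makes it one plausible source of the "_" terms. WordDelimiterFilterFactory later splits such a gram back into its parts, so words could also end up duplicated in the stream that the shingle filter sees. This is a hedged reading, not a confirmed diagnosis. A sketch of the suggested experiment follows: the index analyzer from the original post with only the CommonGramsFilterFactory commented out.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="([,_])" replacement=" " replace="all"/>
  <!-- Disabled for the test: this filter joins common words into
       underscore-separated bigrams such as "cool_mit". -->
  <!--
  <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
          ignoreCase="true"/>
  -->
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="0" catenateWords="0" splitOnCaseChange="1"
          splitOnNumerics="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
          outputUnigrams="true"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

After reindexing, the Analysis page or the terms.prefix=cool request shows whether "cool mit mit" still appears.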

 

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


