Posted to solr-user@lucene.apache.org by KLessou <kl...@gmail.com> on 2008/10/01 15:48:28 UTC
termFreq always = 1 ?
Hi,
I want to index a list of keywords.
When I search for "k1_en:men", I get many documents like these:
DocA :
(k1_en = man;men;Men;business... termFreq=2)
DocB :
(k1_en = man;Men;business... termFreq=1)
DocC :
...
DocD :
...
DocE :
...
But I don't want DocA and DocB to have different termFreq values for the same keyword.
I tried RemoveDuplicatesTokenFilterFactory, but it doesn't seem to help :-/
<fieldtype name="keywords_en" class="solr.TextField">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <!--filter class="solr.SnowballPorterFilterFactory" language="English" /-->
    <!--filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/-->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <!--filter class="solr.SnowballPorterFilterFactory" language="English" /-->
    <!--filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/-->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
...
<field name="k1_en" type="keywords_en" indexed="true" stored="true" required="false"/>
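To make the behaviour concrete, here is a rough Python sketch of what I believe happens at index time. It is not Solr code: the function names are made up for illustration, the field values are shortened versions of DocA and DocB, and the analyzer is reduced to splitting on ";" and lowercasing (accent folding, stop words and stemming are left out). It also assumes that RemoveDuplicatesTokenFilterFactory only drops a token when the same text was already seen at the same position, which would explain why it changes nothing here: each keyword sits at its own position.

```python
from collections import Counter


def index_tokens(value):
    """Simplified stand-in for the index analyzer: split on ';' and lowercase.

    Returns (position, text) pairs, one position per keyword.
    """
    return [(pos, tok.strip().lower())
            for pos, tok in enumerate(value.split(";"))
            if tok.strip()]


def remove_same_position_duplicates(tokens):
    """Drop a token only if the same text already occurred at the same position.

    This models the assumed behaviour of RemoveDuplicatesTokenFilterFactory.
    """
    seen = set()
    kept = []
    for pos, text in tokens:
        if (pos, text) not in seen:
            seen.add((pos, text))
            kept.append((pos, text))
    return kept


def term_freq(value, term):
    """Count how often `term` survives the simplified analysis chain."""
    tokens = remove_same_position_duplicates(index_tokens(value))
    return Counter(text for _, text in tokens)[term]


# Shortened, hypothetical versions of the field values from DocA and DocB.
doc_a = "man;men;Men;business"
doc_b = "man;Men;business"

print(term_freq(doc_a, "men"))  # 2: "men" and "Men" sit at different positions
print(term_freq(doc_b, "men"))  # 1

# What I would like instead: a keyword counts at most once per document.
print(min(1, term_freq(doc_a, "men")))  # 1
print(min(1, term_freq(doc_b, "men")))  # 1
```

In this sketch the duplicate filter removes nothing, so DocA still ends up with a frequency of 2 for "men".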
If you have any ideas, thanks in advance.
--
~~~~~
| klessou |
~~~~~