
Posted to solr-user@lucene.apache.org by neosky <ne...@yahoo.com> on 2012/04/10 06:52:57 UTC

Which approach is correct?

Here are my fields:
<field name="id">101</field><field name="sequence">NGHGJGKGKLHJFKGJGKGK</field>

The sequence field ranges from 300 bytes to 56K bytes and contains no spaces.
I want to generate n-grams of sizes 3 through 8:
NGH GHG HGJ ...
NGHG GHGJ HGJG ...
...
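
To make the intended output concrete, here is a minimal Python sketch of the character n-grams described above. It is only an illustration, not Solr's implementation: the function name char_ngrams is made up for this example, and the order in which Solr emits tokens (and the positions it assigns them) may differ and is not reproduced here.

```python
def char_ngrams(sequence, min_size=3, max_size=8):
    """Yield every contiguous substring whose length is min_size..max_size."""
    for size in range(min_size, max_size + 1):
        # A string of length L has L - size + 1 substrings of a given size.
        for start in range(len(sequence) - size + 1):
            yield sequence[start:start + size]


sequence = "NGHGJGKGKLHJFKGJGKGK"  # the 20-character sample value from the post
grams = list(char_ngrams(sequence))

print(grams[:3])   # first 3-grams: ['NGH', 'GHG', 'HGJ']
print(len(grams))  # 93 grams in total for this 20-character value
```

For a value of length L (with L >= 8), sizes 3 through 8 produce 6L - 27 grams, so a 56K sequence yields on the order of 336,000 tokens for a single field value.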
Approach 1 (StandardTokenizer followed by NGramFilter at index time):

<fieldType name="nGram1" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" maxTokenLength="56000"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="8"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

Approach 2 (NGramTokenizer at index time):

<fieldType name="nGram2" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="8"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>




--
View this message in context: http://lucene.472066.n3.nabble.com/which-approach-is-correct-tp3898711p3898711.html