Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2015/06/25 11:02:00 UTC

Tokenizer and Filter Factory to index Chinese characters

Hi,

Does anyone know the correct replacements for these two tokenizer and
filter factories used to index Chinese in Solr?
- SmartChineseSentenceTokenizerFactory
- SmartChineseWordTokenFilterFactory

I understand that these two factories are deprecated as of Solr 5.1, but I
can't seem to find the correct replacements.


<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory"/>
    <filter class="org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory"/>
  </analyzer>
</fieldType>
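
In case it helps to show what I've been trying: the smartcn module also
ships an HMMChineseTokenizerFactory, and my guess is that the replacement
field type looks something like the sketch below (untested, so the exact
class and whether any extra filters are needed are assumptions on my part):

```xml
<!-- Untested sketch: assumes HMMChineseTokenizerFactory is the
     intended successor to the deprecated sentence tokenizer +
     word token filter pair. The same analyzer is used for both
     indexing and querying here. -->
<fieldType name="text_smartcn" class="solr.TextField" positionIncrementGap="0">
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
  </analyzer>
</fieldType>
```

If anyone can confirm whether this is right, or whether additional filters
are still required, I'd appreciate it.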

Thank you.


Regards,
Edwin