Posted to solr-user@lucene.apache.org by jo...@aol.com on 2012/11/14 06:05:55 UTC

Using CJK analyzer

Hi,


Using Solr 1.2.0, the following works (and I get hits searching on Chinese text):


<fieldType name="text" class="solr.TextField">
    <analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer" />
    <analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer" />
</fieldType>



But the same configuration doesn't work in Solr 3.6.1.  Any idea what I might be missing?


I also tried the following (in Solr 3.6.1):



    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalize width before bigram, as e.g. half-width dakuten combine  -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- for any non-CJK -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory"/>
      </analyzer>
    </fieldType>



and that doesn't work either.


When I run the text through the analyzer, I see this (I hope the table will show up fine on the mailing list):


Index Analyzer
org.apache.lucene.analysis.cn.ChineseAnalyzer {}

position  term text  startOffset  endOffset
1         去         0            1
2         除         1            2
3         商         2            3
4         品         3            4
5         操         4            5
6         作         5            6
7         在         6            7
8         订         7            8
9         购         8            9
10        单         9            10
11        中         10           11
12        留         11           12
13        下         12           13
14        空         13           14
15        白         14           15
16        行         15           16


Query Analyzer
org.apache.lucene.analysis.cn.ChineseAnalyzer {}

position  term text  startOffset  endOffset
1         去         0            1
2         除         1            2
3         商         2            3
4         品         3            4
5         操         4            5
6         作         5            6
7         在         6            7
8         订         7            8
9         购         8            9
10        单         9            10
11        中         10           11
12        留         11           12
13        下         12           13
14        空         13           14
15        白         14           15
16        行         15           16
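
For comparison, here is a rough sketch of the overlapping two-character terms I would expect a CJK bigram chain to emit for the same text, as opposed to the single-character terms shown above. This is plain Python, not Solr or Lucene code: the function name cjk_bigrams is made up for illustration, and it ignores what the real filter does with non-CJK tokens and script detection.

```python
def cjk_bigrams(text):
    """Return overlapping two-character terms as (term, startOffset, endOffset).

    Simplified illustration only: assumes the whole input is one run of
    CJK characters. A single lone character is returned as a unigram.
    """
    if not text:
        return []
    if len(text) == 1:
        return [(text, 0, 1)]
    return [(text[i:i + 2], i, i + 2) for i in range(len(text) - 1)]


sample = "去除商品操作在订购单中留下空白行"

# Print one row per term: position, term text, startOffset, endOffset
for position, (term, start, end) in enumerate(cjk_bigrams(sample), start=1):
    print(position, term, start, end)
```

With the 16-character sample this gives 15 terms, each spanning two characters, instead of the 16 single-character terms in the analysis output.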






--MJ