Posted to solr-user@lucene.apache.org by jo...@aol.com on 2012/11/14 06:05:55 UTC
Using CJK analyzer
Hi,
Using Solr 1.2.0, the following works (and I get hits searching on Chinese text):
<fieldType name="text" class="solr.TextField">
<analyzer type="index" class="org.apache.lucene.analysis.cjk.CJKAnalyzer" />
<analyzer type="query" class="org.apache.lucene.analysis.cjk.CJKAnalyzer" />
</fieldType>
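Because the index and query analyzers above are identical, the same field type can also be written with a single <analyzer> element that has no type attribute. Solr applies such an analyzer at both index and query time. This is an equivalent form for reference only and has not been checked against Solr 3.6.1:

```xml
<fieldType name="text" class="solr.TextField">
  <!-- no type attribute: this one analyzer is used for both indexing and querying -->
  <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
</fieldType>
```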
But the Solr 1.2.0 configuration above doesn't work in Solr 3.6.1. Any idea what I might be missing?
I also tried the following in Solr 3.6.1:
<!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- normalize width before bigram, as e.g. half-width dakuten combine -->
<filter class="solr.CJKWidthFilterFactory"/>
<!-- for any non-CJK -->
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory"/>
</analyzer>
</fieldType>
and that doesn't work either.
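A fieldType only takes effect for fields whose type attribute names it, so the analysis output depends on which field or type is selected. A minimal sketch of such a field declaration, where the field name content is a hypothetical placeholder:

```xml
<fields>
  <!-- hypothetical field; type="text" must match the fieldType name declared above -->
  <field name="content" type="text" indexed="true" stored="true"/>
</fields>
```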
When I run some sample text through the analyzer, I see this (I hope the table shows up properly on the mailing list):
Index Analyzer
org.apache.lucene.analysis.cn.ChineseAnalyzer {}
position  term text  startOffset  endOffset
1         去         0            1
2         除         1            2
3         商         2            3
4         品         3            4
5         操         4            5
6         作         5            6
7         在         6            7
8         订         7            8
9         购         8            9
10        单         9            10
11        中         10           11
12        留         11           12
13        下         12           13
14        空         13           14
15        白         14           15
16        行         15           16
Query Analyzer
org.apache.lucene.analysis.cn.ChineseAnalyzer {}
position  term text  startOffset  endOffset
1         去         0            1
2         除         1            2
3         商         2            3
4         品         3            4
5         操         4            5
6         作         5            6
7         在         6            7
8         订         7            8
9         购         8            9
10        单         9            10
11        中         10           11
12        留         11           12
13        下         12           13
14        空         13           14
15        白         14           15
16        行         15           16
--MJ