Posted to solr-user@lucene.apache.org by Rajani Maski <ra...@gmail.com> on 2012/08/08 11:53:58 UTC
Paoding analyzer with solr for chinese
Hi All,
According to this blog post <http://java.dzone.com/articles/indexing-chinese-solr>,
the Paoding analyzer is much better for Chinese text, so I am trying to set it up to
get accurate results for Chinese text.
I followed the instructions on the sites below:
Site1 <http://androidyou.blogspot.hk/2010/05/chinese-tokenizerlibrary-paoding-with.html>
and Site2 <http://www.opensourceconnections.com/2011/12/23/indexing-chinese-in-solr/>
After indexing, when I search the same field with the same text, I get no search
results (numFound=0).
The Luke tool also shows no terms for the field that is indexed
with the field types below. Can anyone comment on what is going wrong?
Schema field types for Paoding:
1) <fieldType name="paoding" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="test.solr.PaodingTokerFactory.PaoDingTokenizerFactory"/>
     </analyzer>
   </fieldType>
The analysis page result is:
[image: Inline image 2]
2) <fieldType name="paoding_chinese" class="solr.TextField">
     <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer">
     </analyzer>
   </fieldType>
Analysis on the field "paoding_chinese" throws this error:
[image: Inline image 3]
Thanks & Regards
Rajani
Re: Paoding analyzer with solr for chinese
Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi Rajani,
I'm not really familiar with this paoding tokenizer, but it seems a bit
old. We are using the CJKBigramFilter (like in the example of Solr 4.0
alpha), which should be equivalent or even better and it works.
<analyzer>
  <tokenizer class="solr.ICUTokenizerFactory" />
  <filter class="solr.WordDelimiterFilterFactory" />
  <filter class="solr.ICUFoldingFilterFactory" />
  <filter class="solr.CJKBigramFilterFactory" />
</analyzer>
Uwe
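For readers unfamiliar with bigram indexing, the following is a rough Python sketch of the idea behind the CJKBigramFilter mentioned above: each run of adjacent CJK characters is turned into overlapping two-character tokens, and an isolated character is kept as a single token. It is only an illustration, not Lucene's implementation. The function names are hypothetical, only the basic CJK Unified Ideographs block is recognized, and non-CJK text is simply dropped here, whereas the Solr chain keeps it as ordinary tokens.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Simplifying assumption: only the basic CJK Unified Ideographs block counts."""
    return "\u4e00" <= ch <= "\u9fff"


def cjk_bigrams(text: str) -> List[str]:
    """Return overlapping bigrams for each run of adjacent CJK characters.

    A run of length 1 is emitted as a single-character token.
    Non-CJK characters only act as run separators in this sketch.
    """
    tokens: List[str] = []
    run: List[str] = []

    def flush() -> None:
        # Emit the tokens for the current run, then reset it.
        if len(run) == 1:
            tokens.append(run[0])
        elif len(run) > 1:
            tokens.extend(a + b for a, b in zip(run, run[1:]))
        run.clear()

    for ch in text:
        if is_cjk(ch):
            run.append(ch)
        else:
            flush()
    flush()
    return tokens


if __name__ == "__main__":
    print(cjk_bigrams("中文分词"))
```

Because the query text is split the same way as the indexed text, a search for a two-character word matches wherever those two characters appear next to each other, without needing a dictionary.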
On 09.08.2012 at 06:47, Rajani Maski wrote:
> Hi All,
>
> Any reply on this?
Re: Paoding analyzer with solr for chinese
Posted by Rajani Maski <ra...@gmail.com>.
Hi All,
Any reply on this?