Posted to solr-user@lucene.apache.org by Rajani Maski <ra...@gmail.com> on 2012/08/08 11:53:58 UTC

Paoding analyzer with solr for chinese

Hi All,

According to this blog post
<http://java.dzone.com/articles/indexing-chinese-solr>, the Paoding
analyzer works much better for Chinese text, so I am trying to set it up
to get more accurate results.

I followed the instructions on these two sites:
Site1 <http://androidyou.blogspot.hk/2010/05/chinese-tokenizerlibrary-paoding-with.html>
Site2 <http://www.opensourceconnections.com/2011/12/23/indexing-chinese-in-solr/>


After indexing, when I search the same field with the same text, I get no
results (numFound=0).

The Luke tool also shows no terms for the field indexed with the field
types below. Can anyone comment on what is going wrong?



Schema field types for Paoding:

1) <fieldType name="paoding" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="test.solr.PaodingTokerFactory.PaoDingTokenizerFactory"/>
     </analyzer>
   </fieldType>


The analysis page shows this result:
[image: Inline image 2]

2) <fieldType name="paoding_chinese" class="solr.TextField">
     <analyzer class="net.paoding.analysis.analyzer.PaodingAnalyzer">
     </analyzer>
   </fieldType>

Running analysis on the "paoding_chinese" field throws this error:
[image: Inline image 3]



Thanks & Regards
Rajani

Re: Paoding analyzer with solr for chinese

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.

Hi Rajani,

I'm not really familiar with the Paoding tokenizer, but it seems a bit
old. We are using the CJKBigramFilter (as in the example schema of Solr 4.0
alpha), which should be equivalent or even better, and it works for us.

<analyzer>
    <tokenizer class="solr.ICUTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.CJKBigramFilterFactory" />
</analyzer>
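
For context, here is a minimal sketch of how the analyzer chain above might sit inside a complete field type in schema.xml. The names "text_cjk" and "content_zh" are hypothetical and chosen only for illustration. The sketch also assumes the ICU jars from Solr's analysis-extras contrib are on the classpath, since the two ICU factories are not part of the core distribution.

```xml
<!-- Hypothetical field type name; adjust to your own schema -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Tokenizes on Unicode word boundaries (assumes analysis-extras/ICU jars are available) -->
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Splits tokens at intra-word delimiters such as punctuation and case changes -->
    <filter class="solr.WordDelimiterFilterFactory"/>
    <!-- Applies Unicode folding (case folding, accent removal and similar normalization) -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- Forms bigrams from adjacent CJK tokens -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that uses the type above -->
<field name="content_zh" type="text_cjk" indexed="true" stored="true"/>
```

Because no separate index and query analyzers are declared, the same chain applies at both index and query time, so identical text is tokenized the same way on both sides. Existing documents would need to be reindexed after a change like this.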

Uwe



On 09.08.2012 06:47, Rajani Maski wrote:
> Hi All,
>
>    Any reply on this?

Re: Paoding analyzer with solr for chinese

Posted by Rajani Maski <ra...@gmail.com>.

Hi All,

  Any reply on this?


