Posted to java-user@lucene.apache.org by Weiwei Wang <ww...@gmail.com> on 2011/04/24 05:19:17 UTC

ICU Chinese words

Hi all,
      I'm working on a Chinese contact search project, and I need to transform
Chinese words into their Pinyin form.

e.g.
 中国--> zhongguo

The problem I run into is that some Chinese characters have more than one
reading, e.g. 贾 -> jia, 贾 -> gu, ...

I am already using ICUTransformFilter (Han-Latin/Names). How can I get all
of the transliterations instead of just one?

Thanks~

-- 
王巍巍
Cell: 18911288489
MSN: ww.wang.cs@gmail.com
Blog: http://whisper.eyesay.org
Weibo: http://t.sina.com/lolorosa

Re: ICU Chinese words

Posted by Robert Muir <rc...@gmail.com>.
In reply to Weiwei Wang <ww...@gmail.com> (2011/4/23):

Maybe use the Unihan database (e.g. generate synonyms from it, or write a
special filter that emits every reading)?
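A minimal sketch of the synonym idea, assuming a per-character table of toneless readings already exists: it generates every combination of readings for a name, so 贾 yields both jia and gu. The class `PinyinExpander`, the method `expand`, and the three-entry `SAMPLE_READINGS` table are hypothetical and for illustration only (the 贾 readings are the Unihan kMandarin values JIA3 GU3 JIA4 with tones dropped). Wiring the variants into a Lucene analyzer, e.g. as synonyms at the same token position, is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PinyinExpander {

    // Hypothetical sample table; a real one would be built from the Unihan kMandarin field.
    static final Map<Integer, List<String>> SAMPLE_READINGS = Map.of(
            0x8D3E, List.of("jia", "gu"),  // U+8D3E, surname Jia/Gu
            0x4E2D, List.of("zhong"),      // U+4E2D
            0x56FD, List.of("guo"));       // U+56FD

    /** Returns every toneless Pinyin spelling of text; characters without an entry are kept as-is. */
    static List<String> expand(String text, Map<Integer, List<String>> readings) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            List<String> options =
                    readings.getOrDefault(cp, List.of(new String(Character.toChars(cp))));
            // Cartesian product: extend every prefix built so far with every reading of this character.
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String option : options) {
                    next.add(prefix + option);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(expand("\u4E2D\u56FD", SAMPLE_READINGS)); // [zhongguo]
        System.out.println(expand("\u8D3E", SAMPLE_READINGS));       // [jia, gu]
    }
}
```

The number of variants is the product of the per-character reading counts, which stays small for short contact names but grows quickly for longer text.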

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E8%B4%BE
kMandarin	JIA3 GU3 JIA4

You can download the whole database as a zip file.
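A sketch of turning the downloaded data into a per-character reading table, assuming tab-separated lines of the form `U+8D3E<TAB>kMandarin<TAB>JIA3 GU3 JIA4` in a file such as Unihan_Readings.txt. Tone digits are stripped, so JIA3 and JIA4 collapse into one spelling. The class and method names are hypothetical, and other Unihan releases may write kMandarin differently (for example with tone marks instead of digits), which this sketch does not handle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class UnihanMandarin {

    /** Returns the toneless readings of a kMandarin line, or null for any other line. */
    static Set<String> tonelessReadings(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3 || !fields[1].equals("kMandarin")) {
            return null; // comment lines and other fields (kCantonese, kDefinition, ...)
        }
        Set<String> readings = new LinkedHashSet<>();
        for (String reading : fields[2].trim().split("\\s+")) {
            // "JIA3" -> "jia": drop the trailing tone digit, then lowercase.
            readings.add(reading.replaceAll("[0-9]$", "").toLowerCase(Locale.ROOT));
        }
        return readings;
    }

    /** Parses the leading "U+8D3E" field into a code point. */
    static int codePoint(String line) {
        return Integer.parseInt(line.substring(2, line.indexOf('\t')), 16);
    }

    /** Builds a code point -> readings table from a Unihan readings file (file layout is an assumption). */
    static Map<Integer, Set<String>> load(Path file) throws IOException {
        Map<Integer, Set<String>> table = new HashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            Set<String> readings = tonelessReadings(line);
            if (readings != null) {
                table.put(codePoint(line), readings);
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        String line = "U+8D3E\tkMandarin\tJIA3 GU3 JIA4";
        System.out.printf("U+%04X -> %s%n", codePoint(line), tonelessReadings(line)); // U+8D3E -> [jia, gu]
        if (args.length > 0) {
            System.out.println(load(Path.of(args[0])).size() + " characters with kMandarin readings");
        }
    }
}
```

Run without arguments it only parses the sample line; pass the path of the extracted readings file to load the full table.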
