
Posted to dev@lucene.apache.org by Zsolt Koppany <zk...@intland.com> on 2005/06/13 18:01:33 UTC

How to index Chinese text?

Our application works with lucene-1.4.3 stable even for German text but we
have problems with Chinese text. Which analyzer should we use to index
Chinese text?

Zsolt



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: How to index Chinese text?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jun 13, 2005, at 12:01 PM, Zsolt Koppany wrote:

> Our application works with lucene-1.4.3 stable even for German text but we
> have problems with Chinese text. Which analyzer should we use to index
> Chinese text?

This question is best posted to java-user, not java-dev, but I'll  
reply here for now.

The answer is that "it depends" on what you want to do.
StandardAnalyzer will tokenize CJK characters individually. In the
contrib area of the Subversion repository, under "analyzers", there
are a ChineseAnalyzer and a CJKAnalyzer.
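To make the "it depends" concrete: ChineseAnalyzer, like StandardAnalyzer, emits one token per Chinese character, while CJKAnalyzer emits overlapping two-character tokens (bigrams). The sketch below is plain Java, not Lucene code. The class and method names are hypothetical, and it only illustrates the two strategies on a single run of CJK characters, ignoring the punctuation, Latin text and other details a real analyzer handles.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration, not Lucene's API: contrasts two ways of
 * splitting a run of CJK characters into index terms.
 */
public class CjkTokenizationSketch {

    /** One token per character, similar in spirit to StandardAnalyzer/ChineseAnalyzer. */
    public static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        // Walk by code point so characters outside the BMP are not split in half.
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Overlapping two-character tokens, similar in spirit to CJKAnalyzer. */
    public static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(text); // a lone character is kept as a single token
            return tokens;
        }
        // Each adjacent pair of code points becomes one token: ABCD -> AB, BC, CD.
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "中文分词"; // "Chinese word segmentation"
        System.out.println(unigrams(text)); // [中, 文, 分, 词]
        System.out.println(bigrams(text));  // [中文, 文分, 分词]
    }
}
```

Roughly, single-character tokens keep the term dictionary small and favor recall, while bigrams tend to match two-character words more precisely at the cost of many more distinct terms. Neither approach performs real word segmentation, which would need a dictionary or a statistical model.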

     Erik
