
Posted to java-user@lucene.apache.org by Eric Chow <er...@gmail.com> on 2005/04/11 12:01:50 UTC

Urgent, please help, index/search in UTF-8 ???

Hello,

I am a beginner in using Lucene.

My files contain text in several languages (English, Chinese,
Portuguese, Japanese, and some other Asian, non-Latin languages).
The languages are always mixed within a single file.
Therefore, I have to save the contents as UTF-8.

I am now developing a web-based search engine. I use Lucene to index
those files and to search the index from a web page. The charset of
the web page is UTF-8, but searches return nothing.

I tried several analyzers (CJKAnalyzer, ChineseAnalyzer,
StandardAnalyzer, SimpleAnalyzer), but all of them failed.

Finally, I tested with the original charsets. For example, I used
BIG5 for the Chinese content, and searching worked very well. For
English, of course, there was no problem.

But I can't get UTF-8 to work as the charset for the documents. Any suggestions or examples?
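
To make the question concrete, here is a minimal sketch of the decoding
step that I suspect is involved. It is plain Java with no Lucene calls,
and it assumes the file bytes are turned into characters through a
Reader with an explicit charset before they reach the indexer. The
class name Utf8ReadSketch and the method readAll are hypothetical names
for illustration only, and ISO-8859-1 stands in for "some other
charset". This is not my actual indexing code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8ReadSketch {

    // Decode all bytes from the stream using the named charset.
    // Hypothetical helper; the resulting String (or a Reader built the
    // same way) is what would be handed to the indexer.
    public static String readAll(InputStream in, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charsetName)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Mixed Chinese and Portuguese text, written with Unicode escapes
        // so the source file's own encoding does not matter.
        String original = "\u4e2d\u6587 Ol\u00e1";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        String decodedAsUtf8 = readAll(new ByteArrayInputStream(utf8Bytes), "UTF-8");

        // Decoding the same bytes with a different charset yields other characters,
        // so the indexed terms would not match a UTF-8 query.
        String decodedAsLatin1 = readAll(new ByteArrayInputStream(utf8Bytes), "ISO-8859-1");

        System.out.println("UTF-8 decode matches original: " + original.equals(decodedAsUtf8));
        System.out.println("ISO-8859-1 decode matches original: " + original.equals(decodedAsLatin1));
    }
}
```

If the files are decoded with anything other than UTF-8 (for example
the platform default), or if the query string from the web page is
decoded differently from the indexed text, I would expect the terms not
to match, which may be what I am seeing.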

Best regards,
Eric
