You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Eric Chow <er...@gmail.com> on 2005/04/11 12:01:50 UTC
Urgent, please help, index/search in UTF-8 ???
Hello,
I am a beginner in using Lucene.
My files are contains different language (English, Chinese,
Portuguese, Japanese and some Asian languages, non-latin languages).
They always contain in one file.
Therefore, I have to use UTF-8 to save the contents.
I am now developing a web-based search engine. I use Lucene to create
index for those files and search it in web. The charset of the web
page is UTF-8, but it cannot search anything.
I try to use some Analyser (CJKAnalyser, ChineseAnalyser,
StandardAnalyser, SimpleAnalyser), still failed.
Finally, I tested to use original charset, for example, the Chinese
contents I used BIG5, and I can search it very well. For those
English, of couse, no problem.
But I can't use UTF-8 as the charset for documents. Any suggest and examples ?
Best regards,
Eric
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org