Posted to user@nutch.apache.org by cao yuzhong <ca...@hotmail.com> on 2005/03/06 10:30:19 UTC

My Nutch does not support Chinese keywords.

hi,

My Nutch installation does not support Chinese keywords.

I have tried installing nutch-0.5 and nutch-0.6 on
Windows XP Professional (Simplified Chinese version) and Fedora 3.

When I search with Chinese keywords, none of the Chinese characters in
the result pages are displayed correctly. They look like this:

±±??o???o?ìì′ó?§????′| 
... ú?ò?÷àà???D????£??????a・¢?¢????3é1?×aè?oíí?1?£??? ... ù
′??D??°ì1?êò?¢?????a・¢2??¢3 ... 
http://cse.buaa.edu.cn/ (网页快照) (评分详解) (anchors) 


Are there any hints for dealing with this problem?

thx!

Cao Yuzhong from Beijing, China
2005-03-06



Re: My Nutch does not support Chinese keywords.

Posted by Feng Zhou <fe...@gmail.com>.
I've managed to get the dev version in Subversion to search Chinese
pages. In my case, the only problem turned out to be in the Web
container. I use Tomcat 5.5, and here's my experience. I had to
change the Connector element in server.xml to this:

    <Connector port="8080"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" 
               URIEncoding="UTF-8" useBodyEncodingForURI="true"/>

The last two attributes (URIEncoding and useBodyEncodingForURI) are the
ones I added. Essentially, they force Tomcat to parse GET parameters as
UTF-8. The default encoding is ISO-8859-1, which unfortunately does not
work for any non-Western language, so this applies to many other
languages as well.

Read http://issues.apache.org/bugzilla/show_bug.cgi?id=22666 for
details (scroll down to the end).
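
To see why the default breaks Chinese queries, here is a minimal
standalone sketch (not Nutch or Tomcat code). It assumes the browser
sends the keyword percent-encoded as UTF-8, and it compares decoding
that string as ISO-8859-1 with decoding it as UTF-8. The class and
method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, only meant to illustrate the charset mismatch.
class QueryEncodingDemo {

    // Percent-encode a keyword as UTF-8, as a browser is assumed to do here.
    static String encodeUtf8(String keyword) {
        try {
            return URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Decode a percent-encoded value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String keyword = "\u4e2d\u6587"; // the two characters of "Chinese (language)"
        String encoded = encodeUtf8(keyword);
        System.out.println(encoded); // %E4%B8%AD%E6%96%87

        // Wrong charset: each of the six UTF-8 bytes becomes its own character.
        String wrong = decode(encoded, "ISO-8859-1");
        System.out.println(wrong.length()); // 6

        // Matching charset: the original two characters come back.
        String right = decode(encoded, "UTF-8");
        System.out.println(right.equals(keyword)); // true
    }
}
```

With the wrong charset the search term no longer matches anything in
the index, and the garbage gets echoed back into the result page.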

- Feng Zhou



-- 
- Feng