Posted to user@nutch.apache.org by cao yuzhong <ca...@hotmail.com> on 2005/03/06 10:30:19 UTC
My Nutch does not support Chinese keywords.
Hi,
My Nutch does not support Chinese keywords.
I have tried installing nutch-0.5 and nutch-0.6 on
Windows XP Professional (Simplified Chinese edition) and Fedora 3.
When I search with Chinese keywords, none of the Chinese characters in the result
pages are displayed correctly. They look like this:
±±??o???o?ìì′ó?§????′|
... ú?ò?÷àà???D????£??????a・¢?¢????3é1?×aè?oíí?1?£??? ... ù
′??D??°ì1?êò?¢?????a・¢2??¢3 ...
http://cse.buaa.edu.cn/ (网页快照 [cached]) (评分详解 [explain]) (anchors)
Does anyone have hints on how to deal with this problem?
Thanks!
Cao Yuzhong, Beijing, China
2005-03-06
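As background on what output like the sample above usually means: it is the typical result of text being encoded with one charset and decoded with another. The Java sketch below (standard library only) encodes the two characters of "Beijing" (北京) as GBK and then decodes those bytes as ISO-8859-1. GBK is an assumption here, chosen because it is the usual encoding for Simplified Chinese pages; the thread does not say which encoding the crawled pages used. The class and method names (MojibakeDemo, misdecode) are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode text with one charset, then decode the same bytes with another:
    // the basic wrong-charset mistake that produces garbled output.
    static String misdecode(String text, Charset actual, Charset assumed) {
        byte[] bytes = text.getBytes(actual);
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        // Assumption: GBK, the usual encoding for Simplified Chinese pages and Windows.
        Charset gbk = Charset.forName("GBK");
        String beijing = "\u5317\u4eac"; // the two characters of "Beijing"

        String garbled = misdecode(beijing, gbk, StandardCharsets.ISO_8859_1);
        String roundTrip = misdecode(beijing, gbk, gbk); // same charset on both sides

        // Print code points so the result does not depend on the console's encoding.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
        System.out.println("round trip intact: " + roundTrip.equals(beijing));
    }
}
```

Running it should print U+00B1 U+00B1 U+00BE U+00A9, i.e. "±±¾©", followed by "round trip intact: true". The first two characters match the "±±" at the start of the sample, which is consistent with (though not proof of) GBK text being read as ISO-8859-1.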
Re: My Nutch does not support Chinese keywords.
Posted by Feng Zhou <fe...@gmail.com>.
I've managed to get the development version from Subversion to search Chinese
pages. In my case, the only problem turned out to be the Web container's
configuration. I use Tomcat 5.5, and here is what worked for me: I had to
change the HTTP Connector element in server.xml to this:
<Connector port="8080"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true"
URIEncoding="UTF-8" useBodyEncodingForURI="true"/>
The last two attributes, URIEncoding and useBodyEncodingForURI, are the ones I added.
Essentially, this forces Tomcat to decode GET parameters as UTF-8. The default
encoding is ISO-8859-1, which cannot represent Chinese or other non-Western
scripts, so this fix applies to many other languages as well.
See http://issues.apache.org/bugzilla/show_bug.cgi?id=22666 for
details (scroll down to the end).
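To make the mismatch concrete, here is a small Java sketch that uses only the standard library (no Tomcat involved). It percent-encodes a Chinese keyword as UTF-8, which is what a browser sends in the GET query string assuming the search page is served as UTF-8 (an assumption here), and then decodes it once as ISO-8859-1 (the default described above) and once as UTF-8 (what URIEncoding="UTF-8" asks for). The class and method names (QueryDecodingDemo, encodeParam, decodeParam) are hypothetical; the URLEncoder/URLDecoder overloads that take a Charset need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {

    // Percent-encode a keyword from its UTF-8 bytes, as a UTF-8 search form would.
    static String encodeParam(String keyword) {
        return URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    // Decode a percent-encoded parameter with the charset the container assumes.
    static String decodeParam(String encoded, Charset containerCharset) {
        return URLDecoder.decode(encoded, containerCharset);
    }

    public static void main(String[] args) {
        String keyword = "\u5317\u4eac"; // the two characters of "Beijing"
        String sent = encodeParam(keyword);
        System.out.println("sent: " + sent);

        // ISO-8859-1 (the default): each escaped byte becomes one character.
        String asLatin1 = decodeParam(sent, StandardCharsets.ISO_8859_1);
        // UTF-8 (URIEncoding="UTF-8"): the byte sequence decodes back to the keyword.
        String asUtf8 = decodeParam(sent, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 gives " + asLatin1.length()
                + " chars, equals keyword: " + asLatin1.equals(keyword));
        System.out.println("UTF-8 gives " + asUtf8.length()
                + " chars, equals keyword: " + asUtf8.equals(keyword));
    }
}
```

Running it should print "sent: %E5%8C%97%E4%BA%AC", then report that the ISO-8859-1 decoding yields 6 characters that do not match the keyword, while the UTF-8 decoding yields the original 2 characters. A query mangled this way no longer matches the Chinese terms in the index.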
- Feng Zhou