Posted to user@nutch.apache.org by kauu <ba...@gmail.com> on 2006/10/31 04:08:10 UTC
Re: Get messy code while fetching ftp sites
That's my problem too!
How did you segment the Chinese text? With ICTCLAS?
The problem may be the character encoding.
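
To illustrate how a charset mismatch could produce this kind of output, here is a minimal Python sketch. It is only an illustration under assumptions: it supposes the FTP server sends directory names in GBK and the client decodes them as ISO-8859-1, neither of which is confirmed here, and the helper name redecode is made up for this example.

```python
# Hypothetical illustration of a charset mismatch.
# ASSUMPTION: the server sends names in GBK and the client decodes them
# as ISO-8859-1. Neither encoding is confirmed for the site in question.

name = "教学资源中心"

# Bytes as a server using GBK might send them (2 bytes per character here).
raw = name.encode("gbk")

# Decoding with a mismatched single-byte charset gives garbled text.
wrong = raw.decode("iso-8859-1")

# Decoding with the matching charset restores the original name.
right = raw.decode("gbk")


def redecode(garbled: str, assumed: str = "iso-8859-1", actual: str = "gbk") -> str:
    """Undo a wrong decode by re-encoding with the charset that was wrongly
    assumed, then decoding with the actual one. This only works when the
    wrong decode lost no bytes, as with ISO-8859-1."""
    return garbled.encode(assumed).decode(actual)


print(len(raw))                 # number of bytes on the wire
print(wrong == name)            # the mismatched decode does not match
print(right == name)            # the matching decode does
print(redecode(wrong) == name)  # the garbled text can be repaired
```

If the garbling has already replaced bytes with "?", the original name cannot be recovered this way, so the decoding has to be fixed at the point where the listing is read.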
On 10/31/06, fangky@gzedu.gov.cn < fangky@gzedu.gov.cn> wrote:
>
> I have run into a very strange problem: my Nutch 0.8.1 can crawl HTTP sites
> normally, but when fetching FTP sites it produces garbled text.
>
> For example, in the root directory of an FTP site there is a subdirectory
> named in Chinese, "教学资源中心". A normal crawl result should be
> Index of /教学资源中心
> but when Nutch fetches it, it becomes
> Index of /???¡ì¡Á???????/
> This problem does not appear when fetching directories named in English.
>
> Can anyone tell me what to do?
>
> Thanks in advance.
>
>
--
www.babatu.com