Posted to user@nutch.apache.org by kauu <ba...@gmail.com> on 2006/10/31 04:08:10 UTC

Re: Get messy code while fetching ftp sites

That's my problem too!

How did you segment the Chinese text? With ICTCLAS?

The problem may be the character encoding.
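
A minimal sketch of how a charset mismatch could produce this kind of garbage. It assumes the ftp server sends the directory name as GBK bytes and the client decodes them as ISO-8859-1. Both charsets are guesses for illustration only, nothing in this thread confirms them, and this is plain Python, not Nutch code.

```python
# Hypothetical illustration of a charset mismatch; not Nutch code.
name = "教学资源中心"

# Assumption: the ftp server sends the name encoded as GBK (2 bytes per character here).
raw = name.encode("gbk")

# Assumption: the client decodes with ISO-8859-1. This never raises an error,
# because every byte maps to some character, but the result is unreadable.
wrong = raw.decode("iso-8859-1")

# Decoding with the charset that matches the server recovers the name.
right = raw.decode("gbk")

# The wrongly decoded string still carries the original bytes, so it can be
# repaired as long as no character was replaced by "?" along the way.
repaired = wrong.encode("iso-8859-1").decode("gbk")

print(repr(wrong))
print(right)
print(repaired)
```

If the real output already contains literal "?" characters, as in the message below, some bytes were lost and the repair step no longer works; the fix would then have to happen where the listing is first decoded.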

On 10/31/06, fangky@gzedu.gov.cn < fangky@gzedu.gov.cn> wrote:
>
> I've hit a very strange problem: my nutch8.1 crawls http sites normally,
> but when fetching ftp sites, it produces garbled text.
>
> For example, in the root directory of an ftp site, there is a subdirectory
> with the Chinese name "教学资源中心". A normal crawl result should be
> index of /教学资源中心
> but when nutch fetches it, it becomes
> Index of /???¡ì¡Á???????/
> This problem does not appear when fetching directories named in English.
>
> Can anyone tell me how to fix this?
>
> Thanks in advance.
>
>


-- 
www.babatu.com