Posted to user@nutch.apache.org by Chetan Patel <ch...@webmail.aruhat.com> on 2009/06/01 16:23:20 UTC

Re: Arabic language in Nutch

Hello,

I want to crawl an Arabic URL
(http://www.kuna.net.kw/NewsAgenciesPublicSite/HomePage.aspx?Language=ar). It
uses the charset windows-1256.

I have another URL (http://www.afp.com/afpcom/ar/home) that uses the
charset UTF-8. That link works fine (crawling, indexing, and searching all work
properly).

When I search content from the first URL, it returns unreadable characters (ظƒظˆظ†ط§
ظ„ظ„ط¥ط¹ظ„ط§ظ† ط¹ظ„ظ‰ ط§ظ„ظ…ظˆظ‚ط¹ ط¥ط). I want the search results to display properly.
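
One observation about the garbled output: its pattern (every pair starting with ظ or ط) is what UTF-8 encoded Arabic looks like when the bytes are decoded as windows-1256. The Python sketch below reproduces and reverses that mix-up for the first garbled word. It is only a diagnostic aid, not Nutch code; the function names are hypothetical, and it does not show where in the crawl, index, or display chain the wrong charset is applied.

```python
# Hypothetical helper names; a diagnostic sketch, not part of Nutch.

def to_mojibake(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as windows-1256."""
    return text.encode("utf-8").decode("cp1256")


def from_mojibake(garbled: str) -> str:
    """Undo the mistake: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("cp1256").decode("utf-8")


if __name__ == "__main__":
    original = "كونا"  # Arabic spelling of "KUNA"
    garbled = to_mojibake(original)
    print(garbled)                             # first word of the garbled text
    print(from_mojibake(garbled) == original)  # round trip restores the word
```

If the round trip restores readable Arabic, the text is probably being handled correctly as UTF-8 at one stage and then interpreted as windows-1256 at a later one.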

Is there an issue with the charset? Please help me.

Thanks in advance.

Regards,
Chetan Patel
-- 
View this message in context: http://www.nabble.com/Arabic-language-in-Nutch-tp9269533p23815696.html
Sent from the Nutch - User mailing list archive at Nabble.com.