Posted to user@nutch.apache.org by Annona Keene <an...@yahoo.com> on 2007/07/09 18:13:45 UTC

Locale for Nutch?

I'm working on crawling a page that has multiple versions based on your browser's language preference. Is there a way I can set a locale in Nutch so that it will get the pages I'm looking for? For example, I'd like to set my locale to Japanese so when I fetch www.foo.com I get the index.ja.html page instead of the English page.

Thanks,
Ann




       

Re: Locale for Nutch?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Annona Keene wrote:
> I'm working on crawling a page that has multiple versions based on
> your browser's language preference. Is there a way I can set a locale
> in Nutch so that it will get the pages I'm looking for? For example,
> I'd like to set my locale to Japanese so when I fetch www.foo.com I
> get the index.ja.html page instead of the English page.

Please see org.apache.nutch.protocol.httpclient.Http.java:116. The
Accept-Language request header is currently hardcoded there, but it
would be easy to turn it into a configuration parameter.
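
As a rough illustration of that suggestion, here is a minimal standalone sketch. It assumes the hardcoded value is the Accept-Language request header and uses plain java.util.Properties in place of the Nutch configuration object. The property name "http.accept.language", the fallback value, and the class and method names are hypothetical placeholders for illustration, not taken from the Nutch source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class AcceptLanguageSketch {
    // Hypothetical property name; an assumption, not read from Nutch itself.
    static final String ACCEPT_LANGUAGE_KEY = "http.accept.language";

    // Placeholder fallback used when the property is not set (also an assumption).
    static final String DEFAULT_ACCEPT_LANGUAGE = "en-us,en-gb,en;q=0.7,*;q=0.3";

    // Build the request headers, taking Accept-Language from the configuration
    // instead of a hardcoded string.
    static Map<String, String> buildRequestHeaders(Properties conf) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept-Language",
                conf.getProperty(ACCEPT_LANGUAGE_KEY, DEFAULT_ACCEPT_LANGUAGE));
        return headers;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Prefer Japanese, fall back to English with a lower weight.
        conf.setProperty(ACCEPT_LANGUAGE_KEY, "ja,en;q=0.5");
        System.out.println(buildRequestHeaders(conf));
    }
}
```

With a value such as "ja,en;q=0.5", a server that does language negotiation would be expected to return its Japanese page when one exists; whether www.foo.com behaves that way depends on the site.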


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com