Posted to user@nutch.apache.org by Prashant Dave <pd...@gwmail.gwu.edu> on 2012/08/17 17:20:43 UTC

Nutch 2.0 and Sitemap

Is there sitemap support in Nutch 2.0? That is, for a given website, can
it first fetch and parse the sitemap file and then crawl only the URLs
listed in the sitemap? I read online that this functionality was on the
roadmap, but I have not been able to find out whether sitemaps are supported.

Thanks

Prashant

RE: Nutch 2.0 and Sitemap

Posted by Markus Jelsma <ma...@openindex.io>.
Hi,

There is no support for sitemaps in either 1.x or 2.x, but you're more than welcome to provide patches if you have them.

Cheers
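
For illustration only: since neither branch reads sitemaps itself, one possible workaround is to extract the URLs outside Nutch and hand them over as a seed list. The sketch below is not part of Nutch and uses no Nutch API; the function names, the example URLs, and the seed file path are hypothetical. It assumes a plain sitemap (a urlset document in the sitemaps.org 0.9 namespace), not a sitemap index or a gzipped file.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(sitemap_xml):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for url_el in root.iter(SITEMAP_NS + "url"):
        loc = url_el.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text:
            urls.append(loc.text.strip())
    return urls


def write_seed_file(urls, path):
    """Write one URL per line, the usual layout of a plain-text seed list."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")


# Hypothetical sitemap content, used only to demonstrate the parsing step.
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(EXAMPLE):
        print(url)
```

The resulting file could then be placed in the seed directory that is injected before a crawl. Restricting the crawl to exactly those URLs would still depend on how the crawl depth and URL filters are configured, which this sketch does not address.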

 
 