Posted to user@nutch.apache.org by "hugo.ma" <hu...@gmail.com> on 2012/09/07 11:42:34 UTC

Re: Nutch and sitemaps

Thanks.

I was looking for an easy solution for parsing very large sitemaps.
I think Nutch 2.0 doesn't have support for sitemaps.




Re: Nutch and sitemaps

Posted by Ken Krugler <kk...@transpac.com>.
On Sep 7, 2012, at 2:42am, hugo.ma wrote:

> Thanks.
> 
> I was looking for an easy solution for parsing very large sitemaps.
> I think Nutch 2.0 doesn't have support for sitemaps.

There's code in crawler-commons for sitemap parsing.

See http://code.google.com/p/crawler-commons/

and also

https://issues.apache.org/jira/browse/NUTCH-1465
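
For a rough idea of what parsing a very large sitemap involves, here is a minimal sketch in Python using only the standard library. It is not the crawler-commons code and not anything from NUTCH-1465; the function name iter_sitemap_locs and the sample data are made up for illustration. It reads the XML incrementally and yields each <loc> as it goes, instead of building the whole document tree first, and it assumes the file uses the standard sitemaps.org 0.9 namespace.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol (assumed for this sketch).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
ENTRY_TAGS = (SITEMAP_NS + "url", SITEMAP_NS + "sitemap")


def iter_sitemap_locs(stream):
    """Yield (kind, location) pairs from a sitemap or sitemap index.

    kind is "url" for a page entry in a <urlset> and "sitemap" for a
    child sitemap listed in a <sitemapindex>. (Hypothetical helper.)
    """
    # iterparse reads the stream incrementally and reports each element
    # once its closing tag has been seen.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag in ENTRY_TAGS:
            loc = elem.findtext(SITEMAP_NS + "loc")
            if loc and loc.strip():
                yield elem.tag[len(SITEMAP_NS):], loc.strip()
            # Drop the entry's children and text so finished entries do not
            # pile up in memory (the root still keeps the emptied elements).
            elem.clear()


# Made-up sample data standing in for a real sitemap file.
SAMPLE_URLSET = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc><lastmod>2012-09-01</lastmod></url>
</urlset>"""

SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://example.com/sitemap1.xml.gz</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    for kind, loc in iter_sitemap_locs(io.BytesIO(SAMPLE_URLSET)):
        print(kind, loc)
```

For a real file you would pass an open binary file object (or a gzip.open(...) handle for a .xml.gz sitemap) instead of the BytesIO sample. Entries tagged "sitemap" come from a sitemap index and point at further sitemap files that need fetching and parsing in turn.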

-- Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr