Posted to user@nutch.apache.org by "hugo.ma" <hu...@gmail.com> on 2012/09/07 11:42:34 UTC
Re: Nutch and sitemaps
Thanks.
I was looking for an easy solution for parsing very large sitemaps.
I think Nutch 2.0 doesn't have support for sitemaps.
Re: Nutch and sitemaps
Posted by Ken Krugler <kk...@transpac.com>.
On Sep 7, 2012, at 2:42am, hugo.ma wrote:
> Thanks.
>
> I was looking for an easy solution for parsing very large sitemaps.
> I think Nutch 2.0 doesn't have support for sitemaps.
There's code in crawler-commons for sitemap parsing.
See http://code.google.com/p/crawler-commons/
and also
https://issues.apache.org/jira/browse/NUTCH-1465
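As a rough illustration of streaming sitemap parsing, which matters for very large sitemaps, here is a minimal sketch. It uses only the JDK's StAX API (javax.xml.stream) and is not crawler-commons' API. The class and method names (StreamingSitemapReader, readLocations) are hypothetical. The reader pulls one XML event at a time and never builds the whole document in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper (not part of Nutch or crawler-commons): streams <loc> values out of a sitemap. */
public class StreamingSitemapReader {

    /**
     * Calls sink once per <loc> element and returns how many were seen.
     * Works for <urlset> sitemaps and <sitemapindex> files, since both use <loc>.
     * Note: this also matches <loc> in extension namespaces (e.g. image sitemaps);
     * check reader.getNamespaceURI() if that matters.
     */
    public static long readLocations(InputStream in, Consumer<String> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Sitemaps don't need DTDs or external entities, so turn them off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        long count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "loc".equals(reader.getLocalName())) {
                    // getElementText() returns the element's text and leaves the reader on its END_ELEMENT.
                    sink.accept(reader.getElementText().trim());
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://example.com/a</loc></url>"
                + "<url><loc>http://example.com/b</loc><lastmod>2012-09-07</lastmod></url>"
                + "</urlset>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        long n = readLocations(in, System.out::println);
        System.out.println(n + " locations");
    }
}
```

For a sitemap index, each <loc> is a child sitemap that still has to be fetched and parsed in turn. Gzipped sitemaps (.xml.gz) can be handled by wrapping the input in java.util.zip.GZIPInputStream before calling readLocations.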
-- Ken
--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378
--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr