Posted to user@nutch.apache.org by "Meraj A. Khan" <me...@gmail.com> on 2014/09/02 21:32:15 UTC

ApacheCon Presentation

Hi Lewis,

I am looking at your ApacheCon presentation titled 'Building your big
data search stack with Apache Nutch 2.x' at the link below.


http://prezi.com/gkomeulfuqhh/building-your-big-data-search-stack-with-apache-nutch-2x/?utm_campaign=share&utm_medium=copy

In it I saw a slide mentioning future work on parsing sitemaps using
crawler-commons. What is the purpose of that functionality, and is anyone
working on it? I ask because fetching URLs based off a sitemap would be
better than fetching and crawling from a seed URL that might only be the
home page of the web site.
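
For context, here is a rough sketch of how crawler-commons' SiteMapParser
could pull URLs out of a sitemap so they can be used as seeds. The actual
Nutch integration is the future work the slide refers to, and the sitemap
URL below is just a placeholder, so treat this as an illustration rather
than Nutch code:

    import crawlercommons.sitemaps.AbstractSiteMap;
    import crawlercommons.sitemaps.SiteMap;
    import crawlercommons.sitemaps.SiteMapParser;
    import crawlercommons.sitemaps.SiteMapURL;

    import java.io.InputStream;
    import java.net.URL;

    public class SitemapSeedList {
        public static void main(String[] args) throws Exception {
            // Placeholder sitemap location; a crawler would discover this
            // from robots.txt or a known convention.
            URL sitemapUrl = new URL("http://www.example.com/sitemap.xml");

            // Fetch the raw sitemap bytes (Nutch would reuse its own fetcher).
            byte[] content;
            try (InputStream in = sitemapUrl.openStream()) {
                content = in.readAllBytes();
            }

            // crawler-commons handles plain sitemaps, sitemap indexes,
            // and gzipped variants.
            SiteMapParser parser = new SiteMapParser();
            AbstractSiteMap parsed = parser.parseSiteMap(content, sitemapUrl);

            if (parsed instanceof SiteMap) {
                // Each SiteMapURL carries the loc plus optional
                // lastmod/changefreq/priority, which could be injected
                // as seed URLs instead of starting from the home page.
                for (SiteMapURL u : ((SiteMap) parsed).getSiteMapUrls()) {
                    System.out.println(u.getUrl() + " priority=" + u.getPriority());
                }
            }
        }
    }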

Thanks.