Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2015/08/24 18:05:51 UTC

[Nutch Wiki] Update of "SitemapFeature" by CihadGuzel

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "SitemapFeature" page has been changed by CihadGuzel:
https://wiki.apache.org/nutch/SitemapFeature?action=diff&rev1=9&rev2=10

  For more information on Sitemaps, see the official [[http://www.sitemaps.org/|Sitemap protocol]] page.
  
  == Steps to run ==
- For Nutch 1.x:
+ ==== For Nutch 1.x: ====
  {{{
  bin/nutch sitemap <crawldb> [-hostdb <hostdb>] [-sitemapUrls <sitemapUrls>] [-threads <threads>] [-force] [-noFilter] [-noNormalize]
  }}}
@@ -31, +31 @@

  '''-noFilter''' Turn off URLFilters on urls (optional)
  
  '''-noNormalize''' Turn off URLNormalizer on urls (optional)
+ 
+ ----
+ ==== For Nutch 2.x: ====
+ Please follow the instructions on the [[https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler|SitemapCrawler page]].
+ ----
  
  == How does Nutch process Sitemaps? ==