Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2015/08/24 18:05:51 UTC
[Nutch Wiki] Update of "SitemapFeature" by CihadGuzel
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "SitemapFeature" page has been changed by CihadGuzel:
https://wiki.apache.org/nutch/SitemapFeature?action=diff&rev1=9&rev2=10
For more information on Sitemaps, see the official page of the [[http://www.sitemaps.org/|Sitemap protocol]].
== Steps to run ==
- For Nutch 1.x:
+ ==== For Nutch 1.x: ====
{{{
bin/nutch sitemap <crawldb> [-hostdb <hostdb>] [-sitemapUrls <sitemapUrls>] [-threads <threads>] [-force] [-noFilter] [-noNormalize]
}}}
@@ -31, +31 @@
'''-noFilter''' Turn off URLFilters on URLs (optional)
'''-noNormalize''' Turn off URLNormalizer on URLs (optional)
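As a rough sketch of how the usage line above might be filled in for Nutch 1.x, the script below assembles one invocation and prints it for review. The paths `crawl/crawldb` and `urls/sitemaps`, the thread count, the helper `build_sitemap_cmd`, and the `NUTCH_RUN` switch are assumptions made for illustration and do not come from this page; `-sitemapUrls` is assumed here to take a path.

```shell
#!/bin/sh
# Hypothetical values; adjust to your own crawl layout.
CRAWLDB="crawl/crawldb"
SITEMAP_URLS="urls/sitemaps"
THREADS=8

# Hypothetical helper: builds the command using only flags from the usage line.
# Arguments: <crawldb> <sitemapUrls> <threads>
build_sitemap_cmd() {
  printf 'bin/nutch sitemap %s -sitemapUrls %s -threads %s' "$1" "$2" "$3"
}

CMD=$(build_sitemap_cmd "$CRAWLDB" "$SITEMAP_URLS" "$THREADS")

# Print the command so it can be checked before anything runs.
echo "$CMD"

# NUTCH_RUN is a switch local to this sketch. Set it to 1 to execute the
# command; this relies on word splitting, so the paths must not contain spaces.
if [ "${NUTCH_RUN:-0}" = "1" ]; then
  $CMD
fi
```

Run it from the Nutch installation directory so that `bin/nutch` resolves. Optional flags from the usage line, such as `-noFilter` or `-noNormalize`, can be appended to the `printf` format string in the same way.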
+
+ ----
+ ==== For Nutch 2.x: ====
+ Please follow the steps described [[https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler|here]].
+ ----
== How does Nutch process Sitemaps? ==