Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2008/02/20 19:15:22 UTC

[Nutch Wiki] Update of "NutchTutorial" by MarioMendez

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by MarioMendez:
http://wiki.apache.org/nutch/NutchTutorial

------------------------------------------------------------------------------
  
   {{{ http://lucene.apache.org/nutch/ }}}
  
-  * Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the apache.org domain, the line should read:
+  * Edit the file conf/crawl-urlfilter.txt (editing conf/regex-urlfilter.txt instead also worked for me) and replace MY.DOMAIN.NAME with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the apache.org domain, the line should read:
  
   {{{ +^http://([a-z0-9]*\.)*apache.org/ }}}
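
To see which URLs the filter line above lets through, the following is a minimal Python sketch, not Nutch code. It assumes that the leading "+" marks an include rule, that rules are tried in order with the first match deciding, and that a URL matching no rule is rejected. The RULES list, the trailing catch-all exclude rule, and the accepts() helper are hypothetical names introduced only for this illustration.

```python
import re

# Hypothetical stand-in for the rules in the URL filter file.
# Each entry is (sign, pattern): "+" includes a URL, "-" excludes it.
RULES = [
    ("+", re.compile(r"^http://([a-z0-9]*\.)*apache.org/")),  # line from the tutorial
    ("-", re.compile(r".")),                                  # assumed catch-all exclude
]

def accepts(url):
    """Return True if the first rule matching the URL is an include rule."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # assumption: URLs matching no rule are rejected

if __name__ == "__main__":
    for url in (
        "http://lucene.apache.org/nutch/",
        "http://apache.org/",
        "https://apache.org/",
        "http://www.example.com/",
    ):
        print(url, accepts(url))
```

Under these assumptions the two http://...apache.org/ URLs are accepted, while the https URL and the example.com URL are rejected because the pattern is anchored to "http://". The dot in "apache.org" is not escaped in the tutorial's line, so it matches any single character; writing "apache\.org" would be stricter.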