Posted to commits@nutch.apache.org by pk...@apache.org on 2005/08/08 19:59:41 UTC
svn commit: r230828 - /lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tutorial.xml
Author: pkosiorowski
Date: Mon Aug 8 10:59:34 2005
New Revision: 230828
URL: http://svn.apache.org/viewcvs?rev=230828&view=rev
Log:
Changed URLs in tutorial to point to apache.
Modified:
lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tutorial.xml
Modified: lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tutorial.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tutorial.xml?rev=230828&r1=230827&r2=230828&view=diff
==============================================================================
--- lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tutorial.xml (original)
+++ lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tutorial.xml Mon Aug 8 10:59:34 2005
@@ -34,7 +34,7 @@
<p>First, you need to get a copy of the Nutch code. You can download
a release from <a
-href="http://www.nutch.org/release/">http://www.nutch.org/release/</a>.
+href="http://lucene.apache.org/nutch/release/">http://lucene.apache.org/nutch/release/</a>.
Unpack the release and connect to its top-level directory. Or, check
out the latest source code from <a
href="version_control.html">subversion</a> and build it
@@ -67,23 +67,23 @@
<ol>
<li>Create a flat file of root urls. For example, to crawl the
-<code>nutch.org</code> site you might start with a file named
+<code>nutch</code> site you might start with a file named
<code>urls</code> containing just the Nutch home page. All other
Nutch pages should be reachable from this page. The <code>urls</code>
file would thus look like:
<source>
-http://www.nutch.org/
+http://lucene.apache.org/nutch/
</source>
</li>
<li>Edit the file <code>conf/crawl-urlfilter.txt</code> and replace
<code>MY.DOMAIN.NAME</code> with the name of the domain you wish to
crawl. For example, if you wished to limit the crawl to the
-<code>nutch.org</code> domain, the line should read:
+<code>apache.org</code> domain, the line should read:
<source>
-+^http://([a-z0-9]*\.)*nutch.org/
++^http://([a-z0-9]*\.)*apache.org/
</source>
-This will include any url in the domain <code>nutch.org</code>.
+This will include any url in the domain <code>apache.org</code>.
</li>
</ol>
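
Note on the filter rule in the last hunk: the sketch below checks which URLs the new rule would admit. It is not Nutch code. It assumes the leading "+" marks an include rule and that the rest of the line is a regular expression searched against the URL; the helper names parse_rule and is_included are hypothetical, and Python's re module stands in for the Java regex engine that Nutch itself would use.

```python
import re

# Rule copied from the "+" side of the diff above.
# Assumption: the first character is the include/exclude sign,
# and everything after it is the regular expression.
RULE = r"+^http://([a-z0-9]*\.)*apache.org/"


def parse_rule(rule):
    """Split a filter line into (include_flag, compiled_pattern)."""
    sign, pattern = rule[0], rule[1:]
    if sign not in ("+", "-"):
        raise ValueError("rule must start with '+' or '-'")
    return sign == "+", re.compile(pattern)


def is_included(url, rule=RULE):
    """Return True if an include rule matches the URL."""
    include, pattern = parse_rule(rule)
    return include and pattern.search(url) is not None


if __name__ == "__main__":
    for candidate in (
        "http://lucene.apache.org/nutch/",
        "http://apache.org/",
        "http://www.nutch.org/",
        "https://lucene.apache.org/nutch/",
    ):
        print(candidate, is_included(candidate))
```

Under these assumptions the new root URL and the bare apache.org host pass, while the old nutch.org host and any https URL do not. The dot in "apache.org" is unescaped in the rule, so it matches any single character; writing it as "apache\.org" would make the match stricter.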