Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/09/20 17:16:27 UTC
[Nutch Wiki] Update of "FAQ" by JakeVanderdray
The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/FAQ
The comment on the change is:
Just some formatting.
------------------------------------------------------------------------------
% cp nutch-0.7.war $CATALINA_HOME/webapps/ROOT.war
* After building your first index, start Tomcat from the index folder.
- Assuming your index is located at /index/db/
+ Assuming your index is located at /index/db/:
- % cd /index/db/
+ {{{% cd /index/db/
- % $CATALINA_HOME/bin/startup.sh
+ % $CATALINA_HOME/bin/startup.sh}}}
* After building your first index, start Tomcat from the index folder.
Start Tomcat
- % $CATALINA_HOME/bin/startup.sh
+ % $CATALINA_HOME/bin/startup.sh
Stop Tomcat
- % $CATALINA_HOME/bin/shutdown.sh
+ % $CATALINA_HOME/bin/shutdown.sh
Tomcat has extracted the contents of the ROOT.war file
Edit the nutch-default.xml which is located at:
$CATALINA_HOME/webapps/ROOT/WEB-INF/classes/
@@ -59, +59 @@
==== How can I recover an aborted fetch process? ====
You have two choices:
- 1) Use the aborted output. You'll need to touch the file fetcher.done in the segment directory. All pages that were not crawled will soon be regenerated for fetching. If you fetched lots of pages and don't want to re-fetch them, this is the best way.
+ 1. Use the aborted output. You'll need to touch the file fetcher.done in the segment directory. All pages that were not crawled will soon be regenerated for fetching. If you fetched lots of pages and don't want to re-fetch them, this is the best way.
- 2) Discard the aborted output. To do this, just delete the fetcher* directories in the segment and restart the fetcher.
+ 2. Discard the aborted output. To do this, just delete the fetcher* directories in the segment and restart the fetcher.
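The two choices above could be sketched as small shell helpers. This is only an illustration: the function names keep_aborted_fetch and discard_aborted_fetch are hypothetical, and the segment directory is assumed to be passed in by the caller. It does not restart the fetcher.

```shell
#!/bin/sh
# Hypothetical helpers; neither name is part of Nutch.
# The caller supplies the path to the segment directory.

# Choice 1: keep the partial output by creating the marker file.
keep_aborted_fetch() {
    segment_dir="$1"
    touch "$segment_dir/fetcher.done"
}

# Choice 2: discard the partial output before restarting the fetcher.
discard_aborted_fetch() {
    segment_dir="$1"
    # Removes every fetcher* entry directly inside the segment,
    # which also removes a fetcher.done marker if one exists.
    rm -rf "$segment_dir"/fetcher*
}
```

For example, keep_aborted_fetch /path/to/segment keeps what was fetched, while discard_aborted_fetch /path/to/segment clears it so the fetcher can be run again on that segment.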
==== Who changes the next fetch date? ====
* After injecting a new URL, the next fetch date is set to the current time.