Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/10/16 16:01:52 UTC

[Nutch Wiki] Update of "nutch-0.8-dev/bin/nutch readdb" by RenaudRichardet

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by RenaudRichardet:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_readdb

The comment on the change is:
added info about -stat 

------------------------------------------------------------------------------
   None.
  
  === Caveats and Notes ===
-  None.
+ 
+ ==== stat command ====
+  The command '''-stat''' gives a quick overview of the crawl performed so far. Its output lines have the following meanings:
+  * DB_unfetched counts pages that are linked to by fetched pages but have not been fetched yet (because they did not pass the URL filters, or were not among the topN links that Nutch selected for its next fetch cycle).
+  * DB_gone means that a 404 or some other presumably permanent error was encountered. This status prevents future attempts to fetch the URL.
+  * DB_fetched is the number of documents that have been fetched and indexed. This is the key figure: if you see "status 2 (DB_fetched):  0", something went wrong.
+  * (see [http://mail-archives.apache.org/mod_mbox/lucene-nutch-dev/200601.mbox/%3C43C54E85.8040703@nutch.org%3E])  
  
  DevelopmentCommandLineOptions
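The check suggested for the '''-stat''' output above (a DB_fetched count of zero means something went wrong) can be automated. The following is a minimal sketch, assuming each status line has the shape "status N (DB_name): count", as in the "status 2 (DB_fetched):  0" example; any other lines in the real output are simply skipped. The function names `parse_status_counts` and `crawl_looks_broken` are hypothetical helpers, not part of Nutch.

```python
import re

# Assumed status-line shape, modeled on "status 2 (DB_fetched):  0".
# Lines that do not match (headers, totals, scores) are ignored.
STATUS_LINE = re.compile(r"^status\s+(\d+)\s+\((DB_\w+)\):\s+(\d+)\s*$")


def parse_status_counts(stat_output: str) -> dict:
    """Map each DB_* status name to its count, e.g. {'DB_fetched': 0}."""
    counts = {}
    for line in stat_output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            counts[match.group(2)] = int(match.group(3))
    return counts


def crawl_looks_broken(counts: dict) -> bool:
    """True when no documents were fetched; a missing DB_fetched line counts as 0."""
    return counts.get("DB_fetched", 0) == 0


# Hypothetical sample: only the status lines mirror the format described above.
sample = "status 1 (DB_unfetched):\t2\nstatus 2 (DB_fetched):  0\n"
counts = parse_status_counts(sample)
print(counts)  # {'DB_unfetched': 2, 'DB_fetched': 0}
print(crawl_looks_broken(counts))  # True
```

Treating an absent DB_fetched line as zero is a deliberate choice in this sketch, since a crawl that fetched nothing might not report that status at all.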