Posted to dev@nutch.apache.org by Paul Harrison <pr...@swbell.net> on 2005/08/16 20:35:04 UTC

Slow Results

I have crawled some 100 million pages and am running this on five P4 3.0 GHz
machines with a 40 GB OS drive and two 250 GB data drives.  I am trying to
get Nutch to grab 1000 results so I can pass them to a separate program I
have, instead of using the Nutch default (100, I think).  As a result, it
takes an enormous amount of time to get results.  So I cut the number of
pages indexed back to 7 million, still having Nutch grab 1000 results instead
of the default.  While the results were better, they are still unusable, as
it takes between 15 and 20 seconds to complete the task.  Does anyone have
any idea why Nutch slows down so badly when you have it grab 1000 pages
instead of the default number?  Does anyone have any suggestions on how to
speed this process up?  Should I use more machines, upgrade to a newer
version of Nutch, etc.?

 

Any help would be MOST appreciated.

Thanks,
Paul