Posted to user@nutch.apache.org by carmmello <ca...@globo.com> on 2005/10/03 21:39:23 UTC

problems with crawling speed

I have been trying Nutch for a while, but I seem to be running into problems with indexing speed in the later versions. When I crawl about 300 sites (to depth 4, for instance), the initial speed is about 2 pages per second (my connection is about 600 kbps), but as new segments are generated, the speed drops to as low as 0.4 pages per second. I use the default Nutch configuration, and the same thing happens when I use the whole-web indexing method (with the same 300 sites). I don't have records, but I recall that in earlier versions of Nutch the indexing speed did not decrease at all, or at least not by that much.
Am I missing something?
Thanks