Posted to user@nutch.apache.org by Joel Halbert <jo...@su3analytics.com> on 2009/04/28 18:34:14 UTC

Nutch 0.9 - fetcher.threads.per.host

Hi,

I have noticed that the following two settings do not interact as I
expected:

fetcher.threads.fetch
fetcher.threads.per.host

Assuming that I have the following settings:

fetcher.server.delay = 4
fetcher.threads.fetch = 10
fetcher.threads.per.host = 1

then I assumed that the minimum time between requests to an individual
host would be 4 seconds. However, it seems that fetcher.threads.per.host
is being applied on a per-thread basis. If I have only one site in my
list of URLs to crawl, it appears that 10 fetcher threads are created
anyway and they all make concurrent requests to that site.
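
To make my expectation concrete, here is a minimal sketch of the
behaviour I assumed. It is not Nutch code: the PerHostGate class, the
reserve() method and the host name are hypothetical, and it assumes
for simplicity that a fetch itself takes no time. The gate is shared
by all fetcher threads and hands out start times per host, so requests
to the same host are never closer together than the server delay.

```python
class PerHostGate:
    """Hypothetical per-host politeness gate (illustration only, not Nutch)."""

    def __init__(self, server_delay):
        self.server_delay = server_delay
        # host -> earliest time at which the next request may start
        self.next_allowed = {}

    def reserve(self, host, now):
        """Return the time at which a request to `host` may start."""
        start = max(now, self.next_allowed.get(host, now))
        # The next request to this host must wait at least server_delay.
        self.next_allowed[host] = start + self.server_delay
        return start


# Mirrors fetcher.server.delay = 4 and fetcher.threads.fetch = 10.
gate = PerHostGate(server_delay=4.0)

# Ten fetcher threads all ask for the same host at t = 0.
starts = [gate.reserve("example.com", now=0.0) for _ in range(10)]
gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
print(starts)
print(gaps)
```

Under this model the ten requests to the single site would be spaced 4
seconds apart, rather than all being sent at once as I am seeing.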

Is this expected or have I misunderstood how these settings are to be
used?

Thanks,

Joel