Posted to user@nutch.apache.org by Niels Boldt <ni...@gmail.com> on 2010/01/25 19:10:38 UTC

distributing fetch load among hosts

Hi,

We are running a job where we fetch pages from approximately 10 different
hosts. We do this in a rather conservative way with regard to fetches per
second, so that we do not burden the hosts.

What we noticed is that the load is often concentrated on 2 or 3 hosts, so
we end up fetching all pages from a couple of hosts and then moving on to
another one, and so on. However, this is not the optimal way to do it,
because we could easily handle fetching from all allowed hosts in parallel.

If that were the case, the job would finish much faster.
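
To illustrate the fetch order we have in mind, here is a minimal Python
sketch. It is not Nutch code: the function name interleave_by_host and the
example URLs are made up for illustration only. It rotates through the hosts
so that consecutive fetches go to different hosts, instead of draining one
host before moving on to the next.

```python
from collections import defaultdict
from itertools import zip_longest
from urllib.parse import urlparse


def interleave_by_host(urls):
    """Reorder urls so consecutive fetches rotate across hosts (round-robin).

    Hypothetical helper for illustration; not part of Nutch.
    """
    # Group URLs by host, preserving the order in which hosts first appear.
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlparse(url).netloc].append(url)

    # Take one URL from each host per round; hosts that run out are skipped.
    ordered = []
    for batch in zip_longest(*by_host.values()):
        ordered.extend(u for u in batch if u is not None)
    return ordered


# Made-up example input: a fetch list that is grouped host by host.
urls = [
    "http://a.example/1", "http://a.example/2", "http://a.example/3",
    "http://b.example/1", "http://b.example/2",
    "http://c.example/1",
]
print(interleave_by_host(urls))
```

With a fixed politeness delay per host, an order like this would let the
delays for different hosts overlap rather than add up.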

Are there any ways to ensure a better fetch distribution, such as settings
we could apply?

Best Regards
Niels
-- 

BinaryConstructors ApS
Vestergade 10a, 4th
1456 Kbh K
Denmark
phone: +4528138757
web: http://www.binaryconstructors.dk
mail: nb@binaryconstructors.dk
skype: nielsboldt