Posted to common-dev@hadoop.apache.org by Tim Patton <tp...@dealcatcher.com> on 2006/02/24 21:10:32 UTC

Load Balancing?

Sorry to post this on the dev list, but the user list doesn't seem to get
any traffic. I've been going through the code trying to figure out how
Hadoop balances load among machines. Specifically, if I have two types
of tasks, one CPU-heavy and one CPU-light, how can I make sure machines
aren't bogged down by being assigned too many CPU-heavy tasks, or left
sitting idle when they could be running more CPU-light tasks? The same
question applies to high and low RAM usage. From looking at the Nutch
source code, it appears to fetch and process first and then index, when
the two could be done in parallel. Is the lack of this kind of load
balancing the reason?

Tim