Posted to common-user@hadoop.apache.org by Mathijs Homminga <ma...@knowlogy.nl> on 2007/06/21 21:36:40 UTC

Cluster efficiency

Hi all,

In an ideal world, my TaskTrackers would be working for me all the time.
That is: the average number of tasks each TaskTracker is running would 
stay close to 'mapred.tasktracker.tasks.maximum' over any given time 
period.
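(For context: in the Hadoop releases of that era this cap was set per 
TaskTracker in hadoop-site.xml; the value of 2 below is just an 
illustration, not a recommendation.)

```xml
<!-- hadoop-site.xml: maximum concurrent tasks per TaskTracker -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
</property>
```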

But... it might not always be possible to feed the slaves with enough 
tasks. Sometimes one job has to finish before another can start, and if 
some slaves finish their tasks faster than others (faster machines, 
smaller tasks), they will have to wait for the rest to complete theirs.

Is there a way to easily determine the efficiency of my cluster?
Example:
- there are 5 slaves, each of which can handle 1 task at a time
- there is one job, split into 5 subtasks (5 maps and 5 reduces)
- 4 slaves finish their tasks in 1 minute
- 1 slave finishes its tasks in 2 minutes (so 4 slaves are waiting 1 minute)

... then one could say that the cluster usage is 60% (6 slave-minutes 
working, 4 slave-minutes waiting, out of 10 available)
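One way to put a number on this is to compare the slave-minutes actually 
spent working against the slave-minutes available (the job's wall time 
multiplied by the number of slaves). A minimal sketch in Python; the 
helper name and the per-slave busy times are just for illustration:

```python
def cluster_utilization(busy_minutes):
    """Fraction of available slave time actually spent working.

    busy_minutes: working time of each slave for the job.
    Wall time is set by the slowest slave; available time is
    wall time multiplied by the number of slaves.
    """
    wall_time = max(busy_minutes)
    available = wall_time * len(busy_minutes)
    return sum(busy_minutes) / available

# The example above: four slaves busy for 1 minute, one busy for 2
print(cluster_utilization([1, 1, 1, 1, 2]))  # 0.6, i.e. 60%
```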

Mathijs

-- 
Knowlogy
Helperpark 290 C
9723 ZA Groningen

mathijs.homminga@knowlogy.nl
+31 (0)6 15312977
http://www.knowlogy.nl



Re: Cluster efficiency

Posted by Doug Cutting <cu...@apache.org>.
Mathijs Homminga wrote:
> Is there a way to easily determine the efficiency of my cluster?
> Example:
> - there are 5 slaves, each of which can handle 1 task at a time
> - there is one job, split into 5 sub tasks (5 maps and 5 reduces)
> - 4 slaves finish their tasks in 1 minute
> - 1 slave finishes its tasks in 2 minutes (so 4 slaves are waiting 1 
> minute)
> 
> ... then one could say that the cluster usage is 60% (6 working minutes, 
> 4 waiting minutes)

A standard way to improve this is to increase the number of tasks.  If 
you instead have 10 tasks/node, then a node that runs at half speed 
shouldn't affect the overall time nearly as much.
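This effect can be illustrated with a toy greedy scheduler (not Hadoop's 
actual scheduling logic, which also accounts for data locality and 
speculative execution): each node simply grabs the next pending task as 
soon as it is free. Assuming uniform one-minute tasks and one node 
running at half speed:

```python
import heapq

def makespan(num_tasks, node_speeds, task_minutes=1.0):
    """Wall time to run num_tasks uniform tasks with greedy scheduling:
    each node takes the next pending task as soon as it is free."""
    heap = [(0.0, i) for i in range(len(node_speeds))]  # (free_at, node)
    for _ in range(num_tasks):
        free_at, node = heapq.heappop(heap)
        heapq.heappush(heap, (free_at + task_minutes / node_speeds[node], node))
    return max(free_at for free_at, _ in heap)

speeds = [1, 1, 1, 1, 0.5]  # 5 nodes, one at half speed

# 1 task per node: the slow node doubles the job time (2.0 vs 1.0 min)
print(makespan(5, speeds))
# 10 tasks per node: the slow node adds only 20% (12.0 vs 10.0 min)
print(makespan(50, speeds))
```

With finer-grained tasks the fast nodes keep pulling work while the slow 
node plods through fewer tasks, so the half-speed node stops dominating 
the overall completion time.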

Doug