Posted to common-user@hadoop.apache.org by Mathijs Homminga <ma...@knowlogy.nl> on 2007/06/21 21:36:40 UTC
Cluster efficiency
Hi all,
In an ideal world, my TaskTrackers would be working for me all the time.
That is: over any given time period, the average number of tasks each
TaskTracker is processing would be close to
'mapred.tasktracker.tasks.maximum'.
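(For reference: in Hadoop of this era that limit was set per node, e.g. in hadoop-site.xml; the value below is only illustrative, not a recommendation.)

```xml
<!-- hadoop-site.xml (illustrative value) -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
</property>
```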
But... it might not always be possible to feed the slaves enough tasks.
Sometimes one job has to finish before another can start, and if some
slaves finish their tasks faster than others (faster machines, smaller
tasks), they have to wait for the rest to complete theirs.
Is there a way to easily determine the efficiency of my cluster?
Example:
- there are 5 slaves, each of which can handle 1 task at a time
- there is one job, split into 5 sub-tasks (5 maps and 5 reduces)
- 4 slaves finish their tasks in 1 minute
- 1 slave finishes its tasks in 2 minutes (so 4 slaves are waiting 1 minute)
... then one could say that the cluster usage is 60% (6 working minutes
out of 10 available slave-minutes; 4 minutes are spent waiting)
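The 60% figure above can be checked with a short sketch (plain Python, not Hadoop code; the helper name is made up):

```python
# Cluster usage for the example above: busy slave-minutes divided by the
# total slot-minutes available until the slowest slave finishes.
def cluster_utilization(busy_minutes_per_slave):
    """busy_minutes_per_slave: list of working minutes, one entry per slave."""
    wall_time = max(busy_minutes_per_slave)       # job ends with the slowest slave
    busy = sum(busy_minutes_per_slave)            # total working minutes
    capacity = len(busy_minutes_per_slave) * wall_time  # slot-minutes available
    return busy / capacity

# 4 slaves busy for 1 minute, 1 slave busy for 2 minutes:
print(cluster_utilization([1, 1, 1, 1, 2]))  # -> 0.6, i.e. 60% usage
```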
Mathijs
--
Knowlogy
Helperpark 290 C
9723 ZA Groningen
mathijs.homminga@knowlogy.nl
+31 (0)6 15312977
http://www.knowlogy.nl
Re: Cluster efficiency
Posted by Doug Cutting <cu...@apache.org>.
Mathijs Homminga wrote:
> Is there a way to easily determine the efficiency of my cluster?
> Example:
> - there are 5 slaves, each of which can handle 1 task at a time
> - there is one job, split into 5 sub-tasks (5 maps and 5 reduces)
> - 4 slaves finish their tasks in 1 minute
> - 1 slave finishes its tasks in 2 minutes (so 4 slaves are waiting 1
> minute)
>
> ... then one could say that the cluster usage is 60% (6 working minutes
> out of 10 available slave-minutes; 4 minutes are spent waiting)
A standard way to improve this is to increase the number of tasks. If
you instead have 10 tasks/node, then a node that runs at half speed
shouldn't affect the overall time nearly as much.
Doug
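Doug's suggestion can be illustrated with a rough scheduling sketch (plain Python; the function name, greedy assignment, and node speeds are illustrative assumptions, not Hadoop internals):

```python
import heapq

# Greedy assignment of equal-sized tasks to the next free node, to show
# why many small tasks hide a slow node better than a few big ones.
# Node speed = work units per minute (all values assumed for illustration).
def job_wall_time(num_tasks, total_work, node_speeds):
    task_work = total_work / num_tasks
    # heap of (time node becomes free, node speed)
    nodes = [(0.0, s) for s in node_speeds]
    heapq.heapify(nodes)
    for _ in range(num_tasks):
        free_at, speed = heapq.heappop(nodes)
        heapq.heappush(nodes, (free_at + task_work / speed, speed))
    return max(t for t, _ in nodes)

speeds = [1, 1, 1, 1, 0.5]          # one node at half speed
print(job_wall_time(5, 5, speeds))   # 1 coarse task per node -> 2.0 minutes
print(job_wall_time(50, 5, speeds))  # 10 fine-grained tasks per node -> about 1.2
```

With one task per node, the whole job waits on the half-speed node; with ten smaller tasks per node, the fast nodes simply pick up more of them, so the wall time approaches the ideal 5 / 4.5 ≈ 1.1 minutes.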