Posted to common-user@hadoop.apache.org by Jeremy Davis <jd...@upstreamsoftware.com> on 2012/05/12 01:57:15 UTC

Resource underutilization / final reduce task only uses half of cluster (tasktracker map/reduce slots)

I see mapred.tasktracker.reduce.tasks.maximum and mapred.tasktracker.map.tasks.maximum, but I'm wondering if there isn't another tuning parameter I need to look at.

I can tune the TaskTracker so that when I have many jobs running, with many simultaneous maps and reduces, I utilize 95% of CPU and memory.

Inevitably, though, I end up with a huge final reduce task that only uses half of my cluster, because I have reserved the other half for mapping.

Is there a way around this problem?  

Seems like there should also be a separate maximum number of reduce slots that applies only when no map tasks are running.
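
For concreteness, here is roughly how the job side looks (a minimal sketch using the MR1 JobConf API; the node and slot counts are made up for illustration, and the class name is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceSizingSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ReduceSizingSketch.class);
            // Illustrative cluster: 20 TaskTrackers, each configured with
            //   mapred.tasktracker.map.tasks.maximum    = 4 map slots
            //   mapred.tasktracker.reduce.tasks.maximum = 4 reduce slots
            int nodes = 20;
            int reduceSlotsPerNode = 4;
            // The reduce phase can only ever use the reduce slots; the map
            // slots sit idle once the final reduce wave is all that is left.
            conf.setNumReduceTasks(nodes * reduceSlotsPerNode);
            // Starting the shuffle before all maps finish overlaps the two
            // phases, but it does not free the reserved map slots at the end.
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
        }
    }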

-JD

Re: Resource underutilization / final reduce task only uses half of cluster (tasktracker map/reduce slots)

Posted by Abhishek Pratap Singh <ma...@gmail.com>.
Hi JD,

The number of reduce tasks doing useful work depends on the keys emitted once all the mappers are done. If every record carries the same key, all of the data goes to a single node; likewise, how evenly the reduce phase utilizes the cluster's nodes depends on the number of distinct keys it has to process.
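
To illustrate: the partitioner is what ties keys to reduce tasks. This sketch mirrors the logic of Hadoop's default HashPartitioner (the class name here is made up):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Same logic as the default HashPartitioner: every record with the
    // same key lands in the same partition, so one very hot key can leave
    // most reduce slots idle no matter how many reduce tasks you request.
    public class HotKeySketchPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

If the key distribution is skewed, a custom Partitioner (or salting the hot keys) can spread the load across more of the reduce slots.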


Regards,
Abhishek
