Posted to common-user@hadoop.apache.org by Sébastien Rainville <se...@gmail.com> on 2010/06/11 14:35:18 UTC

Why not having mapred.tasktracker.tasks.maximum?

Hi,

I'm playing around with the Hadoop config to make better use of our
cluster's resources, and I'm noticing that CPU usage is sub-optimal. Each
machine in the cluster has one quad-core CPU. Our
mapred.tasktracker.map.tasks.maximum is set to 2 and our
mapred.tasktracker.reduce.tasks.maximum is set to 1, which keeps one core
free for the database (Cassandra) and the OS.
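
For reference, a minimal sketch of how these two limits would typically
appear in the configuration. The file name (mapred-site.xml) and the
placement on each TaskTracker node are assumptions based on stock Hadoop of
that era; the values are the ones described above.

```xml
<!-- mapred-site.xml on each TaskTracker node (sketch; file location assumed) -->
<configuration>
  <!-- At most 2 map tasks run concurrently on this node -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- At most 1 reduce task runs concurrently on this node -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

With these values a node runs at most three tasks at once, but never more
than two maps or more than one reduce, so a slot of one type can sit idle
while tasks of the other type are waiting.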

My question is: why are there separate settings for map tasks and reduce
tasks? I feel like what I want is to set mapred.tasktracker.tasks.maximum=3,
so that all three task cores are always available for either map or reduce
tasks.

Am I missing something?

Thanks,
Sebastien

Re: Why not having mapred.tasktracker.tasks.maximum?

Posted by Ted Yu <yu...@gmail.com>.

See https://issues.apache.org/jira/browse/HADOOP-3420
This topic was discussed two years ago.

On Fri, Jun 11, 2010 at 8:45 AM, Edward Capriolo <ed...@gmail.com> wrote:

> That suggestion makes sense. As you run more concurrent jobs you may find
> that having dedicated slots for reduce tasks is useful. You would not want
> a cluster running 600 mappers and 0 reducers :)

Re: Why not having mapred.tasktracker.tasks.maximum?

Posted by Edward Capriolo <ed...@gmail.com>.

That suggestion makes sense. As you run more concurrent jobs you may find
that having dedicated slots for reduce tasks is useful. You would not want a
cluster running 600 mappers and 0 reducers :)