Posted to general@hadoop.apache.org by Ted Yu <yu...@gmail.com> on 2010/05/05 06:57:18 UTC

Re: multicore node clusters

Hi,
I am looking for a config parameter, similar to the one below, that would
allow us to limit the total number of map and reduce tasks:

mapred.tasktracker.tasks.maximum

Please advise.

On Thu, Sep 10, 2009 at 6:31 AM, Chandraprakash Bhagtani <
cpbhagtani@gmail.com> wrote:

> Hi,
>
> You should definitely change mapred.tasktracker.map/reduce.tasks.maximum.
> If your tasks are mostly CPU-bound, run a number of tasks equal to the
> number of CPU cores; otherwise you can run more tasks than cores. You can
> determine CPU and memory usage by running the "top" command on the
> datanodes. You should also take care of the following configuration
> parameters to achieve the best performance:
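As a sketch, the slot limits above go in mapred-site.xml on each TaskTracker. The values here are illustrative assumptions for an 8-core node, not recommendations:

```xml
<!-- mapred-site.xml: per-TaskTracker slot limits. Example values assume an
     8-core datanode; for CPU-bound jobs, total slots ~ number of cores. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```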
>
> *mapred.compress.map.output:* Faster data transfer from mappers to
> reducers, saves disk space, and gives faster disk writes, at the cost of
> extra time spent in compression and decompression.
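A minimal sketch of enabling this in mapred-site.xml; the codec shown is just one option:

```xml
<!-- mapred-site.xml: compress intermediate map output before the shuffle.
     The codec choice below is an example, not a recommendation. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```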
>
> *io.sort.mb:* If you have idle physical memory after running all tasks,
> you can increase this value. But swap space should not be used, since that
> makes it slow.
>
> *io.sort.factor:* If your map tasks produce a large number of spills, you
> should increase this value. It also helps with merging on the reducers.
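A sketch of raising both settings; the values are assumptions to illustrate, and io.sort.mb should only be raised while staying within physical memory:

```xml
<!-- mapred-site.xml: sort buffer and merge fan-in (example values only) -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>   <!-- default is 100 (MB of buffer per map task) -->
</property>
<property>
  <name>io.sort.factor</name>
  <value>50</value>    <!-- default is 10 streams merged at once -->
</property>
```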
>
> *mapred.job.reuse.jvm.num.tasks:* The overhead of JVM creation for each
> task is around 1 second. So for tasks that live for only seconds or a few
> minutes and have lengthy initialization, this value can be increased to
> gain performance.
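A sketch of the corresponding config; the default is 1 (one task per JVM), and -1 means a JVM may be reused without limit within a job:

```xml
<!-- mapred-site.xml: reuse each task JVM for multiple tasks of the same
     job; -1 = unlimited reuse, 1 (the default) = one task per JVM -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```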
>
> *mapred.reduce.parallel.copies:* For large jobs (jobs in which the map
> output is very large), the value of this property can be increased, keeping
> in mind that it will increase the total CPU usage.
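A sketch with an example value; the default number of parallel fetch threads per reducer is 5:

```xml
<!-- mapred-site.xml: parallel map-output fetches per reducer during the
     shuffle (default 5; 20 is an illustrative value for large jobs) -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
```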
>
> *mapred.map/reduce.tasks.speculative.execution:* Set to false to gain
> higher throughput.
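Spelled out, "map/reduce" above stands for two separate properties:

```xml
<!-- mapred-site.xml: disable speculative execution for both phases -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```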
>
> *dfs.block.size*, *mapred.min.split.size*, or *mapred.max.split.size*: to
> control the number of maps.
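A sketch of how the three interact: for FileInputFormat the split size is max(minSplitSize, min(maxSplitSize, blockSize)), and the number of map tasks is roughly the total input bytes divided by that split size. The value below is an illustrative assumption:

```xml
<!-- mapred-site.xml: splitSize = max(minSplit, min(maxSplit, blockSize)).
     Example: a 256 MB min split over 128 MB blocks roughly halves the
     number of map tasks for a given input. -->
<property>
  <name>mapred.min.split.size</name>
  <value>268435456</value>  <!-- 256 MB, in bytes -->
</property>
```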
>
> On Thu, Sep 10, 2009 at 8:06 AM, Mat Kelcey <matthew.kelcey@gmail.com
> >wrote:
>
> > > I've a cluster where every node is a multicore machine. From doing
> > > internet searches I've figured out that I definitely need to change
> > > mapred.tasktracker.tasks.maximum according to the number of cores. But
> > > there are definitely other things that I would like to change, for
> > > example mapred.map.tasks. Can someone point me to the list of things I
> > > should change to get the best performance out of my cluster?
> >
> > nothing will give you better results than benchmarking with jobs
> > representative of your domain!
> >
>
>
>
> --
> Thanks & Regards,
> Chandra Prakash Bhagtani,
>