Posted to general@hadoop.apache.org by himanshu chandola <hi...@yahoo.com> on 2009/09/09 19:15:01 UTC

multicore node clusters

Hi,
I have a cluster where every node is a multicore machine. From searching the web I've figured out that I definitely need to change mapred.tasktracker.tasks.maximum according to the number of cores. But there are certainly other things I should change as well, for example mapred.map.tasks. Can someone point me to the list of settings I should change to get the best performance out of my cluster?
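(Concretely, I mean a change along these lines in conf/mapred-site.xml; the
value below is only an illustration, assuming 8 cores per node:)

  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>8</value> <!-- illustrative: one task slot per core -->
  </property>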

thanks

 Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.

Re: multicore node clusters

Posted by Ted Yu <yu...@gmail.com>.
Hi,
I am looking for a config parameter, similar to the following, that would
allow us to limit the total number of mapper and reducer tasks:

mapred.tasktracker.tasks.maximum
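(i.e. something that would cap tasks for a whole job, the way the
per-TaskTracker setting below caps concurrent tasks on a single node;
the value shown is only an example:)

  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>4</value> <!-- example value: a per-node cap, not the per-job cap I'm after -->
  </property>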

Please advise.

On Thu, Sep 10, 2009 at 6:31 AM, Chandraprakash Bhagtani <
cpbhagtani@gmail.com> wrote:

> Hi,
>
> You should definitely change mapred.tasktracker.map/reduce.tasks.maximum.
> If your tasks are CPU-bound, you should run a number of tasks equal to the
> number of CPU cores; otherwise you can run more tasks than cores.
> [snip]

Re: multicore node clusters

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
Hi,

You should definitely change mapred.tasktracker.map/reduce.tasks.maximum. If
your tasks are CPU-bound, you should run a number of tasks equal to the
number of CPU cores; otherwise you can run more tasks than cores. You can
determine CPU and memory usage by running the "top" command on the datanodes.
You should also pay attention to the following configuration parameters to
achieve the best performance (a sample mapred-site.xml sketch follows the
list):

*mapred.compress.map.output:* compressing map output gives faster data
transfer from mappers to reducers, saves disk space, and speeds up disk
writes, at the cost of extra CPU time for compression and decompression.

*io.sort.mb:* if you have idle physical memory after running all tasks, you
can increase this value. But swap space should not be used, since swapping
makes things slow.

*io.sort.factor:* if your map tasks produce a large number of spills, you
should increase this value. It also helps with merging at the reducers.

*mapred.job.reuse.jvm.num.tasks:* the overhead of JVM creation for each task
is around 1 second. So for tasks which live for seconds or a few minutes
and have lengthy initialization, this value can be increased to gain
performance.

*mapred.reduce.parallel.copies:* for large jobs (jobs in which the map
output is very large), the value of this property can be increased, keeping
in mind that it will increase the total CPU usage.

*mapred.map/reduce.tasks.speculative.execution:* set to false to gain higher
throughput.

*dfs.block.size*, *mapred.min.split.size*, or *mapred.max.split.size*: use
these to control the number of maps.
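A minimal mapred-site.xml sketch pulling the above together (these go inside
the <configuration> element; every value below is an illustrative assumption,
not a recommendation, so tune against your own jobs and hardware):

  <!-- all values illustrative; benchmark before adopting -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>    <!-- e.g. one map slot per core on an 8-core node -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>200</value>  <!-- raise only while physical RAM stays free; avoid swap -->
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>50</value>   <!-- raise if map tasks produce many spills -->
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>   <!-- -1 = reuse the JVM for any number of a job's tasks -->
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>   <!-- raise for jobs with very large map output -->
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>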

On Thu, Sep 10, 2009 at 8:06 AM, Mat Kelcey <ma...@gmail.com> wrote:

> > I have a cluster where every node is a multicore machine. From searching
> > the web I've figured out that I definitely need to change
> > mapred.tasktracker.tasks.maximum according to the number of cores. But
> > there are certainly other things I should change as well, for example
> > mapred.map.tasks. Can someone point me to the list of settings I should
> > change to get the best performance out of my cluster?
>
> Nothing will give you better results than benchmarking with some jobs
> indicative of your domain!
>



-- 
Thanks & Regards,
Chandra Prakash Bhagtani,

Re: multicore node clusters

Posted by Mat Kelcey <ma...@gmail.com>.
> I have a cluster where every node is a multicore machine. From searching the web I've figured out that I definitely need to change mapred.tasktracker.tasks.maximum according to the number of cores. But there are certainly other things I should change as well, for example mapred.map.tasks. Can someone point me to the list of settings I should change to get the best performance out of my cluster?

Nothing will give you better results than benchmarking with some jobs
indicative of your domain!
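(For example -- and this is only a sketch, the paths and sizes below are
made up -- you could time a standard TeraSort run, change one parameter,
and re-run:)

  # generate ~1 GB of input (10M rows of 100 bytes); /bench paths are invented
  hadoop jar $HADOOP_HOME/hadoop-*-examples.jar teragen 10000000 /bench/tera-in

  # time the sort itself; repeat after each configuration change
  time hadoop jar $HADOOP_HOME/hadoop-*-examples.jar terasort /bench/tera-in /bench/tera-out

  # clean up between runs (the output dir must not already exist)
  hadoop fs -rmr /bench/tera-out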