Posted to mapreduce-user@hadoop.apache.org by Arun C Murthy <ac...@hortonworks.com> on 2012/01/31 22:31:22 UTC

Re: Best practices for hadoop shuffling/tunning ?

Moving to mapreduce-user@, bcc common-user@. Please use project-specific lists.

Your io.sort.mb is too high: at 2000 MB it exceeds the 1 GB heap you give each map task, since the sort buffer is allocated inside that heap. mapred.reduce.parallel.copies=80 is too high too.
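A sketch of what that advice might translate to — the specific values below are illustrative assumptions, not numbers Arun gave:

```
# the sort buffer lives inside the map JVM, so keep it well under the 1024 MB heap
io.sort.mb=512
# with only 8 TaskTrackers to fetch from, far fewer copier threads per reduce suffice
mapred.reduce.parallel.copies=10
```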

On Jan 30, 2012, at 4:50 AM, praveenesh kumar wrote:

> Hey guys,
> 
> Just wanted to ask: are there any best practices to follow for improving
> Hadoop shuffle performance?
> 
> I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24
> cores/CPUs and 48 GB RAM.
> 
> I have set the following parameters :
> 
> fs.inmemory.size.mb=2000
> io.sort.mb=2000
> io.sort.factor=200
> io.file.buffer.size=262544
> 
> mapred.map.tasks=200
> mapred.reduce.tasks=40
> mapred.reduce.parallel.copies=80
> mapred.map.child.java.opts = -Xmx1024m
> mapred.reduce.child.java.opts = -Xmx1024m
> 
> mapred.job.tracker.handler.count=60
> tasktracker.http.threads=50
> mapred.job.reuse.jvm.num.tasks = -1
> mapred.compress.map.output = true
> mapred.reduce.slowstart.completed.maps = 0.5
> 
> mapred.tasktracker.map.tasks.maximum=24
> mapred.tasktracker.reduce.tasks.maximum=12
> 
> 
> Can anyone please validate the above tuning parameters, and suggest any
> further improvements ?
> My mappers are running fine, but the shuffle and reduce phases are
> slower than expected for normal jobs. I want to know what I am doing
> wrong or missing.
> 
> Thanks,
> Praveenesh
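The per-node memory math behind the quoted settings can be sanity-checked with a short script. A minimal sketch — the figures are taken from the mail above, and the check itself is illustrative, not part of Hadoop:

```python
# Sanity-check the quoted 0.20.205 settings against per-node memory.
node_ram_mb = 48 * 1024          # 48 GB per node
map_slots = 24                   # mapred.tasktracker.map.tasks.maximum
reduce_slots = 12                # mapred.tasktracker.reduce.tasks.maximum
child_heap_mb = 1024             # heap given to each task JVM
io_sort_mb = 2000                # io.sort.mb

# Worst-case heap demand if every slot is occupied at once.
peak_heap_mb = (map_slots + reduce_slots) * child_heap_mb
print(f"peak task heap: {peak_heap_mb} MB of {node_ram_mb} MB")

# The sort buffer is allocated inside the map task's heap, so it must fit.
fits = io_sort_mb < child_heap_mb
print(f"io.sort.mb={io_sort_mb} fits in a {child_heap_mb} MB heap: {fits}")
```

The second check is the point of the reply: 36 fully occupied slots at 1 GB each leave headroom in 48 GB, but a 2000 MB sort buffer can never fit inside a 1024 MB map heap.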

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/