Posted to common-user@hadoop.apache.org by praveenesh kumar <pr...@gmail.com> on 2012/01/30 13:50:36 UTC

Best practices for Hadoop shuffling/tuning?

Hey guys,

Just wanted to ask: are there any best practices to follow for
improving Hadoop shuffle performance?

I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24 cores/CPUs
and 48 GB of RAM.

I have set the following parameters :

fs.inmemory.size.mb=2000
io.sort.mb=2000
io.sort.factor=200
io.file.buffer.size=262544

mapred.map.tasks=200
mapred.reduce.tasks=40
mapred.reduce.parallel.copies=80
mapred.map.child.java.opts=-Xmx1024m
mapred.reduce.child.java.opts=-Xmx1024m

mapred.job.tracker.handler.count=60
tasktracker.http.threads=50
mapred.job.reuse.jvm.num.tasks = -1
mapred.compress.map.output = true
mapred.reduce.slowstart.completed.maps = 0.5

mapred.tasktracker.map.tasks.maximum=24
mapred.tasktracker.reduce.tasks.maximum=12
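
For what it's worth, here is a quick back-of-envelope check of the per-node memory these slot counts imply (a sketch only; it assumes each task JVM really gets the full 1024 MB heap configured above):

```python
# Rough per-node memory budget implied by the slot settings above.
# Assumes every map and reduce task JVM is capped at a 1024 MB heap.
map_slots = 24           # mapred.tasktracker.map.tasks.maximum
reduce_slots = 12        # mapred.tasktracker.reduce.tasks.maximum
heap_mb_per_task = 1024  # from the child java.opts settings

total_heap_gb = (map_slots + reduce_slots) * heap_mb_per_task / 1024.0
print(total_heap_gb)  # 36.0 GB committed out of 48 GB, before counting
                      # the DataNode/TaskTracker daemons and OS page cache
```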


Can anyone please validate the above tuning parameters and suggest any
further improvements?
My mappers run fine, but the shuffle and reduce phases are slower than
expected for normal jobs. I would like to know what I am doing wrong or
missing.

Thanks,
Praveenesh

Re: Best practices for Hadoop shuffling/tuning?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Moving to mapreduce-user@, bcc common-user@. Please use project-specific lists.

Your io.sort.mb is too high. You only have 1G of heap for the map. Reduce parallel copies is too high too.
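
To make that concrete: io.sort.mb carves its sort buffer out of the map task's JVM heap, so 2000 MB can never fit inside a 1024 MB heap. A consistent pairing would look something like this (values are illustrative, not a recommendation):

```xml
<!-- mapred-site.xml: keep io.sort.mb well under the map task heap;
     a common rule of thumb is roughly a third to half of -Xmx. -->
<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>io.sort.mb</name>
  <value>400</value> <!-- MB; fits comfortably inside a 1024 MB heap -->
</property>
```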

On Jan 30, 2012, at 4:50 AM, praveenesh kumar wrote:

> Hey guys,
> 
> Just wanted to ask: are there any best practices to follow for
> improving Hadoop shuffle performance?
> 
> I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24 cores/CPUs
> and 48 GB of RAM.
> 
> I have set the following parameters :
> 
> fs.inmemory.size.mb=2000
> io.sort.mb=2000
> io.sort.factor=200
> io.file.buffer.size=262544
> 
> mapred.map.tasks=200
> mapred.reduce.tasks=40
> mapred.reduce.parallel.copies=80
> mapred.map.child.java.opts=-Xmx1024m
> mapred.reduce.child.java.opts=-Xmx1024m
> 
> mapred.job.tracker.handler.count=60
> tasktracker.http.threads=50
> mapred.job.reuse.jvm.num.tasks = -1
> mapred.compress.map.output = true
> mapred.reduce.slowstart.completed.maps = 0.5
> 
> mapred.tasktracker.map.tasks.maximum=24
> mapred.tasktracker.reduce.tasks.maximum=12
> 
> 
> Can anyone please validate the above tuning parameters and suggest any
> further improvements?
> My mappers run fine, but the shuffle and reduce phases are slower than
> expected for normal jobs. I would like to know what I am doing wrong or
> missing.
> 
> Thanks,
> Praveenesh

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Best practices for Hadoop shuffling/tuning?

Posted by praveenesh kumar <pr...@gmail.com>.
Can anyone please eyeball the config parameters quoted below and share
their thoughts on this?

Thanks,
Praveenesh

On Mon, Jan 30, 2012 at 6:20 PM, praveenesh kumar <pr...@gmail.com> wrote:

> Hey guys,
>
> Just wanted to ask: are there any best practices to follow for
> improving Hadoop shuffle performance?
>
> I am running Hadoop 0.20.205 on an 8-node cluster. Each node has 24 cores/CPUs
> and 48 GB of RAM.
>
> I have set the following parameters :
>
> fs.inmemory.size.mb=2000
> io.sort.mb=2000
> io.sort.factor=200
> io.file.buffer.size=262544
>
> mapred.map.tasks=200
> mapred.reduce.tasks=40
> mapred.reduce.parallel.copies=80
> mapred.map.child.java.opts=-Xmx1024m
> mapred.reduce.child.java.opts=-Xmx1024m
>
> mapred.job.tracker.handler.count=60
> tasktracker.http.threads=50
> mapred.job.reuse.jvm.num.tasks = -1
> mapred.compress.map.output = true
> mapred.reduce.slowstart.completed.maps = 0.5
>
> mapred.tasktracker.map.tasks.maximum=24
> mapred.tasktracker.reduce.tasks.maximum=12
>
>
> Can anyone please validate the above tuning parameters and suggest any
> further improvements?
> My mappers run fine, but the shuffle and reduce phases are slower than
> expected for normal jobs. I would like to know what I am doing wrong or
> missing.
>
> Thanks,
> Praveenesh
>
>