Posted to common-user@hadoop.apache.org by ch huang <ju...@gmail.com> on 2013/12/12 02:52:18 UTC

Does mapreduce.task.io.sort.mb control both the map merge buffer and the reduce merge buffer?

Hi, mailing list:
Because my job is heavy on the reduce side, I am trying to increase the
buffer size for the sort/merge. I wonder: if I increase
mapreduce.task.io.sort.mb from 100 MB (the default value) to 1 GB, will
each map task's sort/merge buffer also become 1 GB?

Re: Does mapreduce.task.io.sort.mb control both the map merge buffer and the reduce merge buffer?

Posted by Dieter De Witte <dr...@gmail.com>.
This parameter sets the size of the map-side sort buffer: each time the
buffer fills up, its contents are sorted and spilled to disk. The reduce
side has its own range of parameters. I am not sure why you would increase
these buffer sizes, since they eat into your heap; it depends on what you
mean by a heavy job. In my case a heavy job needed a lot of heap, so I
scaled down the buffers for in-memory merging. To learn more about tuning
the shuffle and sort phase, check this reference:

https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort

Reading this will be an eye-opener.
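
To make the heap trade-off above concrete, here is a rough back-of-the-envelope
sketch in plain Java (no Hadoop dependency). The property names are the
Hadoop 2.x ones as far as I know; the percentages passed in main (0.80 and
0.70) are what I believe the stock defaults to be, and the heap sizes are
purely hypothetical, so check all of them against your own mapred-default.xml
and task JVM options. The class and method names are made up for illustration.

```java
public class SortBufferBudget {
    // Property names as I understand them for Hadoop 2.x (verify for your version).
    static final String SORT_MB = "mapreduce.task.io.sort.mb";
    static final String SPILL_PERCENT = "mapreduce.map.sort.spill.percent";
    static final String SHUFFLE_INPUT_PERCENT =
            "mapreduce.reduce.shuffle.input.buffer.percent";

    /** MB of buffered map output at which a spill to disk is triggered. */
    static long spillThresholdMb(int sortMb, double spillPercent) {
        return Math.round(sortMb * spillPercent);
    }

    /** MB of map-task heap left over once the sort buffer is allocated. */
    static long heapLeftForMapperMb(int mapHeapMb, int sortMb) {
        return (long) mapHeapMb - sortMb;
    }

    /** MB of reduce-task heap that may hold fetched map outputs during the shuffle. */
    static long reduceShuffleBufferMb(int reduceHeapMb, double inputBufferPercent) {
        return Math.round(reduceHeapMb * inputBufferPercent);
    }

    public static void main(String[] args) {
        int mapHeapMb = 1536;    // hypothetical map-task heap (-Xmx), in MB
        int reduceHeapMb = 3072; // hypothetical reduce-task heap (-Xmx), in MB
        double spillPercent = 0.80;       // assumed default, see lead-in
        double inputBufferPercent = 0.70; // assumed default, see lead-in

        // Compare the default sort buffer with the 1 GB value from the question.
        for (int sortMb : new int[] {100, 1024}) {
            System.out.printf("%s=%d: spill at ~%d MB (%s=%.2f), %d MB of map heap left%n",
                    SORT_MB, sortMb,
                    spillThresholdMb(sortMb, spillPercent),
                    SPILL_PERCENT, spillPercent,
                    heapLeftForMapperMb(mapHeapMb, sortMb));
        }
        // The reduce-side buffer is derived from the reducer heap, not from SORT_MB.
        System.out.printf("%s=%.2f: ~%d MB of reduce heap for shuffled map outputs%n",
                SHUFFLE_INPUT_PERCENT, inputBufferPercent,
                reduceShuffleBufferMb(reduceHeapMb, inputBufferPercent));
    }
}
```

The point of the sketch is only that a 1 GB sort buffer is carved out of every
map task's heap, while the reduce-side shuffle buffer is sized from a separate
setting.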


2013/12/12 ch huang <ju...@gmail.com>

> Hi, mailing list:
> Because my job is heavy on the reduce side, I am trying to increase the
> buffer size for the sort/merge. I wonder: if I increase
> mapreduce.task.io.sort.mb from 100 MB (the default value) to 1 GB, will
> each map task's sort/merge buffer also become 1 GB?
>
