Posted to common-user@hadoop.apache.org by Satheesh Kumar <nk...@gmail.com> on 2012/08/03 15:23:24 UTC

MapReduce shuffle question

Team, can someone please clarify the following question?

In the map phase, the map output is written to the local disk, and in the
shuffle phase the map output partitions are transferred to the reduce nodes
over HTTP. So my question is: assuming there are no spills (the data set is
small enough to fit in the map's in-memory buffer), will the map output be
transferred directly from memory to the reduce nodes over HTTP, without a
disk access to write the map output? Or is the map output always flushed to
disk before being transferred to the reduce nodes?

Appreciate the help.

Thanks,
Satheesh
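
The behaviour being asked about is controlled by a few job settings. Below is a
minimal sketch, assuming the Hadoop 1.x-era property names io.sort.mb,
io.sort.spill.percent and mapred.compress.map.output (later releases rename
them, e.g. mapreduce.task.io.sort.mb); it only shows the knobs that decide how
much map output is buffered in memory before a spill and whether the
materialized output is compressed.

    import org.apache.hadoop.conf.Configuration;

    public class ShuffleTuning {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Size (MB) of the in-memory buffer that collects map output
            // before it is sorted and spilled to local disk
            // (Hadoop 2.x calls this mapreduce.task.io.sort.mb).
            conf.setInt("io.sort.mb", 256);

            // Fraction of that buffer which may fill before a background
            // spill to disk starts.
            conf.setFloat("io.sort.spill.percent", 0.80f);

            // Compress map output: shrinks both the spill files on local
            // disk and the bytes shuffled to the reducers over HTTP.
            conf.setBoolean("mapred.compress.map.output", true);

            // Normally this Configuration is handed to a Job; printing the
            // values here just confirms what was set.
            System.out.println("io.sort.mb = " + conf.get("io.sort.mb"));
            System.out.println("io.sort.spill.percent = " + conf.get("io.sort.spill.percent"));
        }
    }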

Re: MapReduce shuffle question

Posted by 梁李印 <li...@aliyun-inc.com>.
When a map task is done, its output is always flushed to disk and merged
into one file.
The benefit is that if the reducer fails, the map does not need to re-run.

Liyin Liang
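
To make the "always flushed and merged into one file" point concrete, here is a
toy sketch of the idea only. It is not Hadoop's actual spill code, and the class
and file names are invented for illustration: records are buffered in memory,
spilled to a sorted file whenever the buffer fills, and merged into a single
output file when the task finishes, so the file outlives the map task and a
failed reducer can simply fetch it again.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    /** Toy illustration only: buffer -> spill -> merge into one final file. */
    public class SpillAndMerge {
        private final List<String> buffer = new ArrayList<>();
        private final List<Path> spills = new ArrayList<>();
        private final int bufferLimit;

        SpillAndMerge(int bufferLimit) { this.bufferLimit = bufferLimit; }

        void collect(String record) throws IOException {
            buffer.add(record);
            if (buffer.size() >= bufferLimit) {
                spill();                     // buffer full: write a sorted spill file
            }
        }

        private void spill() throws IOException {
            Collections.sort(buffer);
            Path spillFile = Files.createTempFile("spill", ".txt");
            Files.write(spillFile, buffer);
            spills.add(spillFile);
            buffer.clear();
        }

        /** Runs when the map task finishes: even a small, never-spilled buffer
         *  ends up on disk, merged with any earlier spills into one file. */
        Path close() throws IOException {
            if (!buffer.isEmpty()) {
                spill();
            }
            List<String> merged = new ArrayList<>();
            for (Path p : spills) {
                merged.addAll(Files.readAllLines(p));
            }
            Collections.sort(merged);        // real Hadoop does a streaming merge
            Path out = Files.createTempFile("map-output", ".txt");
            Files.write(out, merged);
            return out;                      // survives the map task, so a failed
                                             // reducer re-fetches instead of re-running the map
        }

        public static void main(String[] args) throws IOException {
            SpillAndMerge task = new SpillAndMerge(3);
            for (String r : new String[] {"b", "a", "d", "c", "e"}) {
                task.collect(r);
            }
            System.out.println("final map output: " + task.close());
        }
    }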



Re: Re: MapReduce shuffle question

Posted by 梁李印 <li...@aliyun-inc.com>.
The optimization you mention is reduce-task locality awareness. Unfortunately,
the current scheduler does not consider a reduce task's data locality, so a
reduce task can be scheduled on any node with free slots.
The following JIRA discusses this problem:
https://issues.apache.org/jira/browse/MAPREDUCE-2038

Liyin Liang
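
Given that a reduce task can land on any node, one practical way to see how
much data a particular job actually shuffles is to read its counters after it
completes. A minimal sketch, assuming the MRv2 TaskCounter enum
(MAP_OUTPUT_MATERIALIZED_BYTES and REDUCE_SHUFFLE_BYTES; Hadoop 1.x exposes the
same values under its older Task.Counter names):

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.TaskCounter;

    public class ShuffleVolume {
        /** Reports map output written for the shuffle vs. bytes fetched by reducers. */
        static void report(Job completedJob) throws Exception {
            Counters counters = completedJob.getCounters();
            // Map output materialized on local disk (after optional compression).
            long materialized =
                counters.findCounter(TaskCounter.MAP_OUTPUT_MATERIALIZED_BYTES).getValue();
            // Bytes the reducers pulled during the shuffle; note this counts
            // local and remote fetches alike.
            long shuffled =
                counters.findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue();
            System.out.printf("map output on disk: %d bytes, shuffled to reducers: %d bytes%n",
                              materialized, shuffled);
        }
    }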


Re: Re: Re: MapReduce shuffle question

Posted by Satheesh Kumar <nk...@gmail.com>.
Thanks, again, Liyin.


Re: Re: MapReduce shuffle question

Posted by Satheesh Kumar <nk...@gmail.com>.
Thank you. One more follow-up question:

Are there any optimizations that run map and reduce tasks on the same nodes so
that data is not transported across the network? Generally, how often and what
percentage of the map output is actually transferred over the network to the
reduce nodes?

Thanks,
Satheesh
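
For a rough sense of the percentage: the partitioner splits every map's output
across all R reducers, so even when a reducer shares a node with some maps,
only about 1/R of each map's output is local and roughly (R-1)/R still crosses
the network; a combiner (job.setCombinerClass) is the usual way to shrink that
volume. A minimal sketch using Hadoop's default HashPartitioner to show the
spread (the word list and reducer count are arbitrary examples):

    import java.util.Arrays;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class PartitionSpread {
        public static void main(String[] args) {
            int numReducers = 4;
            HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
            int[] perReducer = new int[numReducers];

            // Each map output record is routed to one of R partitions, so a single
            // map's output is spread across all reducers; with R reducers, roughly
            // (R - 1) / R of it is destined for remote nodes even if one reducer
            // happens to run on the same node as the map.
            for (String word : "the quick brown fox jumps over the lazy dog".split(" ")) {
                int p = partitioner.getPartition(new Text(word), new IntWritable(1), numReducers);
                perReducer[p]++;
            }
            System.out.println("records per reducer: " + Arrays.toString(perReducer));
        }
    }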


Re: Re: MapReduce shuffle question

Posted by Satheesh Kumar <nk...@gmail.com>.
Thank you, Liyin,
