Posted to common-user@hadoop.apache.org by "inifok.song" <ha...@gmail.com> on 2009/08/27 05:04:16 UTC

How does reducer get intermediate output?

Hi all,

In my cluster, the reducers often fail to fetch the mappers' output. I know
there can be many reasons for this, so I think it's necessary to understand
how a reducer gets the intermediate output. I have read the source code,
but I'm still not clear about the whole process. Could you explain it?
How do the nodes communicate with each other, and how does the
ReduceCopier class work?

Thank you.

Inifok

Re: How does reducer get intermediate output?

Posted by Inifok Song <ha...@gmail.com>.
Hello Harish,

I find that taskLogUrl.openConnection() often throws an IOException, and I
suspect that the connection pool is too small. Could you tell me how I can
find the Jetty settings that Hadoop uses?

Thank you.

Inifok

2009/8/27 Harish Mallipeddi <ha...@gmail.com>

> On Thu, Aug 27, 2009 at 8:34 AM, inifok.song <hadoop.inifok@gmail.com> wrote:
>
> > Hi all,
> >
> > In my cluster, the reducer often can't fetch mapper's output. I know there
> > are many reasons for this situation. And I think it's necessary to find out
> > how does reducer get intermediate output. I have read the source code.
> > However, I'm not clear about the whole process. Could you tell me the
> > process of it? How does each node communicate with each other and how does
> > class ReduceCopier work?
> >
> > Thank you.
> >
> > Inifok
> >
>
> Each TaskTracker runs a Jetty webserver which is responsible for serving
> requests for intermediate map-outputs. The ReduceTask process receives
> notifications regarding completed MapTasks from its TaskTracker (which in
> turn receives that info from the JobTracker). Once it receives these
> notifications, the ReduceTask will start fetching these map-outputs via HTTP
> by requesting the corresponding TT's Jetty webserver.
>
> --
> Harish Mallipeddi
> http://blog.poundbang.in
>
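
On the Jetty settings asked about above, a hedged pointer: assuming a Hadoop
0.20-era setup, the number of worker threads in each TaskTracker's HTTP server
is controlled by tasktracker.http.threads, and the number of parallel fetches
per reducer by mapred.reduce.parallel.copies. Both would normally be set in
mapred-site.xml. The values below are illustrative assumptions, not tuned
recommendations, and the property names should be checked against the
mapred-default.xml of the installed release.

```xml
<!-- Sketch for mapred-site.xml; values are illustrative assumptions. -->
<property>
  <!-- Worker threads in the TaskTracker's HTTP server, which serves map output. -->
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>
<property>
  <!-- Parallel map-output fetches started by each reduce task. -->
  <name>mapred.reduce.parallel.copies</name>
  <value>5</value>
</property>
```

If the servers really are running out of threads, raising the first value or
lowering the second should reduce the fetch failures. If neither helps, the
cause is more likely elsewhere (hostname resolution, firewalls, disk errors).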

Re: How does reducer get intermediate output?

Posted by Harish Mallipeddi <ha...@gmail.com>.
On Thu, Aug 27, 2009 at 8:34 AM, inifok.song <ha...@gmail.com> wrote:

> Hi all,
>
> In my cluster, the reducer often can't fetch mapper's output. I know there
> are many reasons for this situation. And I think it's necessary to find out
> how does reducer get intermediate output. I have read the source code.
> However, I'm not clear about the whole process. Could you tell me the
> process of it? How does each node communicate with each other and how does
> class ReduceCopier work?
>
> Thank you.
>
> Inifok
>

Each TaskTracker runs a Jetty webserver which is responsible for serving
requests for intermediate map-outputs. The ReduceTask process receives
notifications regarding completed MapTasks from its TaskTracker (which in
turn receives that info from the JobTracker). Once it receives these
notifications, the ReduceTask starts fetching the map-outputs over HTTP
from the Jetty webserver of the TaskTracker (TT) that ran each MapTask.

-- 
Harish Mallipeddi
http://blog.poundbang.in
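
The fetch step described in this thread can be sketched in plain Java. This is
not Hadoop's ReduceCopier. It is a minimal illustration of the same idea, where
a client pulls bytes over HTTP from a server that holds the map output, with
explicit timeouts and a retry when openConnection() or the read fails with an
IOException. The class name MapOutputFetchSketch, the helper names, the
/mapOutput path and the query parameters are assumptions made for this
example, and the JDK's built-in HttpServer stands in for the TaskTracker's
Jetty instance.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

public class MapOutputFetchSketch {

    /** Fetches the body at url, retrying up to maxAttempts times on IOException. */
    static byte[] fetchWithRetry(URL url, int maxAttempts, int timeoutMs) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Explicit timeouts so a stuck server surfaces as an IOException.
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return buf.toByteArray();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            } finally {
                conn.disconnect();
            }
        }
        throw last;
    }

    /** Starts a local stand-in server that fails the first failFirst requests with HTTP 500. */
    static HttpServer startStandInServer(byte[] payload, int failFirst) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        AtomicInteger hits = new AtomicInteger();
        server.createContext("/mapOutput", exchange -> {
            if (hits.incrementAndGet() <= failFirst) {
                exchange.sendResponseHeaders(500, -1); // error status, no body
            } else {
                exchange.sendResponseHeaders(200, payload.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(payload);
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "key1\tvalue1\n".getBytes(StandardCharsets.UTF_8);
        HttpServer server = startStandInServer(payload, 1); // first request fails
        try {
            int port = server.getAddress().getPort();
            // Hypothetical URL shape; the query parameters are ignored by the stand-in.
            URL url = URI.create("http://127.0.0.1:" + port + "/mapOutput?map=m0&reduce=0").toURL();
            byte[] got = fetchWithRetry(url, 3, 5000);
            System.out.println("fetched " + got.length + " bytes");
        } finally {
            server.stop(0);
        }
    }
}
```

The retry loop is the part most relevant to the fetch failures reported
earlier in the thread: a single failed request does not have to fail the whole
copy, but repeated failures against the same host usually mean the serving
side is overloaded or unreachable.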