Posted to mapreduce-dev@hadoop.apache.org by rshepherd <rj...@nyu.edu> on 2012/12/03 19:08:09 UTC

Question about intermediate kv pair files

Hi folks,

Can anyone briefly explain to me how each mapper reports the
location of the intermediate kv partition files to the master? And, if
possible, where in the code that happens?

Thanks for any help,
Randy

Re: Question about intermediate kv pair files

Posted by rshepherd <rj...@nyu.edu>.

Thanks Mostafa! Very much appreciated.

On 12/3/12 1:26 PM, Mostafa Elhemali wrote:
> (Disclaimer: Not an expert, but looked at that code quite a bit. Hopefully
> the list will correct any details I get wrong)
>
> In Hadoop 1: the mapper puts the file in a well-known location on the
> machine (encoded by user, job ID and map ID), and the TaskTracker then
> serves it over HTTP to the reducer when the reducer requests it
> (authenticated using a secret token in the job). Look in the
> MapOutputServlet class in TaskTracker for most of the related code.
>
> In YARN: similar thing, except that now it's a NodeManager plug-in
> (auxiliary service) that serves the map output, since there's no
> TaskTracker anymore. Look at the ShuffleHandler class in the
> hadoop-mapreduce-client-shuffle project. I see comments in the code
> indicating that this will be changed from a NodeManager plug-in in the
> future, but I don't know much about that.
>
> Hope it helps,
> Mostafa

Re: Question about intermediate kv pair files

Posted by Mostafa Elhemali <mo...@gmail.com>.

(Disclaimer: Not an expert, but looked at that code quite a bit. Hopefully
the list will correct any details I get wrong)

In Hadoop 1: the mapper puts the file in a well-known location on the
machine (encoded by user, job ID and map ID), and the TaskTracker then
serves it over HTTP to the reducer when the reducer requests it
(authenticated using a secret token in the job). Look in the
MapOutputServlet class in TaskTracker for most of the related code.
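
To make the Hadoop 1 flow a bit more concrete, here is a rough sketch of how the on-disk location and the fetch URL could be derived from the same IDs. This is not Hadoop code: the class and method names are made up for illustration, and the directory layout, the /mapOutput query parameters and the 50060 port are my recollection of the Hadoop 1 defaults, so please check them against TaskTracker and ReduceTask before relying on them. The token-based authentication is left out entirely.

```java
/** Illustrative sketch only; names here are hypothetical, not Hadoop APIs. */
final class Hadoop1ShuffleSketch {

    // ASSUMPTION: map output lives under
    // <local dir>/taskTracker/<user>/jobcache/<job ID>/<map attempt ID>/output/file.out
    static String mapOutputPath(String localDir, String user,
                                String jobId, String mapAttemptId) {
        return String.join("/", localDir, "taskTracker", user, "jobcache",
                jobId, mapAttemptId, "output", "file.out");
    }

    // ASSUMPTION: the reducer asks the TaskTracker's HTTP server for one
    // partition of one map's output using job, map and reduce parameters.
    static String fetchUrl(String host, int httpPort, String jobId,
                           String mapAttemptId, int partition) {
        return "http://" + host + ":" + httpPort
                + "/mapOutput?job=" + jobId
                + "&map=" + mapAttemptId
                + "&reduce=" + partition;
    }

    public static void main(String[] args) {
        String job = "job_201212031908_0001";
        String map = "attempt_201212031908_0001_m_000000_0";
        System.out.println(mapOutputPath("/tmp/mapred/local", "randy", job, map));
        System.out.println(fetchUrl("node1", 50060, job, map, 3));
    }
}
```

The point is only that both sides can compute the location from the user, job ID and map ID, so the mapper does not need to ship a file path around.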

In YARN: similar thing, except that now it's a NodeManager plug-in
(auxiliary service) that serves the map output, since there's no
TaskTracker anymore. Look at the ShuffleHandler class in the
hadoop-mapreduce-client-shuffle project. I see comments in the code
indicating that this will be changed from a NodeManager plug-in in the
future, but I don't know much about that.
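
For the YARN side, the plug-in wiring is done through NodeManager configuration. The sketch below just builds the two property names I believe are involved, using a plain Java map rather than Hadoop's Configuration class so it runs on its own. The helper class is hypothetical, and the service name is an assumption: I have seen both "mapreduce.shuffle" and "mapreduce_shuffle" depending on the release, so check the yarn-site.xml shipped with your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; this helper is hypothetical, not a Hadoop API. */
final class ShuffleAuxServiceConfigSketch {

    // Builds the yarn-site.xml style properties that register the shuffle
    // handler as a NodeManager auxiliary service.
    // ASSUMPTION: serviceName must match what your Hadoop release expects.
    static Map<String, String> shuffleAuxServiceProperties(String serviceName) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("yarn.nodemanager.aux-services", serviceName);
        props.put("yarn.nodemanager.aux-services." + serviceName + ".class",
                "org.apache.hadoop.mapred.ShuffleHandler");
        return props;
    }

    public static void main(String[] args) {
        shuffleAuxServiceProperties("mapreduce_shuffle")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Reducers then fetch from that service over HTTP much as they did from the TaskTracker.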

Hope it helps,
Mostafa
