
Posted to common-user@hadoop.apache.org by "Rahul.V." <gr...@gmail.com> on 2010/08/17 12:47:34 UTC

code for finding where the map outputs are transferred to file.

Hi,
I've read that the intermediate map output is written to disk at regular
intervals. In fact, I've read that there are background threads which
spill the data onto disk whenever it crosses a threshold. [Source: Hadoop:
The Definitive Guide.]
I've tried to dig into the code a couple of times to see where exactly this
is happening. If any of you know where it is, could you kindly let me know the
filename and package name where I can find it?

-- 
Regards,
R.V.
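A minimal sketch of the spill-on-threshold idea described above. It assumes a single writer thread (as in a map task) and at most one spill in flight. SpillingBuffer, collect() and flush() are hypothetical names, not Hadoop's API. Records are plain strings, and each "spill" is a sorted in-memory list standing in for a spill file on local disk. The spillPercent argument plays the role of Hadoop's io.sort.spill.percent setting (0.80 by default in 0.20), applied here to a record count rather than to the io.sort.mb byte buffer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical toy model, not Hadoop's MapOutputBuffer: records accumulate
 * in memory and, once the spill threshold is reached, a background thread
 * sorts the filled batch and "spills" it as one run.
 */
public class SpillingBuffer {
    private final int spillThreshold;                         // records that trigger a spill
    private List<String> current = new ArrayList<>();         // batch being filled by the writer
    private final List<List<String>> spills = new ArrayList<>(); // stand-in for spill files
    private Thread spillThread;                               // at most one spill in flight

    public SpillingBuffer(int capacity, double spillPercent) {
        // e.g. capacity 10 and spillPercent 0.8 -> spill after 8 records
        this.spillThreshold = Math.max(1, (int) (capacity * spillPercent));
    }

    /** Called by the single writer thread for every output record. */
    public void collect(String record) throws InterruptedException {
        current.add(record);
        if (current.size() >= spillThreshold) {
            startSpill();
        }
    }

    /** End of task: spill whatever is left and wait for the last spill. */
    public List<List<String>> flush() throws InterruptedException {
        if (!current.isEmpty()) {
            startSpill();
        }
        waitForSpill();
        return spills;
    }

    private void startSpill() throws InterruptedException {
        // The writer blocks here only if the previous spill is still running.
        waitForSpill();
        final List<String> toSpill = current;
        current = new ArrayList<>();  // the writer keeps filling a fresh batch
        spillThread = new Thread(() -> {
            Collections.sort(toSpill); // sort the run before "writing" it
            spills.add(toSpill);       // stand-in for writing a spill file to disk
        }, "spill-thread");
        spillThread.start();
    }

    private void waitForSpill() throws InterruptedException {
        if (spillThread != null) {
            spillThread.join();        // join also makes the spill's writes visible
            spillThread = null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SpillingBuffer buf = new SpillingBuffer(10, 0.8);
        for (String w : new String[] {"m", "c", "x", "a", "q", "b", "z", "d", "k", "e", "y"}) {
            buf.collect(w);
        }
        System.out.println(buf.flush());
    }
}
```

Running main prints [[a, b, c, d, m, q, x, z], [e, k, y]]: one run is spilled in the background after the 8th record, the other at the final flush. Joining the previous spill thread before starting the next keeps the example simple and keeps spills in order.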

Re: code for finding where the map outputs are transferred to file.

Posted by Vinod KV <vi...@yahoo-inc.com>.

Moving this MapReduce-specific question to mapreduce-user@hadoop.apache.org.

All map-task-related execution starts in org.apache.hadoop.mapred.MapTask.

For your specific question, follow MapTask.runNewMapper() ->
NewOutputCollector -> MapOutputBuffer.

HTH,
+vinod
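To make the chain above more concrete, here is a simplified model under an assumption about its shape. The model assumes the collector handed to the mapper computes each record's reduce partition and passes key, value and partition down to the map output buffer, which sorts by partition and then key when it spills. PartitioningCollector, SortBuffer and Rec are hypothetical names, not Hadoop classes. The model omits serialization into a byte buffer, combiners, and the merge of spill files. Only the partition formula is taken from Hadoop's HashPartitioner.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of the collector -> buffer path; not Hadoop code. */
public class CollectorChainSketch {

    /** One buffered map output record, tagged with its reduce partition. */
    static final class Rec {
        final int partition;
        final String key;
        final String value;

        Rec(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return partition + ":" + key + "=" + value;
        }
    }

    /** Stand-in for the map output buffer: stores records, sorts them at spill time. */
    static final class SortBuffer {
        private final List<Rec> records = new ArrayList<>();

        void collect(String key, String value, int partition) {
            records.add(new Rec(partition, key, value));
        }

        /** Sort by partition, then key, so each reducer's data is contiguous and ordered. */
        List<Rec> spill() {
            List<Rec> run = new ArrayList<>(records);
            run.sort(Comparator.comparingInt((Rec r) -> r.partition)
                               .thenComparing(r -> r.key));
            records.clear();
            return run;
        }
    }

    /** Stand-in for the collector the mapper writes to: computes the partition, then delegates. */
    static final class PartitioningCollector {
        private final SortBuffer buffer;
        private final int numReduceTasks;

        PartitioningCollector(SortBuffer buffer, int numReduceTasks) {
            this.buffer = buffer;
            this.numReduceTasks = numReduceTasks;
        }

        void write(String key, String value) {
            // Same formula as Hadoop's HashPartitioner
            int partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            buffer.collect(key, value, partition);
        }
    }

    public static void main(String[] args) {
        SortBuffer buffer = new SortBuffer();
        PartitioningCollector out = new PartitioningCollector(buffer, 2);
        for (String w : new String[] {"b", "a", "d", "c"}) {
            out.write(w, "1");
        }
        System.out.println(buffer.spill());
    }
}
```

With two reducers, "b" and "d" hash to partition 0 and "a" and "c" to partition 1, so main prints [0:b=1, 0:d=1, 1:a=1, 1:c=1].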
