Posted to mapreduce-user@hadoop.apache.org by Ling Kun <lk...@gmail.com> on 2013/05/10 05:19:29 UTC

When and who move the reduce output file part-0000X to the final output directory

Dear all,

     I am looking into the MapReduce (MR) workflow and would like to know more
about how the reduce output data is copied.

    Here is my question.

   In the DFSIO test and other MR jobs, each reduce task runs on a
TaskTracker (TT) and writes its files to a directory named like
"XXX//_temporary/_attempt_201305101045_0005_r_000000_0/", which also
contains a result file named part-00000.

  After the reducer finishes the task, the reducer output file part-00000
should be moved from the local disk to HDFS.

My question is: is part-00000 copied to HDFS at the moment the reducer
finishes the task? And who performs the copy: the reducer child process, the
TaskTracker that runs the reduce task, or the JobTracker?

Thanks,

yours,
Kun Ling

-- 
http://www.lingcc.com

Re: When and who move the reduce output file part-0000X to the final output directory

Posted by Ling Kun <lk...@gmail.com>.
Thanks, Harsh!
Your reply helps me a lot.

Kun Ling


On Fri, May 10, 2013 at 1:26 PM, Harsh J <ha...@cloudera.com> wrote:

> The task itself moves it when it receives a commitTask message. See
> the OutputCommitter class:
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/OutputCommitter.html#commitTask(org.apache.hadoop.mapred.TaskAttemptContext)
>
> On Fri, May 10, 2013 at 8:49 AM, Ling Kun <lk...@gmail.com> wrote:
> > Dear all,
> >
> >      I am looking into the MR work flow, and want to know more details about
> > the reduce output data copy .
> >
> >     Here is my question.
> >
> >    For the DFSIO test or some other MR jobs. Each reduce task will run on a
> > TT, and generate files to some dirs named like this:  "
> > XXX//_temporary/_attempt_201305101045_0005_r_000000_0/", there will also be
> > a result file named part-00000.
> >
> >   After the reducer done the task. the reducer output data part-00000 should
> > be moved from  the local disk to the HDFS.
> >
> > My question is: Is that the time that when reducer finish the task that
> > part-00000 will be copied to the HDFS? Who make this file copy happen? The
> > Reducer child? The TaskTracker which run the reduce task? Or the JobTracker?
> >
> > Thanks,
> >
> > yours,
> > Kun Ling
> >
> > --
> > http://www.lingcc.com
>
>
>
> --
> Harsh J
>



-- 
http://www.lingcc.com

Re: When and who move the reduce output file part-0000X to the final output directory

Posted by Harsh J <ha...@cloudera.com>.
The task itself moves it when it receives a commitTask message. See
the OutputCommitter class:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/OutputCommitter.html#commitTask(org.apache.hadoop.mapred.TaskAttemptContext)
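
To make the commit step concrete, here is a rough local-filesystem simulation
of what I understand the default FileOutputCommitter to do in commitTask: the
attempt directory normally sits under the job output directory on the same
filesystem, so the commit is a rename into the final directory rather than a
copy from local disk. This is only a sketch, not Hadoop code: the commit_task
helper and the temporary directory used here are made up for illustration, and
the real class works through the Hadoop FileSystem API.

```python
import os
import tempfile


def commit_task(output_dir, attempt_id):
    """Hypothetical stand-in for commitTask: promote the attempt's files.

    Every file under <output_dir>/_temporary/_<attempt_id>/ is renamed into
    <output_dir>, then the empty attempt directory is removed.
    """
    attempt_dir = os.path.join(output_dir, "_temporary", "_" + attempt_id)
    committed = []
    for name in sorted(os.listdir(attempt_dir)):
        # A rename within one filesystem moves the file without copying data.
        os.rename(os.path.join(attempt_dir, name),
                  os.path.join(output_dir, name))
        committed.append(name)
    os.rmdir(attempt_dir)
    return committed


# Simulate a reduce attempt that has written part-00000 to its attempt dir.
output_dir = tempfile.mkdtemp()  # stands in for the job output directory
attempt_id = "attempt_201305101045_0005_r_000000_0"
attempt_dir = os.path.join(output_dir, "_temporary", "_" + attempt_id)
os.makedirs(attempt_dir)
with open(os.path.join(attempt_dir, "part-00000"), "w") as f:
    f.write("key\tvalue\n")

# The task process itself performs the move once it is told to commit.
committed = commit_task(output_dir, attempt_id)
print(committed)
```

After the call, part-00000 sits directly in the output directory and the
attempt directory is gone, which matches what you see once a job completes.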

On Fri, May 10, 2013 at 8:49 AM, Ling Kun <lk...@gmail.com> wrote:
> Dear all,
>
>      I am looking into the MR work flow, and want to know more details about
> the reduce output data copy .
>
>     Here is my question.
>
>    For the DFSIO test or some other MR jobs. Each reduce task will run on a
> TT, and generate files to some dirs named like this:  "
> XXX//_temporary/_attempt_201305101045_0005_r_000000_0/", there will also be
> a result file named part-00000.
>
>   After the reducer done the task. the reducer output data part-00000 should
> be moved from  the local disk to the HDFS.
>
> My question is: Is that the time that when reducer finish the task that
> part-00000 will be copied to the HDFS? Who make this file copy happen? The
> Reducer child? The TaskTracker which run the reduce task? Or the JobTracker?
>
> Thanks,
>
> yours,
> Kun Ling
>
> --
> http://www.lingcc.com



-- 
Harsh J
