Posted to common-user@hadoop.apache.org by Ling Kun <lk...@gmail.com> on 2013/03/01 11:14:26 UTC

Is there a way to keep all intermediate files after the MapReduce job has run?

Dear all,
    In order to learn more about the creation and size of the files produced
while a job is running, I want to keep all of the intermediate files (job.xml,
spillN.out, file.out, file.index, map.out-N, etc.).

My questions are:
1. Is there any configuration that can make this happen? Or could I modify
some Hadoop MapReduce code to do it?

2. Since each job, each task, and each attempt of a task uses a different
directory to store its intermediate files, keeping the files around without
deleting them should not hurt the MapReduce cluster beyond taking up some
storage. Am I right? (A small illustrative sketch follows below.)
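
To illustrate what I mean (a minimal sketch, not tested; it assumes the new
org.apache.hadoop.mapreduce API, and mapred.local.dir is the Hadoop 1.x name
for the task trackers' local directories, renamed in newer releases), a mapper
can log which attempt it is and which local dirs it was given; the per-attempt
spill and output files live under attempt-specific subdirectories of those dirs:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class AttemptInfoMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      @Override
      protected void setup(Context context) {
        // Each attempt has its own ID; its spill/output files are written into
        // attempt-specific subdirectories of the configured local directories.
        System.err.println("task attempt: " + context.getTaskAttemptID());
        System.err.println("local dirs:   " + context.getConfiguration().get("mapred.local.dir"));
      }
    }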

Thanks

yours,
Ling Kun

-- 
http://www.lingcc.com

Re: Is there a way to keep all intermediate files after the MapReduce job has run?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Ling, do you have Hadoop: The Definitive Guide close by?

I think I remember it saying something about keeping the intermediate files.

Take a look at keep.task.files.pattern... It might help you keep
some of the files you are looking for, though maybe not all of them, or
maybe not any.

JM
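
For reference, a minimal sketch of setting the relevant properties from a job
driver (not tested; keep.task.files.pattern and keep.failed.task.files are the
Hadoop 1.x property names, and newer releases appear to rename them under
mapreduce.task.files.preserve.*, so check the deprecated-properties table for
your version):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class KeepIntermediateFilesDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Keep intermediate files for every task whose name matches the pattern;
        // ".*" matches all tasks of the job.
        conf.set("keep.task.files.pattern", ".*");
        // Also keep the files of failed task attempts.
        conf.setBoolean("keep.failed.task.files", true);

        // Job.getInstance(conf, ...) in newer releases.
        Job job = new Job(conf, "keep-intermediate-files-demo");
        // ... set mapper, reducer, input and output paths as usual, then:
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The same properties can also be passed on the command line with -D (for drivers
that go through ToolRunner/GenericOptionsParser), e.g.
hadoop jar myjob.jar MyDriver -D keep.task.files.pattern='.*' ...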

2013/3/1 Michael Segel <mi...@hotmail.com>:
> Your job.xml file is kept for a set period of time.
> I believe the others are automatically removed.
>
> You can easily access the job.xml file from the JT webpage.
>
> On Mar 1, 2013, at 4:14 AM, Ling Kun <lk...@gmail.com> wrote:
>
> Dear all,
>     In order to learn more about the creation and size of the files produced
> while a job is running, I want to keep all of the intermediate files (job.xml,
> spillN.out, file.out, file.index, map.out-N, etc.).
>
> My questions are:
> 1. Is there any configuration that can make this happen? Or could I modify
> some Hadoop MapReduce code to do it?
>
> 2. Since each job, each task, and each attempt of a task uses a different
> directory to store its intermediate files, keeping the files around without
> deleting them should not hurt the MapReduce cluster beyond taking up some
> storage. Am I right?
>
> Thanks
>
> yours,
> Ling Kun
>
> --
> http://www.lingcc.com
>
>

Re: Is there a way to keep all intermediate files after the MapReduce job has run?

Posted by Michael Segel <mi...@hotmail.com>.
Your job.xml file is kept for a set period of time. 
I believe the others are automatically removed. 

You can easily access the job.xml file from the JT webpage.
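
A rough sketch of looking up the same file programmatically rather than through
the web UI, using the old org.apache.hadoop.mapred client API (Hadoop 1.x class
and method names; not tested, and it only works while the JobTracker still
knows about the job):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class JobXmlLocator {
      public static void main(String[] args) throws Exception {
        // args[0] is a job id such as job_201303010000_0001
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));
        if (job == null) {
          System.err.println("The JobTracker no longer knows this job id");
          return;
        }
        // getJobFile() returns the path of the submitted job configuration (job.xml).
        System.out.println("job.xml: " + job.getJobFile());
      }
    }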

On Mar 1, 2013, at 4:14 AM, Ling Kun <lk...@gmail.com> wrote:

> Dear all,
>     In order to learn more about the creation and size of the files produced while a job is running, I want to keep all of the intermediate files (job.xml, spillN.out, file.out, file.index, map.out-N, etc.).
> 
> My questions are:
> 1. Is there any configuration that can make this happen? Or could I modify some Hadoop MapReduce code to do it?
> 
> 2. Since each job, each task, and each attempt of a task uses a different directory to store its intermediate files, keeping the files around without deleting them should not hurt the MapReduce cluster beyond taking up some storage. Am I right?
> 
> Thanks 
> 
> yours,
> Ling Kun
> 
> -- 
> http://www.lingcc.com

