Posted to common-user@hadoop.apache.org by Starry SHI <st...@gmail.com> on 2009/09/26 08:34:43 UTC

Where are temp files stored?

Hi.

I am wondering where the temp files (intermediate files) are stored. They
should be located in hadoop.tmp.dir by default, right? Why can't I find
them in either the local file system or HDFS?

I was doing a two-table join using Hadoop. Until the job completes, the
intermediate files should be stored in the tmp folder, but I cannot find
any trace of them. Can somebody tell me how to access the intermediate
files in Hadoop?

Another question is about the replication of the intermediate files. By
default, will the intermediate (tmp) files be written to HDFS? If yes, will
they be replicated? I think that if the tmp files are also replicated, they
will add a large performance overhead. Is there a way to specify which
files should be replicated and which need not be?

Looking forward to your reply!

Best regards,
Starry

/* Tomorrow is another day. So is today. */

Is that a mistake in the Hadoop Tutorial?

Posted by springring <sp...@126.com>.
Hi,
    Please see the word marked in red on page 7 of the attached file. I think
it should be "combine" instead of "map", or is it my misunderstanding?

br

Springring.Xu

Re: Where are temp files stored?

Posted by Todd Lipcon <to...@cloudera.com>.
On Sun, Sep 27, 2009 at 7:39 PM, Starry SHI <st...@gmail.com> wrote:

> Hi Dave.
>
> Thank you for your reply!
>
> I have checked {dfs.data.dir}/tmp, the tmp files are there while the job is
> running. However, it seems that the tmp files on each node are the same.
> That is to say, the whole HDFS is sharing the same tmp files. This looks
> strange, because each node should process its own part of data. Do you have
> some ideas on this point?
>

The MapReduce intermediate data is stored in mapred.local.dir, on each
node's local disk. The default value for this is
${hadoop.tmp.dir}/mapred/local. Note that it is cleaned up after jobs
finish executing, so it is only visible while a job is running.
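
For illustration, here is a minimal Python sketch of how that default
expands. It only mimics the ${...} substitution and does not call any
Hadoop API. The resolve() helper and the user name "alice" are hypothetical,
and the property values are the stock 0.20-era defaults as I understand
them; a cluster's site configuration may override any of them.

```python
import re

# Assumed stock defaults (0.20-era *-default.xml files).
# A real cluster's *-site.xml may override any of these values.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(name, props, max_depth=20):
    """Expand ${...} references in props[name] (hypothetical helper)."""
    value = props[name]
    for _ in range(max_depth):
        match = _VAR.search(value)
        if match is None:
            return value  # nothing left to substitute
        key = match.group(1)
        if key not in props:
            raise KeyError(f"undefined property: {key}")
        # Replace the first remaining ${key} with its configured value.
        value = value[:match.start()] + props[key] + value[match.end():]
    raise ValueError(f"too many nested substitutions in {name}")


if __name__ == "__main__":
    # "alice" stands in for the user running the Hadoop daemons.
    props = {**DEFAULTS, "user.name": "alice"}
    print(resolve("mapred.local.dir", props))  # /tmp/hadoop-alice/mapred/local
```

Under these assumed defaults the intermediate data sits in a local
directory on each node, in a different tree from ${dfs.data.dir}/tmp.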

-Todd



Re: Where are temp files stored?

Posted by Starry SHI <st...@gmail.com>.
Hi Dave.

Thank you for your reply!

I have checked ${dfs.data.dir}/tmp, and the tmp files are there while the
job is running. However, the tmp files on each node seem to be the same.
That is to say, the whole HDFS appears to share the same tmp files. This
looks strange, because each node should process its own part of the data.
Do you have any ideas on this point?

Best regards,
Starry

/* Tomorrow is another day. So is today. */


On Sat, Sep 26, 2009 at 15:07, dave bayer <da...@cloudfactory.org> wrote:

>
> On Sep 25, 2009, at 11:34 PM, Starry SHI wrote:
>
>> Hi.
>>
>> I am wondering where the temp files (intermediate files) are stored. They
>> should be located in the hadoop.tmp.dir by default, right? why I cannot
>> find them in either the local file system and hdfs?
>
> You might look under ${dfs.data.dir}/tmp. Granted, I've not consulted the
> code to verify that is how the path is built, but that is where I've seen
> them on my cluster...
>
>> Another question is about the replication of the intermediate files. By
>> default, will the intermediate (tmp) files be written to HDFS?
>
> No, they live on the node that processed the map task. You wouldn't
> want to spend the cycles/time replicating this data out to other nodes
> (and then cleaning it up) when you can rerun the task if the node
> holding the data happens to go down (unlikely).
>
> dave bayer

Re: Where are temp files stored?

Posted by dave bayer <da...@cloudfactory.org>.
On Sep 25, 2009, at 11:34 PM, Starry SHI wrote:

> Hi.
>
> I am wondering where the temp files (intermediate files) are stored. They
> should be located in the hadoop.tmp.dir by default, right? why I cannot
> find them in either the local file system and hdfs?

You might look under ${dfs.data.dir}/tmp. Granted, I've not consulted the
code to verify that is how the path is built, but that is where I've seen
them on my cluster...

> Another question is about the replication of the intermediate files. By
> default, will the intermediate (tmp) files be written to HDFS?

No, they live on the node that processed the map task. You wouldn't
want to spend the cycles/time replicating this data out to other nodes
(and then cleaning it up) when you can rerun the task if the node
holding the data happens to go down (unlikely).

dave bayer