Posted to mapreduce-user@hadoop.apache.org by Panayotis Antonopoulos <an...@hotmail.com> on 2011/05/30 17:32:43 UTC

MultipleOutputs Files remain in temporary folder

Hello,
I just noticed that the files created using MultipleOutputs remain in the temporary folder, inside the attempt sub-folders, when there is no normal output (i.e., nothing is written via context.write(...)).

Has anyone else noticed that?
Is there any way to change that and make the files appear in the output directory?

Thank you in advance!
Panagiotis.
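
For context, the usual MultipleOutputs pattern looks like the sketch below (the class name StatsReducer and the named output "stats" are illustrative, not taken from the original mail). One detail worth checking: if mos.close() is never called in cleanup(), the side files are not flushed, and they can be left behind under the attempt's temporary directory.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Sketch of a reducer that writes only via MultipleOutputs,
// never via context.write(...).
public class StatsReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
                          Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        // Write only to the named output; no default output at all.
        mos.write("stats", key, new IntWritable(sum));
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Essential: flushes and closes the side writers so the
        // output committer can promote the files out of _temporary.
        mos.close();
    }
}
```

The named output also has to be registered in the driver, e.g. MultipleOutputs.addNamedOutput(job, "stats", TextOutputFormat.class, Text.class, IntWritable.class).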

Re: MultipleOutputs Files remain in temporary folder

Posted by Harsh J <ha...@cloudera.com>.
Panayotis,

I've not seen this happen yet. I regularly use MultipleOutputs (MO) to
write my files, and both TextOutputFormat and NullOutputFormat have
worked fine even when I don't write a single byte to their collectors.
In fact, the test case for MO also passes when I modify it to never
emit to the default output sink.

Are you using the default OutputCommitter (FileOutputCommitter)?

2011/5/30 Panayotis Antonopoulos <an...@hotmail.com>:
> Hello,
> I just noticed that the files that are created using MultipleOutputs remain
> in the temporary folder into attempt sub-folders when there is no normal
> output  (using context.write(...)).
>
> Has anyone else noticed that?
> Is there any way to change that and make the files appear in the output
> directory?
>
> Thank you in advance!
> Panagiotis.
>



-- 
Harsh J
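
The FileOutputCommitter question above is the key one: with the default committer, task output is written under ${output}/_temporary/${attemptId}/ and moved up into ${output}/ only when the task is committed. The plain-Java sketch below is not Hadoop's actual implementation, just a simplified simulation of that commit step; commitTask, the directory names, and the attempt id are all illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Simplified simulation of FileOutputCommitter's task commit: files in
// the attempt sub-folder are promoted into the job output directory.
// If this step never runs, the files stay under _temporary.
public class CommitSim {

    // Hypothetical helper, not part of Hadoop's API.
    static void commitTask(Path outputDir, String attemptId) throws IOException {
        Path attemptDir = outputDir.resolve("_temporary").resolve(attemptId);
        if (!Files.isDirectory(attemptDir)) return;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(attemptDir)) {
            for (Path f : files) {
                // Move each side file up into the real output directory.
                Files.move(f, outputDir.resolve(f.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        Path attempt = out.resolve("_temporary").resolve("attempt_0001_r_000000_0");
        Files.createDirectories(attempt);
        // A side file, as MultipleOutputs would name it.
        Files.writeString(attempt.resolve("stats-r-00000"), "some records\n");

        commitTask(out, "attempt_0001_r_000000_0");
        System.out.println(Files.exists(out.resolve("stats-r-00000"))); // true
    }
}
```

If a custom OutputCommitter (or a broken commit phase) skips this move, the symptom is exactly what was described: files sitting in the attempt sub-folders.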

Re: MultipleOutputs Files remain in temporary folder

Posted by Marcos Ortiz <ml...@uci.cu>.
On 05/30/2011 11:02 AM, Panayotis Antonopoulos wrote:
> Hello,
> I just noticed that the files that are created using MultipleOutputs 
> remain in the temporary folder into attempt sub-folders when there is 
> no normal output  (using context.write(...)).
>
> Has anyone else noticed that?
> Is there any way to change that and make the files appear in the 
> output directory?
>
> Thank you in advance!
> Panagiotis.


        mapred.local.dir

This lets the MapReduce servers know where to store intermediate files.
It may be a comma-separated list of directories to spread the load.
Make sure there is enough space here for all your intermediate files.
We share the same disks for MapReduce and HDFS.


        mapred.system.dir

This is a folder in the defaultFS where MapReduce stores some control
files. In our case that is a directory in HDFS. If you have
dfs.permissions enabled (which it is by default), make sure that this
directory exists and is owned by mapred:hadoop.


        mapred.temp.dir

This is a folder for storing temporary files. It is hardly, if at all,
used. If I understand the description correctly, this is supposed to be
in HDFS, but I'm not entirely sure from reading the source code. So we
set this to a directory that exists on the local filesystem as well as
in HDFS.
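
The three properties above can be set together in mapred-site.xml; the paths in this sketch are placeholders, not recommendations:

```xml
<!-- mapred-site.xml; all paths below are illustrative placeholders -->
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <!-- comma-separated list of local directories to spread the load -->
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <!-- control files; lives in the default FS (HDFS in our case) -->
    <value>/mapred/system</value>
  </property>
  <property>
    <name>mapred.temp.dir</name>
    <value>/mapred/temp</value>
  </property>
</configuration>
```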



-- 
Marcos Luis Ortiz Valmaseda
  Software Engineer (Distributed Systems)
  http://uncubanitolinuxero.blogspot.com