Posted to mapreduce-user@hadoop.apache.org by Panayotis Antonopoulos <an...@hotmail.com> on 2011/05/30 17:32:43 UTC
MultipleOutputs Files remain in temporary folder
Hello,
I just noticed that the files created using MultipleOutputs remain in the attempt sub-folders of the temporary directory when there is no normal output (i.e., nothing is written via context.write(...)).
Has anyone else noticed that?
Is there any way to change that and make the files appear in the output directory?
Thank you in advance!
Panagiotis.
Re: MultipleOutputs Files remain in temporary folder
Posted by Harsh J <ha...@cloudera.com>.
Panayotis,
I've not seen this happen yet. I've regularly used MultipleOutputs (MO) to write my
files, and both TextOutputFormat and NullOutputFormat have worked fine even when I
don't write a byte to their collectors. In fact, the test case for MO also
passes when I modify it to never emit to the default output sink.
Are you using the default OutputCommitter (FileOutputCommitter)?
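One common cause of MultipleOutputs files being stranded in the _temporary/attempt_* sub-folders is never calling close() on the MultipleOutputs instance, so its streams are not flushed before the task commits. A minimal sketch of the usual pattern (the class name, the "side" named output, and the key/value types are illustrative, not taken from this thread):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer that writes only via MultipleOutputs and
// never calls context.write(...).
public class SideOutputReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    private MultipleOutputs<Text, LongWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, LongWritable>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
                          Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        // "side" must have been registered in the driver with
        // MultipleOutputs.addNamedOutput(job, "side", ...).
        mos.write("side", key, new LongWritable(sum));
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Without this close(), the named-output files may be left
        // behind in the attempt sub-folders and never promoted to
        // the job output directory by the FileOutputCommitter.
        mos.close();
    }
}
```

If the files still sit under _temporary after the job succeeds with close() in place, the next thing to check is whether a non-default OutputCommitter is in play, as asked above.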
2011/5/30 Panayotis Antonopoulos <an...@hotmail.com>:
> Hello,
> I just noticed that the files that are created using MultipleOutputs remain
> in the temporary folder into attempt sub-folders when there is no normal
> output (using context.write(...)).
>
> Has anyone else noticed that?
> Is there any way to change that and make the files appear in the output
> directory?
>
> Thank you in advance!
> Panagiotis.
>
--
Harsh J
Re: MultipleOutputs Files remain in temporary folder
Posted by Marcos Ortiz <ml...@uci.cu>.
On 05/30/2011 11:02 AM, Panayotis Antonopoulos wrote:
> Hello,
> I just noticed that the files that are created using MultipleOutputs
> remain in the temporary folder into attempt sub-folders when there is
> no normal output (using context.write(...)).
>
> Has anyone else noticed that?
> Is there any way to change that and make the files appear in the
> output directory?
>
> Thank you in advance!
> Panagiotis.
mapred.local.dir
This lets the MapReduce daemons know where to store intermediate files.
It may be a comma-separated list of directories to spread the load.
Make sure there's enough space here for all your intermediate files. We
share the same disks for MapReduce and HDFS.
mapred.system.dir
This is a folder in the defaultFS where MapReduce stores some control
files; in our case that is a directory in HDFS. If you have
dfs.permissions enabled (which it is by default), make sure that
this directory exists and is owned by mapred:hadoop.
mapred.temp.dir
This is a folder to store temporary files in. It is hardly, if at all,
used. If I understand the description correctly it is supposed to be
in HDFS, but I'm not entirely sure from reading the source code, so we set
this to a directory that exists on the local filesystem as well as in HDFS.
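The three properties above would be set in mapred-site.xml. A sketch with placeholder paths (the values shown are examples, not recommendations; these are the old MRv1 property names):

```xml
<!-- mapred-site.xml: Hadoop 1.x / MRv1 property names; paths are examples -->
<configuration>
  <property>
    <name>mapred.local.dir</name>
    <!-- Comma-separated local directories for intermediate files -->
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <!-- Directory in the defaultFS (HDFS here) for job control files -->
    <value>/mapred/system</value>
  </property>
  <property>
    <name>mapred.temp.dir</name>
    <!-- Temporary directory; rarely used in practice -->
    <value>/mapred/temp</value>
  </property>
</configuration>
```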
--
Marcos Luis Ortiz Valmaseda
Software Engineer (Distributed Systems)
http://uncubanitolinuxero.blogspot.com