Posted to mapreduce-user@hadoop.apache.org by Jianhui Zhang <jh...@gmail.com> on 2012/03/03 01:38:31 UTC

MR output to a file instead of directory?

Hi all,

FileOutputFormat/FileOutputCommitter always treat an output path as a
directory and write files under it, even if there is only one Reducer.
Is there any way to configure an OutputFormat to write all data into a
single file?

Thanks,
James

Re: MR output to a file instead of directory?

Posted by Harsh J <ha...@cloudera.com>.
James,

This is _possible_, but you will need your own implementations of both
OutputFormat and OutputCommitter to do the work, since
File{OutputFormat,OutputCommitter} only work with directories. The biggest
advantage of output directories is that they allow temporary per-attempt
directories and output committing, which support speculative execution
and task-failure handling. This is described at
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
You'd need something equivalent for a complete solution.
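
To illustrate the attempt-and-commit idea for a single output file, here is
a minimal sketch in plain Java (java.nio only). It is not Hadoop code and
not an OutputCommitter implementation: the class name, method names and the
"_name.attemptId.tmp" naming scheme are assumptions made up for this example.
Each attempt writes to its own temporary file, exactly one attempt is renamed
to the final name, and the others are deleted. Rename semantics on HDFS
differ from a local filesystem, so treat this only as a model of the
protocol.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Toy model of attempt-then-commit output to one file; hypothetical, not Hadoop API. */
public class SingleFileCommitSketch {

    /** Writes one attempt's records to a private temp file next to the final path. */
    public static Path writeAttempt(Path finalFile, String attemptId, List<String> records)
            throws IOException {
        // Hypothetical naming scheme: _<final name>.<attempt id>.tmp
        Path attemptFile = finalFile.resolveSibling(
                "_" + finalFile.getFileName() + "." + attemptId + ".tmp");
        Files.write(attemptFile, records, StandardCharsets.UTF_8);
        return attemptFile;
    }

    /** Promotes one attempt to the final name; the final file appears only at this point. */
    public static void commit(Path attemptFile, Path finalFile) throws IOException {
        // Assumes finalFile does not exist yet; behaviour on an existing target
        // is filesystem-specific for ATOMIC_MOVE.
        Files.move(attemptFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Discards a failed or losing (e.g. speculative) attempt. */
    public static void abort(Path attemptFile) throws IOException {
        Files.deleteIfExists(attemptFile);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mr-output-sketch");
        Path finalFile = dir.resolve("result.txt");

        // Two attempts of the same task run side by side without clobbering each other.
        Path attempt0 = writeAttempt(finalFile, "attempt_0", Arrays.asList("a\t1", "b\t2"));
        Path attempt1 = writeAttempt(finalFile, "attempt_1", Arrays.asList("a\t1", "b\t2"));

        commit(attempt0, finalFile); // the winner becomes the visible output
        abort(attempt1);             // the loser leaves nothing behind

        System.out.println(Files.readAllLines(finalFile));
    }
}
```

If attempts wrote straight to the final file name instead, two speculative
attempts would overwrite each other and a failed attempt could leave a
partial file visible to readers.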


-- 
Harsh J

Re: MR output to a file instead of directory?

Posted by Arun C Murthy <ac...@hortonworks.com>.
I'm not sure about the use case, but if you really care you can use an existing directory (e.g. /) by writing a bit of code to bypass the check for output-dir existence.

By default FileOutputFormat assumes the output-dir shouldn't exist and will error out during init if it does. You could customize it to skip that check.
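
As a rough model of that customization, the sketch below uses plain Java
stand-ins rather than Hadoop classes: StrictOutputCheck and
LenientOutputCheck are hypothetical names invented for this example. In
Hadoop itself the corresponding hook is the checkOutputSpecs method of the
OutputFormat, which a subclass can override; the sketch only shows the shape
of such an override.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Toy model of overriding an output-directory check; hypothetical, not Hadoop API. */
public class OutputCheckSketch {

    /** Stand-in for the default behaviour: refuse an output directory that already exists. */
    public static class StrictOutputCheck {
        public void check(Path outputDir) throws IOException {
            if (outputDir == null) {
                throw new IOException("Output directory not set");
            }
            if (Files.exists(outputDir)) {
                throw new FileAlreadyExistsException(outputDir.toString());
            }
        }
    }

    /** Customized variant: still requires a path, but tolerates an existing directory. */
    public static class LenientOutputCheck extends StrictOutputCheck {
        @Override
        public void check(Path outputDir) throws IOException {
            if (outputDir == null) {
                throw new IOException("Output directory not set");
            }
            // Only reject the path if something that is not a directory is in the way.
            if (Files.exists(outputDir) && !Files.isDirectory(outputDir)) {
                throw new IOException("Not a directory: " + outputDir);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("existing-out");

        try {
            new StrictOutputCheck().check(existing);
            System.out.println("strict: accepted");
        } catch (FileAlreadyExistsException e) {
            System.out.println("strict: rejected existing directory");
        }

        new LenientOutputCheck().check(existing);
        System.out.println("lenient: accepted existing directory");
    }
}
```

Skipping the check means files already in the directory can collide with
the job's own output names, so the caller takes over responsibility for
avoiding that.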

Arun


--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/