Posted to common-user@hadoop.apache.org by some speed <sp...@gmail.com> on 2009/03/31 06:25:35 UTC

skip setting output path for a sequential MR job..

Hello everyone,

Is it necessary to redirect the output of reduce to a file? When I try to
run the same M-R job more than once, it throws an error that the output
file already exists. I don't want to use command-line args, so I hard-coded
the file name into the program.

So, is there a way I could delete a file on HDFS programmatically?
Or can I skip setting an output file path and just have my output print to
the console?
Or can I just append to an existing file?


Any help is appreciated. Thanks.

-Sharath

Re: skip setting output path for a sequential MR job..

Posted by Aaron Kimball <aa...@cloudera.com>.
You must remove the existing output directory before running the job. This
check is put in to prevent you from inadvertently destroying or muddling
your existing output data.

You can remove the output directory in advance programmatically with code
similar to:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
FileSystem fs = FileSystem.get(conf); // conf is your JobConf
fs.delete(new Path("/path/to/output/dir"), true); // true = delete recursively

See
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html
for more details.
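For context, here is a minimal sketch of how that delete fits into a job
driver, assuming the old org.apache.hadoop.mapred API used in this thread;
the class name and output path are placeholders:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJobDriver.class);
        // ... configure mapper, reducer, input path, etc. ...

        Path out = new Path("/path/to/output/dir"); // hard-coded, as in the question
        FileOutputFormat.setOutputPath(conf, out);

        // Remove output from any previous run so the job can start cleanly.
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(out)) {
            fs.delete(out, true); // true = recursive
        }

        JobClient.runJob(conf);
    }
}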

- Aaron



Re: skip setting output path for a sequential MR job..

Posted by some speed <sp...@gmail.com>.
Removing the file programmatically is doing the trick for me. Thank you all
for your answers and help :-)


Re: skip setting output path for a sequential MR job..

Posted by Owen O'Malley <om...@apache.org>.
On Mar 30, 2009, at 9:25 PM, some speed wrote:

> So, is there a way I could delete a file on HDFS programmatically?
> Or can I skip setting an output file path and just have my output print to
> the console?
> Or can I just append to an existing file?

I wouldn't suggest using append yet. If you really just want side effects
from a job, you can use NullOutputFormat, which just ignores the output and
throws it away.
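A minimal sketch of wiring that up, again assuming the old
org.apache.hadoop.mapred API (conf is your JobConf):

import org.apache.hadoop.mapred.lib.NullOutputFormat;

// Discard all job output; only the job's side effects remain.
conf.setOutputFormat(NullOutputFormat.class);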

If you want it to come back out to the launching program, you could just
print it to stderr in the task and set JobClient.setTaskOutputFilter to
SUCCEEDED, and the output will be printed. (Don't try this at home on a
real cluster, or your client will be swamped!)
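A sketch of that approach under the same old-API assumption; the print
statement is illustrative:

import org.apache.hadoop.mapred.JobClient;

// In the task (e.g., inside your reduce()), print to stderr:
System.err.println("my output line");

// In the launching program, before submitting the job:
JobClient.setTaskOutputFilter(conf, JobClient.TaskStatusFilter.SUCCEEDED);
JobClient.runJob(conf); // logs of successful tasks are echoed to the client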

-- Owen