Posted to common-user@hadoop.apache.org by Daniel Hoffman <ho...@gmail.com> on 2012/08/23 15:15:09 UTC

Question Regarding FileAlreadyExistsException

With respect to the FileAlreadyExistsException, which occurs when an
OutputFormat finds that its output directory already exists: is there a
Hadoop property accessible to the client that disables this behavior?

For example, something like disable.file.already.exists.behaviour=true

Thank You
Daniel G. Hoffman

Re: Question Regarding FileAlreadyExistsException

Posted by Harsh J <ha...@cloudera.com>.
Daniel,

Perhaps you want to set your job's OutputFormat to NullOutputFormat. It
does not perform any check for a pre-existing output directory.
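For context, here is a toy model of the check Harsh describes. As we understand it, Hadoop calls the configured OutputFormat's checkOutputSpecs method when the job is submitted; FileOutputFormat (and therefore TextOutputFormat) throws FileAlreadyExistsException if the output directory exists, while NullOutputFormat's version checks nothing. The sketch below is plain Java, not Hadoop: OutputSpecModel, FILE_LIKE, NULL_LIKE and passes are hypothetical names, and java.nio.file.FileAlreadyExistsException stands in for Hadoop's exception of the same name. With the org.apache.hadoop.mapreduce API, the actual switch in a driver is a single call, job.setOutputFormatClass(NullOutputFormat.class).

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy model only: these are NOT Hadoop's classes. It mimics the idea that the
// output spec check runs once, at job submission, and that only some
// OutputFormats refuse an existing output directory.
public class OutputSpecModel {

    // Simplified stand-in for OutputFormat#checkOutputSpecs.
    interface OutputSpecCheck {
        void checkOutputSpecs(Path outputDir) throws IOException;
    }

    // FileOutputFormat-like behaviour: fail if the output directory already exists.
    static final OutputSpecCheck FILE_LIKE = outputDir -> {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                    "Output directory " + outputDir + " already exists");
        }
    };

    // NullOutputFormat-like behaviour: no check at all.
    static final OutputSpecCheck NULL_LIKE = outputDir -> { };

    // Runs the check the way a "submit" step would; true means submission may go ahead.
    static boolean passes(OutputSpecCheck check, Path outputDir) {
        try {
            check.checkOutputSpecs(outputDir);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("existing-output");
        System.out.println("FileOutputFormat-like: " + passes(FILE_LIKE, existing)); // false
        System.out.println("NullOutputFormat-like: " + passes(NULL_LIKE, existing)); // true
        Files.delete(existing);
    }
}
```

The model also shows why the suggestion sidesteps the exception entirely rather than disabling it: the check lives in the OutputFormat, so choosing a format that does not perform it is the only switch involved.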

On Thu, Aug 23, 2012 at 9:47 PM, Daniel Hoffman
<ho...@gmail.com> wrote:
> Well, I'm using the MultipleOutputs capability to create a directory
> structure based on dates, so I'm managing this myself.
>
> What I've found (and I could be doing this wrong) is that I still have to
> tell the Tool that I want to use a TextOutputFormat or a FileOutputFormat,
> and then tell that format which output directory to use.
>
> For example:
> TextOutputFormat.setOutputPath(job, new Path("/foo/bar"));
>
> As a workaround, I just used a temporary directory at /tmp/datetimestamp.
>
> It doesn't make much sense, though, since the reducer uses multiple output
> formats to build an entirely different directory structure. Of course, I'm
> probably either not following the M/R paradigm or just doing it wrong.
>
> The FileAlreadyExistsException was raised for my "/foo/bar" directory,
> which had very little to do with my "genuine" output.
>
>
> Dan



-- 
Harsh J

Re: Question Regarding FileAlreadyExistsException

Posted by Daniel Hoffman <ho...@gmail.com>.
Well, I'm using the MultipleOutputs capability to create a directory
structure based on dates, so I'm managing this myself.

What I've found (and I could be doing this wrong) is that I still have to
tell the Tool that I want to use a TextOutputFormat or a FileOutputFormat,
and then tell that format which output directory to use.

For example:
TextOutputFormat.setOutputPath(job, new Path("/foo/bar"));

As a workaround, I just used a temporary directory at /tmp/datetimestamp.
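Dan's workaround amounts to giving FileOutputFormat a fresh, throwaway directory on every run. Below is a minimal sketch of building such a path, assuming a UTC timestamp to the second is unique enough (two runs started in the same second would still collide). UniqueOutputDir, outputDirFor and the "/tmp/myjob-..." naming are hypothetical. The resulting string would then be passed to FileOutputFormat.setOutputPath(job, new Path(dir)) in the driver.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a per-run output directory name so the
// FileOutputFormat check never sees a directory left over from a previous run.
public class UniqueOutputDir {

    // Sortable, filesystem-safe timestamp such as 20120823-151509, always in UTC.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(ZoneOffset.UTC);

    // e.g. ("/tmp/", "myjob", 2012-08-23T15:15:09Z) -> "/tmp/myjob-20120823-151509"
    static String outputDirFor(String baseDir, String jobName, Instant when) {
        // Drop a single trailing slash so we never produce "//" in the path.
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + "/" + jobName + "-" + STAMP.format(when);
    }

    public static void main(String[] args) {
        System.out.println(outputDirFor("/tmp/", "myjob", Instant.now()));
    }
}
```

Since the job may still create this directory (for example for its commit markers), it is worth deleting it once the job has finished.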

It doesn't make much sense, though, since the reducer uses multiple output
formats to build an entirely different directory structure. Of course, I'm
probably either not following the M/R paradigm or just doing it wrong.

The FileAlreadyExistsException was raised for my "/foo/bar" directory,
which had very little to do with my "genuine" output.


Dan

On Thu, Aug 23, 2012 at 9:40 AM, Harsh J <ha...@cloudera.com> wrote:

> I think this specific behavior irritates a lot of new users. We may as
> well provide a Generic Option that, when set, overwrites an existing output
> directory. That way we at least spare users from typing a whole delete
> command. If you agree, please file an improvement request against the
> MAPREDUCE project on the ASF JIRA.
>
> --
> Harsh J
>

Re: Question Regarding FileAlreadyExistsException

Posted by Harsh J <ha...@cloudera.com>.
I think this specific behavior irritates a lot of new users. We may as
well provide a Generic Option that, when set, overwrites an existing output
directory. That way we at least spare users from typing a whole delete
command. If you agree, please file an improvement request against the
MAPREDUCE project on the ASF JIRA.

On Thu, Aug 23, 2012 at 6:58 PM, Bertrand Dechoux <de...@gmail.com> wrote:
> I don't think so. The client is responsible for deleting the output
> beforehand if it might already exist.
> Correct me if I am wrong.
>
> Higher-level solutions (such as Cascading) usually provide a way to define a
> strategy for handling it: KEEP, REPLACE, UPDATE ...
> http://docs.cascading.org/cascading/2.0/javadoc/cascading/tap/SinkMode.html
>
> Regards
>
> Bertrand
>
> --
> Bertrand Dechoux



-- 
Harsh J

Re: Question Regarding FileAlreadyExistsException

Posted by Bertrand Dechoux <de...@gmail.com>.
I don't think so. The client is responsible for deleting the output
beforehand if it might already exist.
Correct me if I am wrong.
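A minimal sketch of the client-side cleanup Bertrand describes, run before the job is submitted. It uses java.nio.file on the local filesystem purely so the example is self-contained; against HDFS the same step would use Hadoop's FileSystem#exists(Path) and FileSystem#delete(Path, true). DeleteBeforeSubmit and deleteIfExists are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Local-filesystem analog of "delete the output before submitting the job".
public class DeleteBeforeSubmit {

    // Recursively deletes outputDir if present. Returns true if anything was removed.
    static boolean deleteIfExists(Path outputDir) throws IOException {
        if (!Files.exists(outputDir)) {
            return false; // nothing to clear; the job can create it
        }
        // Files.walk lists a directory before its contents; reverse order deletes
        // children first, so every directory is empty by the time it is removed.
        try (Stream<Path> paths = Files.walk(outputDir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // surface the original IOException to the caller
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        Files.createDirectories(out.resolve("2012/08/23"));
        Files.createFile(out.resolve("2012/08/23/part-r-00000"));
        System.out.println("deleted: " + deleteIfExists(out));           // true
        System.out.println("exists afterwards: " + Files.exists(out));   // false
        // ...configure and submit the job here, now that the path is free.
    }
}
```

Deleting unconditionally is destructive, so a driver would typically do this only for paths it owns, or behind an explicit option.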

Higher-level solutions (such as Cascading) usually provide a way to define a
strategy for handling it: KEEP, REPLACE, UPDATE ...
http://docs.cascading.org/cascading/2.0/javadoc/cascading/tap/SinkMode.html
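To make the KEEP / REPLACE / UPDATE idea concrete for a plain MapReduce driver, here is a hypothetical decision function. Only the three names come from the thread; the semantics are assumptions of this sketch and are not taken from Cascading's SinkMode. It performs no I/O and only reports what the driver should do before submitting.

```java
// Hypothetical sketch: the names KEEP/REPLACE/UPDATE are borrowed from the
// thread, but the semantics below are this example's own assumptions, not
// Cascading's implementation.
public class OutputPolicy {

    enum Mode { KEEP, REPLACE, UPDATE }

    // What a driver could do before submitting the job.
    enum Action { PROCEED, DELETE_THEN_PROCEED, FAIL }

    static Action decide(Mode mode, boolean outputExists) {
        if (!outputExists) {
            return Action.PROCEED; // no clash, whatever the mode
        }
        switch (mode) {
            case KEEP:
                return Action.FAIL;                // keep old data, refuse to run (like Hadoop's default)
            case REPLACE:
                return Action.DELETE_THEN_PROCEED; // clear the old output first
            case UPDATE:
                return Action.PROCEED;             // write alongside existing data
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + " -> " + decide(m, true));
        }
    }
}
```

With a stock FileOutputFormat, PROCEED onto an existing directory would still fail the submission-time check, so UPDATE would additionally need an OutputFormat that skips that check.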

Regards

Bertrand

On Thu, Aug 23, 2012 at 3:15 PM, Daniel Hoffman <ho...@gmail.com> wrote:

> With respect to the FileAlreadyExistsException, which occurs when an
> OutputFormat finds that its output directory already exists: is there a
> Hadoop property accessible to the client that disables this behavior?
>
> For example, something like disable.file.already.exists.behaviour=true
>
> Thank You
> Daniel G. Hoffman
>



-- 
Bertrand Dechoux