Posted to common-user@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2012/04/02 15:39:54 UTC

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

Hi guys:

I have a MapReduce job that runs normally on the local file system from
Eclipse, *but* fails on HDFS running in pseudo-distributed mode.

The exception I see is

*org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:*


Any thoughts on why this might occur in pseudo-distributed mode, but not
on the local file system?

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

Posted by Jay Vyas <ja...@gmail.com>.
Thanks J: just curious how you came to hypothesize (1) (i.e., that
threads and the API components aren't thread safe in my Hadoop version).

I think that's a really good guess, and I would like to be able to make
those sorts of intelligent hypotheses myself. Any reading you can point
me to for further enlightenment?

-- 
Jay Vyas
MMSB/UCHC

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

Posted by Harsh J <ha...@cloudera.com>.
Jay,

Without seeing the whole stack trace, all I can say about the cause of
that exception from a job is:

1. You're using threads, and the API components you are using aren't
thread safe in your version of Hadoop (see the sketch below).
2. Files are being written out to HDFS directories without following
the OutputCommitter (OC) rules. (This is negated, per your response.)
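
As a hedged illustration of (1): a minimal sketch (class and method
names are hypothetical, not taken from the actual job) of a reducer
whose own worker threads share a single MultipleOutputs instance. If
that class is not thread safe in the Hadoop version in use, concurrent
write() calls can race on output-file creation; serializing them is one
defensive workaround:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: worker threads fetch pages and funnel all of
// their results through one shared MultipleOutputs instance.
public class ThreadedFetchReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context ctx) {
        // The named output must have been registered on the Job up
        // front, e.g. MultipleOutputs.addNamedOutput(job, "pages", ...).
        mos = new MultipleOutputs<Text, Text>(ctx);
    }

    // Worker threads call this instead of mos.write() directly; the
    // synchronized keyword serializes access to the shared writer.
    synchronized void emit(Text key, Text value)
            throws IOException, InterruptedException {
        mos.write("pages", key, value);
    }

    @Override
    protected void cleanup(Context ctx)
            throws IOException, InterruptedException {
        mos.close();   // flush and close all underlying record writers
    }
}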

-- 
Harsh J

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

Posted by Jay Vyas <ja...@gmail.com>.
No, my job does not write files directly to disk. It simply goes to some
web pages, reads data (in the reducer phase), and parses JSON into
Thrift objects, which are emitted to HDFS files via the standard
MultipleOutputs API.

Any idea why Hadoop would throw the "AlreadyBeingCreatedException"?
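
For reference, one way to provoke that exception in isolation is to open
the same HDFS path for write a second time while the first stream is
still open: the NameNode refuses to grant a second lease on the path.
This is a hypothetical standalone sketch (the path name is made up, and
the exact behavior varies by Hadoop version), assuming a running
pseudo-distributed cluster with fs.default.name pointing at it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DoubleCreate {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);    // resolves to HDFS here
        Path p = new Path("/tmp/double-create-demo");

        FSDataOutputStream first = fs.create(p); // lease on p now held
        // Re-creating the path while the first stream is still open is
        // rejected by the NameNode and surfaces client-side as
        // AlreadyBeingCreatedException wrapped in a RemoteException.
        FSDataOutputStream second = fs.create(p);

        second.close();
        first.close();
    }
}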

-- 
Jay Vyas
MMSB/UCHC

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

Posted by Harsh J <ha...@cloudera.com>.
Jay,

What does your job do? Create files directly on HDFS? If so, do you
follow this method?
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F

A local filesystem may not complain if you re-create an existing file.
HDFS' behavior here is different. This simple Python test is what I
mean:
>>> a = open('a', 'w')
>>> a.write('f')
>>> b = open('a', 'w')
>>> b.write('s')
>>> a.close(), b.close()
>>> open('a').read()
's'

Hence it is best to use the FileOutputCommitter framework as detailed
in the mentioned link.
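
For completeness, here is a minimal sketch of that side-file pattern
(class and file names are illustrative, not from this thread): any extra
per-task files go under the work directory that the FileOutputCommitter
manages, so each task attempt writes to its own private location and
only successfully committed output is promoted to the job output
directory:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideFileReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        // Per-attempt work directory managed by the output committer;
        // its contents are promoted only when the attempt commits.
        Path workDir = FileOutputFormat.getWorkOutputPath(ctx);
        Path sideFile = new Path(workDir, "side-" + key.toString());

        FileSystem fs = sideFile.getFileSystem(ctx.getConfiguration());
        FSDataOutputStream out = fs.create(sideFile, false); // no overwrite
        try {
            for (Text v : values) {
                out.writeBytes(v.toString());
                out.writeBytes("\n");
            }
        } finally {
            out.close();
        }
    }
}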

-- 
Harsh J