Posted to user@hadoop.apache.org by Radim Kolar <hs...@filez.com> on 2012/09/13 00:51:37 UTC

multipleoutputs does not like speculative execution in map-only job

With speculative execution enabled, Hadoop can run a task attempt on more 
than one node. If a mapper uses MultipleOutputs, the second attempt (or 
sometimes every attempt) fails to create its output file because the file 
is already being created by another attempt:

attempt_1347286420691_0011_m_000000_0
attempt_1347286420691_0011_m_000000_1
..
fails with
Error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: 
failed to create file /cznewgen/segments/20120907190053/parse_db/-m-00000

In my code I am calling mos.write with four arguments. This problem is 
discussed in the javadoc for FileOutputFormat's getWorkOutputPath method; 
would it be possible to change MultipleOutputs to take advantage of that function?

Or would it be better to change FileOutputFormat.getUniqueFile() to append 
the last digit of the attempt ID to the filename, producing unique names 
such as /cznewgen/segments/20120907190053/parse_db/-m-00000_0 ?

Re: multipleoutputs does not like speculative execution in map-only job

Posted by Harsh J <ha...@cloudera.com>.
Hold on. I do not see a _temporary/attemptID path in the path the
error reports. Is MO really doing this, or are you getting the filename
manually from somewhere? MO builds its file paths on its own, so
there is no need for unique-path calls or the like.

Sorry I didn't notice this carefully before. If you can share a
reproducible test-case job, that'd be of great help.

On Fri, Sep 14, 2012 at 8:36 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
> In 0.23 and branch-2 there were a lot of changes that went into the
> FileOutputFormat to be able to allow for AppMaster recovery. It is very
> likely that this is a regression from the 1.0 line.  Do you know if this
> works on 1.0?
>
> On 9/13/12 2:51 PM, "Radim Kolar" <hs...@filez.com> wrote:
>
>>
>>> What version of Hadoop is this on?
>>branch-0.23
>



-- 
Harsh J
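For context on Harsh's observation: with the standard FileOutputCommitter, each attempt writes under its own _temporary/&lt;attemptID&gt; work directory, so speculative attempts cannot collide, and the failing path in the error message has no _temporary segment. A minimal sketch of that per-attempt layout (simplified; the real 0.23-era committer adds further levels):

```java
// Sketch of the per-attempt work path used by FileOutputCommitter
// (simplified; this is an illustration, not the actual Hadoop code).
public class WorkPath {
    static String workPath(String outputDir, String taskAttemptId) {
        return outputDir + "/_temporary/" + taskAttemptId;
    }

    public static void main(String[] args) {
        // Two speculative attempts of the same task get disjoint directories:
        System.out.println(workPath("/cznewgen/segments/20120907190053",
                                    "attempt_1347286420691_0011_m_000000_0"));
        System.out.println(workPath("/cznewgen/segments/20120907190053",
                                    "attempt_1347286420691_0011_m_000000_1"));
    }
}
```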

Re: multipleoutputs does not like speculative execution in map-only job

Posted by Robert Evans <ev...@yahoo-inc.com>.
In 0.23 and branch-2, a lot of changes went into FileOutputFormat to
allow for AppMaster recovery. It is very likely that this is a
regression from the 1.0 line. Do you know if this works on 1.0?

On 9/13/12 2:51 PM, "Radim Kolar" <hs...@filez.com> wrote:

>
>> What version of Hadoop is this on?
>branch-0.23


Re: multipleoutputs does not like speculative execution in map-only job

Posted by Radim Kolar <hs...@filez.com>.
> What version of Hadoop is this on?
branch-0.23

Re: multipleoutputs does not like speculative execution in map-only job

Posted by Robert Evans <ev...@yahoo-inc.com>.
What version of Hadoop is this on?

On 9/13/12 3:09 AM, "Radim Kolar" <hs...@filez.com> wrote:

>
>> Does your job use the FileOutputCommitter?
>
>yes.
>job.setOutputFormatClass(SequenceFileOutputFormat.class);


Re: multipleoutputs does not like speculative execution in map-only job

Posted by Radim Kolar <hs...@filez.com>.
> Does your job use the FileOutputCommitter?

yes.
job.setOutputFormatClass(SequenceFileOutputFormat.class);
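Since SequenceFileOutputFormat extends FileOutputFormat, the FileOutputCommitter protocol should apply here: each attempt writes into a private temporary directory, and only the winning attempt's files are promoted to the final output. A local-filesystem sketch of that protocol (hypothetical names, using java.nio.file in place of HDFS; HDFS semantics differ in detail):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem sketch of the FileOutputCommitter idea: each attempt
// writes to its own temp dir, and the winning attempt's file is moved
// into the final output directory on commit.
public class CommitSketch {
    static Path writeAttempt(Path outputDir, String attemptId, String data)
            throws IOException {
        Path work = outputDir.resolve("_temporary").resolve(attemptId);
        Files.createDirectories(work);
        Path part = work.resolve("part-m-00000");
        Files.writeString(part, data);
        return part;
    }

    static Path commit(Path outputDir, Path attemptFile) throws IOException {
        Path dest = outputDir.resolve(attemptFile.getFileName());
        return Files.move(attemptFile, dest, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        // Two speculative attempts write concurrently without colliding:
        Path a0 = writeAttempt(out, "attempt_0011_m_000000_0", "from attempt 0");
        Path a1 = writeAttempt(out, "attempt_0011_m_000000_1", "from attempt 1");
        // Only the winning attempt's output is promoted:
        Path committed = commit(out, a0);
        System.out.println(Files.readString(committed));
    }
}
```

The reported error suggests the MultipleOutputs file was being created directly in the final directory rather than under such a per-attempt work path.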

Re: multipleoutputs does not like speculative execution in map-only job

Posted by Harsh J <ha...@cloudera.com>.
Hey Radim,

Does your job use the FileOutputCommitter?

On Thu, Sep 13, 2012 at 4:21 AM, Radim Kolar <hs...@filez.com> wrote:
> with speculative execution enabled Hadoop can run task attempt on more then
> 1 node. If mapper is using multipleoutputs then second attempt (or sometimes
> even all) fails to create output file because it is being created by another
> attempt:
>
> attempt_1347286420691_0011_m_000000_0
> attempt_1347286420691_0011_m_000000_1
> ..
> fails with
> Error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed
> to create file /cznewgen/segments/20120907190053/parse_db/-m-00000
>
> in my code i am using mos.write with 4 arguments. this problem is discussed
> in javadoc for FileOutputFormat function getWorkOutputPath, its possible to
> change MultipleOutputs to take advantage of this function?
>
> or its better to change FileOutputFormat.getUniqueFile() to append last
> digit in attempt id to filename to create unique names such as
> /cznewgen/segments/20120907190053/parse_db/-m-00000_0 ?



-- 
Harsh J
