Posted to common-user@hadoop.apache.org by Mike Kendall <mk...@justin.tv> on 2009/11/13 23:02:57 UTC

what does it mean when a job fails at 100%?

Title says it all... this isn't the first job I've written, either. Very
confused.

Re: what does it mean when a job fails at 100%?

Posted by Mike Kendall <mk...@justin.tv>.
Oh, and just FYI, this is the only failed task; everything else works just
fine. Maybe the data copied over incorrectly or was malformed... /me checks
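
If it does turn out to be malformed input, one defensive option is to make
the mapper tolerate bad records instead of letting the subprocess exit
nonzero, which is what the "subprocess failed with code 1" errors below mean
for a streaming job. A minimal sketch, assuming a Python mapper over
tab-separated key/value records (the actual script and record format may
differ):

    #!/usr/bin/env python
    import sys

    for line in sys.stdin:
        try:
            # Assumes tab-separated key/value records; the unpacking
            # raises ValueError on any line without a tab.
            key, value = line.rstrip("\n").split("\t", 1)
        except ValueError:
            # Count and skip the bad record instead of crashing the task.
            # Writing "reporter:counter:<group>,<counter>,<amount>" to
            # stderr is the counter protocol streaming understands.
            sys.stderr.write("reporter:counter:MyJob,MalformedRecords,1\n")
            continue
        sys.stdout.write("%s\t%s\n" % (key, value))

A nonzero MalformedRecords counter on the job page would confirm (or rule
out) bad data in that one split.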

On Fri, Nov 13, 2009 at 3:03 PM, Mike Kendall <mk...@justin.tv> wrote:

> Hmm... let's collect some error messages. Looks like the same task failed
> 4 times... Is there a way that I can get better logs about this task?
>
> MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
> TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_0" TASK_STATUS="FAILED"
> FINISH_TIME="1258123661159" HOSTNAME="hadoop3\.justin\.tv"
> ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
> subprocess failed with code 1
>         at
> org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
>
> MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
> TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_1" TASK_STATUS="FAILED"
> FINISH_TIME="1258123852424" HOSTNAME="hadoop1\.justin\.tv"
> ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
> subprocess failed with code 1
>         at
> org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
>
> MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
> TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_2" TASK_STATUS="FAILED"
> FINISH_TIME="1258123725938" HOSTNAME="hadoop4\.justin\.tv"
> ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
> subprocess failed with code 1
>         at
> org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
>
>
> MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
> TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_3" TASK_STATUS="FAILED"
> FINISH_TIME="1258123756980" HOSTNAME="hadoop2\.justin\.tv"
> ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
> subprocess failed with code 1
>         at
> org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
>
> Task TASKID="task_200911131440_0001_m_000307" TASK_TYPE="MAP"
> TASK_STATUS="FAILED" FINISH_TIME="1258123756980"
> ERROR="java\.lang\.RuntimeException: Pipe
> MapRed\.waitOutputThreads(): subprocess failed with code 1        at
> org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
>
>
> On Fri, Nov 13, 2009 at 2:16 PM, Ashutosh Chauhan <
> ashutosh.chauhan@gmail.com> wrote:
>
>> Hi Mike,
>>
>> The % reported represents the % of records read by the framework, not the
>> % of records processed. So, for the sake of example, let's say you have
>> only one record in the data: the framework will report 100% as soon as it
>> is read, even though you might still be doing a lot of processing on that
>> record. Second, there can be rounding in the reported percentage, so e.g.
>> after reading 9991 records out of a total of 10000 for the split, the
>> counter will say 100% while some records are still untouched. Lastly, if
>> you are using the close() method, your task might be failing there, and
>> the framework will have reported 100% before that.
>> I am no expert on counters, so you may want to hear from others before
>> believing what I am saying :)
>>
>> Thanks,
>> Ashutosh
>>
>> On Fri, Nov 13, 2009 at 17:15, brien colwell <xc...@gmail.com> wrote:
>>
>> > It could be that the result can't be written to HDFS. Is there any hint
>> > in the log? I recently encountered this behavior when writing many files
>> > back.
>> >
>> >
>> >
>> > Mike Kendall wrote:
>> >
>> >> Title says it all... this isn't the first job I've written, either.
>> >> Very confused.
>> >>
>> >>
>> >>
>> >
>> >
>>
>
>

Re: what does it mean when a job fails at 100%?

Posted by Mike Kendall <mk...@justin.tv>.
Hmm... let's collect some error messages. Looks like the same task failed 4
times... Is there a way that I can get better logs about this task?

MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_0" TASK_STATUS="FAILED"
FINISH_TIME="1258123661159" HOSTNAME="hadoop3\.justin\.tv"
ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
subprocess failed with code 1
        at
org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)

MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_1" TASK_STATUS="FAILED"
FINISH_TIME="1258123852424" HOSTNAME="hadoop1\.justin\.tv"
ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
subprocess failed with code 1
        at
org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)

MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_2" TASK_STATUS="FAILED"
FINISH_TIME="1258123725938" HOSTNAME="hadoop4\.justin\.tv"
ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
subprocess failed with code 1
        at
org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)


MapAttempt TASK_TYPE="MAP" TASKID="task_200911131440_0001_m_000307"
TASK_ATTEMPT_ID="attempt_200911131440_0001_m_000307_3" TASK_STATUS="FAILED"
FINISH_TIME="1258123756980" HOSTNAME="hadoop2\.justin\.tv"
ERROR="java\.lang\.RuntimeException: PipeMapRed\.waitOutputThreads():
subprocess failed with code 1
        at
org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)

Task TASKID="task_200911131440_0001_m_000307" TASK_TYPE="MAP"
TASK_STATUS="FAILED" FINISH_TIME="1258123756980"
ERROR="java\.lang\.RuntimeException: Pipe
MapRed\.waitOutputThreads(): subprocess failed with code 1        at
org\.apache\.hadoop\.streaming\.PipeMapRed\.waitOutputThreads(PipeMapRed\.java:311)
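
One way to get more than the generic "subprocess failed with code 1" is to
have the script report its own failure: everything a streaming subprocess
writes to stderr is captured in that attempt's stderr log, reachable from
the task details page in the JobTracker web UI. A sketch, assuming the
mapper is a Python script:

    #!/usr/bin/env python
    import sys
    import traceback

    def run():
        for line in sys.stdin:
            # ... real mapper logic goes here ...
            sys.stdout.write(line)

    try:
        run()
    except Exception:
        # The traceback ends up in the attempt's stderr log, so the task
        # fails with a diagnosable error instead of a bare exit code.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)

Also, since the same split fails on four different hosts, piping that
split's data through the script locally is usually the quickest way to find
the offending record.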

On Fri, Nov 13, 2009 at 2:16 PM, Ashutosh Chauhan <
ashutosh.chauhan@gmail.com> wrote:

> Hi Mike,
>
> The % reported represents the % of records read by the framework, not the
> % of records processed. So, for the sake of example, let's say you have
> only one record in the data: the framework will report 100% as soon as it
> is read, even though you might still be doing a lot of processing on that
> record. Second, there can be rounding in the reported percentage, so e.g.
> after reading 9991 records out of a total of 10000 for the split, the
> counter will say 100% while some records are still untouched. Lastly, if
> you are using the close() method, your task might be failing there, and
> the framework will have reported 100% before that.
> I am no expert on counters, so you may want to hear from others before
> believing what I am saying :)
>
> Thanks,
> Ashutosh
>
> On Fri, Nov 13, 2009 at 17:15, brien colwell <xc...@gmail.com> wrote:
>
> > It could be that the result can't be written to HDFS. Is there any hint
> > in the log? I recently encountered this behavior when writing many files
> > back.
> >
> >
> >
> > Mike Kendall wrote:
> >
> >> Title says it all... this isn't the first job I've written, either.
> >> Very confused.
> >>
> >>
> >>
> >
> >
>

Re: what does it mean when a job fails at 100%?

Posted by Ashutosh Chauhan <as...@gmail.com>.
Hi Mike,

The % reported represents the % of records read by the framework, not the %
of records processed. So, for the sake of example, let's say you have only
one record in the data: the framework will report 100% as soon as it is
read, even though you might still be doing a lot of processing on that
record. Second, there can be rounding in the reported percentage, so e.g.
after reading 9991 records out of a total of 10000 for the split, the
counter will say 100% while some records are still untouched. Lastly, if
you are using the close() method, your task might be failing there, and the
framework will have reported 100% before that.
I am no expert on counters, so you may want to hear from others before
believing what I am saying :)

Thanks,
Ashutosh
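
To make the close() point concrete for streaming (a sketch, not the actual
failing script): once the mapper has consumed all of stdin, map progress
already reads 100%, so an error in any work done after the input loop, the
streaming analogue of failing in close(), yields exactly a task that fails
at 100%:

    #!/usr/bin/env python
    import sys

    seen = 0
    for line in sys.stdin:
        seen += 1
        sys.stdout.write(line)

    # Every input record has been read by this point, so the framework
    # already reports 100% map progress. If this final step raises, the
    # subprocess exits nonzero and the task still fails, at 100%.
    out = open("/nonexistent/dir/summary.txt", "w")  # fails deliberately
    out.write("records seen: %d\n" % seen)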

On Fri, Nov 13, 2009 at 17:15, brien colwell <xc...@gmail.com> wrote:

> It could be that the result can't be written to HDFS. Is there any hint in
> the log? I recently encountered this behavior when writing many files back.
>
>
>
> Mike Kendall wrote:
>
>> Title says it all... this isn't the first job I've written, either. Very
>> confused.
>>
>>
>>
>
>

Re: what does it mean when a job fails at 100%?

Posted by brien colwell <xc...@gmail.com>.
It could be that the result can't be written to HDFS. Is there any hint 
in the log? I recently encountered this behavior when writing many files 
back.


Mike Kendall wrote:
> Title says it all... this isn't the first job I've written, either. Very
> confused.
>
>   


Re: what does it mean when a job fails at 100%?

Posted by Edmund Kohlwey <ek...@gmail.com>.
Lots of things can happen. If you have a cleanup method, it can fail after
map and reduce complete. Also, Hadoop writes the output of a task to local
disk and only commits the results of the individual tasks to HDFS after
they complete, so you might be failing on the copy to HDFS.
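
That commit point is visible from inside a streaming task as well: output is
staged under a per-attempt temporary directory and only promoted into the
job's output directory when the attempt commits, so the copy can fail after
the map or reduce work itself looks finished. A hedged sketch, assuming the
streaming behavior of exporting jobconf values to the subprocess environment
with dots replaced by underscores (so mapred.work.output.dir appears as
mapred_work_output_dir):

    #!/usr/bin/env python
    import os
    import sys

    # Streaming exports jobconf entries as environment variables with
    # non-alphanumeric characters replaced by underscores.
    staging = os.environ.get("mapred_work_output_dir", "(not set)")
    sys.stderr.write("task output staged under: %s\n" % staging)

    for line in sys.stdin:
        sys.stdout.write(line)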

On 11/13/09 5:02 PM, Mike Kendall wrote:
> Title says it all... this isn't the first job I've written, either. Very
> confused.
>
>