Posted to common-user@hadoop.apache.org by Matt Kent <ma...@persai.com> on 2008/03/17 23:14:47 UTC
runtime exceptions not killing job
I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 0.14.1,
if a map or reduce task threw a runtime exception such as an NPE, the
task, and ultimately the job, would fail in short order. I was running
a job on my local 0.16.1 cluster today, and when the reduce tasks
started throwing NPEs, the tasks just hung. Eventually they timed out
and were killed, but is this expected behavior in 0.16.1? I'd prefer the
job to fail quickly if NPEs are being thrown.
Matt
--
Matt Kent
Co-Founder
Persai
1221 40th St #113
Emeryville, CA 94608
matt@persai.com
Re: runtime exceptions not killing job
Posted by Matt Kent <ma...@persai.com>.
It seems to happen only with reduce tasks, not map tasks. I reproduced
it by having a dummy reduce task throw an NPE immediately. The error is
shown on the reduce details page but the job does not register the task
as failed. I've attached the task tracker stack trace, the child stack
trace and a screenshot of the task list page.
Matt
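For readers following along, the fail-fast behavior being asked for can be sketched in plain Java. This is only an illustration under stated assumptions: ReduceFn, Status and runTask are hypothetical stand-ins invented for this sketch, not Hadoop's Reducer interface or the task tracker's real code. It shows a dummy reduce function that throws an NPE immediately, and a runner that marks the task as failed as soon as the exception surfaces instead of waiting for a timeout.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FailFastSketch {
    /** Hypothetical stand-in for a reduce function; NOT Hadoop's Reducer interface. */
    interface ReduceFn {
        void reduce(String key, Iterator<String> values);
    }

    /** Hypothetical task outcome, for illustration only. */
    enum Status { SUCCEEDED, FAILED }

    /** Runs one reduce call and reports FAILED as soon as a runtime exception is thrown. */
    static Status runTask(ReduceFn fn, String key, List<String> values) {
        try {
            fn.reduce(key, values.iterator());
            return Status.SUCCEEDED;
        } catch (RuntimeException e) {
            // Fail fast: record the error and return right away, with no timeout involved.
            System.err.println("task failed: " + e);
            return Status.FAILED;
        }
    }

    public static void main(String[] args) {
        // Dummy reducer that throws an NPE immediately, as in the reproduction described above.
        ReduceFn dummy = (key, values) -> {
            throw new NullPointerException("deliberate NPE from dummy reducer");
        };
        System.out.println(runTask(dummy, "k", Arrays.asList("v1", "v2")));
    }
}
```

Running main prints FAILED on standard output, after the exception is logged to standard error. The real framework reports failures through its own task and job tracking, so this only captures the expectation that an uncaught exception ends the task at once.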
Owen O'Malley wrote:
>
> On Mar 17, 2008, at 3:14 PM, Matt Kent wrote:
>
>> I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in
>> 0.14.1, if a map or reduce task threw a runtime exception such as an
>> NPE, the task, and ultimately the job, would fail in short order. I
>> was running a job on my local 0.16.1 cluster today, and when the
>> reduce tasks started throwing NPEs, the tasks just hung. Eventually
>> they timed out and were killed, but is this expected behavior in
>> 0.16.1? I'd prefer the job to fail quickly if NPEs are being thrown.
>
> This sounds like a bug. Tasks should certainly fail immediately if an
> exception is thrown. Do you know where the exception is being thrown?
> Can you get a stack trace of the task from jstack after the exception
> and before the task times out?
>
> Thanks,
> Owen
>
Re: runtime exceptions not killing job
Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Mar 17, 2008, at 3:14 PM, Matt Kent wrote:
> I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in
> 0.14.1, if a map or reduce task threw a runtime exception such as
> an NPE, the task, and ultimately the job, would fail in short
> order. I was running a job on my local 0.16.1 cluster today, and
> when the reduce tasks started throwing NPEs, the tasks just hung.
> Eventually they timed out and were killed, but is this expected
> behavior in 0.16.1? I'd prefer the job to fail quickly if NPEs are
> being thrown.
This sounds like a bug. Tasks should certainly fail immediately if an
exception is thrown. Do you know where the exception is being thrown?
Can you get a stack trace of the task from jstack after the exception
and before the task times out?
Thanks,
Owen
Re: Hadoop-Patch build is not progressing for 6 hours
Posted by Nigel Daley <nd...@yahoo-inc.com>.
org.apache.hadoop.streaming.TestGzipInput was stuck. I killed it.
Nige
On Mar 17, 2008, at 8:17 PM, Konstantin Shvachko wrote:
> Usually a build takes 2 hours or less.
> This one is stuck and I don't see changes in the QUEUE OF PENDING
> PATCHES when I submit a patch.
> I guess something is wrong with Hudson.
> Could somebody please check?
> --Konstantin
Hadoop-Patch build is not progressing for 6 hours
Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Usually a build takes 2 hours or less.
This one is stuck and I don't see changes in the QUEUE OF PENDING PATCHES when I submit a patch.
I guess something is wrong with Hudson.
Could somebody please check?
--Konstantin
Re: runtime exceptions not killing job
Posted by Chris Dyer <re...@umd.edu>.
I've noticed this behavior as well in 0.16.0 with RuntimeExceptions in general.
Chris
On Mon, Mar 17, 2008 at 6:14 PM, Matt Kent <ma...@persai.com> wrote:
> I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 0.14.1,
> if a map or reduce task threw a runtime exception such as an NPE, the
> task, and ultimately the job, would fail in short order. I was running
> a job on my local 0.16.1 cluster today, and when the reduce tasks
> started throwing NPEs, the tasks just hung. Eventually they timed out
> and were killed, but is this expected behavior in 0.16.1? I'd prefer the
> job to fail quickly if NPEs are being thrown.
>
> Matt