Posted to common-user@hadoop.apache.org by Matt Kent <ma...@persai.com> on 2008/03/17 23:14:47 UTC

runtime exceptions not killing job

I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 0.14.1, 
if a map or reduce task threw a runtime exception such as an NPE, the 
task, and ultimately the job, would fail in short order. I was running 
a job on my local 0.16.1 cluster today, and when the reduce tasks 
started throwing NPEs, the tasks just hung. Eventually they timed out 
and were killed, but is this expected behavior in 0.16.1? I'd prefer the 
job to fail quickly if NPEs are being thrown.
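
To make "fail quickly" concrete: with a bare-bones driver along the lines of 
the sketch below (from memory, not my actual job; class names and paths are 
made up, and the exact 0.16 JobConf method names may differ slightly), 
runJob() in 0.14.1 would throw soon after the reduce attempts died, whereas 
now it just sits there until the task timeout kicks in.

// Hypothetical driver sketch against the 0.16-era "mapred" API.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class NpeJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(NpeJobDriver.class);
    conf.setJobName("npe-repro");
    // Default TextInputFormat gives LongWritable keys and Text values;
    // IdentityMapper just passes them through to the reduce phase.
    conf.setMapperClass(IdentityMapper.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    // A reducer that throws an NPE immediately (see the sketch later in
    // this thread).
    conf.setReducerClass(NpeReducer.class);
    conf.setInputPath(new Path("/tmp/npe-in"));    // method names from memory
    conf.setOutputPath(new Path("/tmp/npe-out"));

    // runJob() blocks until the job completes and throws an IOException if
    // the job fails. Under 0.14.1 that happened shortly after the reduce
    // attempts threw; under 0.16.1 it only returns once the hung tasks have
    // been timed out and killed.
    JobClient.runJob(conf);
  }
}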

Matt

-- 
Matt Kent
Co-Founder
Persai
1221 40th St #113
Emeryville, CA 94608
matt@persai.com


Re: runtime exceptions not killing job

Posted by Matt Kent <ma...@persai.com>.
It seems to happen only with reduce tasks, not map tasks. I reproduced 
it by having a dummy reduce task throw an NPE immediately. The error is 
shown on the reduce details page but the job does not register the task 
as failed. I've attached the task tracker stack trace, the child stack 
trace and a screenshot of the task list page.
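
For what it's worth, the dummy reducer was roughly the following (a sketch 
from memory; the class name is made up and the exact 0.16 generic signatures 
may differ slightly):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class NpeReducer extends MapReduceBase
    implements Reducer<LongWritable, Text, LongWritable, Text> {

  public void reduce(LongWritable key, Iterator<Text> values,
                     OutputCollector<LongWritable, Text> output,
                     Reporter reporter) throws IOException {
    // Blow up before touching any values. In 0.14.1 this failed the task
    // attempt (and eventually the job) right away; in 0.16.1 the error shows
    // up on the reduce details page but the task is never marked failed and
    // just hangs until the task timeout kills it.
    throw new NullPointerException("dummy NPE from reduce()");
  }
}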

Matt

Owen O'Malley wrote:
>
> On Mar 17, 2008, at 3:14 PM, Matt Kent wrote:
>
>> I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 
>> 0.14.1, if a map or reduce task threw a runtime exception such as an 
>> NPE, the task, and ultimately the job, would fail in short order. I 
>> was running a job on my local 0.16.1 cluster today, and when the 
>> reduce tasks started throwing NPEs, the tasks just hung. Eventually 
>> they timed out and were killed, but is this expected behavior in 
>> 0.16.1? I'd prefer the job to fail quickly if NPEs are being thrown.
>
> This sounds like a bug. Tasks should certainly fail immediately if an 
> exception is thrown. Do you know where the exception is being thrown? 
> Can you get a stack trace of the task from jstack after the exception 
> and before the task times out?
>
> Thanks,
>    Owen
>


-- 
Matt Kent
Co-Founder
Persai
1221 40th St #113
Emeryville, CA 94608
matt@persai.com


Re: runtime exceptions not killing job

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Mar 17, 2008, at 3:14 PM, Matt Kent wrote:

> I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in  
> 0.14.1, if a map or reduce task threw a runtime exception such as  
> an NPE, the task, and ultimately the job, would fail in short  
> order. I was running a job on my local 0.16.1 cluster today, and  
> when the reduce tasks started throwing NPEs, the tasks just hung.  
> Eventually they timed out and were killed, but is this expected  
> behavior in 0.16.1? I'd prefer the job to fail quickly if NPEs are  
> being thrown.

This sounds like a bug. Tasks should certainly fail immediately if an  
exception is thrown. Do you know where the exception is being thrown?  
Can you get a stack trace of the task from jstack after the exception  
and before the task times out?

Thanks,
    Owen

Re: Hadoop-Patch build is not progressing for 6 hours

Posted by Nigel Daley <nd...@yahoo-inc.com>.
org.apache.hadoop.streaming.TestGzipInput was stuck.  I killed it.

Nige

On Mar 17, 2008, at 8:17 PM, Konstantin Shvachko wrote:

> Usually a build takes 2 hours or less.
> This one is stuck and I don't see changes in the QUEUE OF PENDING  
> PATCHES when I submit a patch.
> I guess something is wrong with Hudson.
> Could anybody please check?
> --Konstantin


Hadoop-Patch build is not progressing for 6 hours

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Usually a build takes 2 hours or less.
This one is stuck and I don't see changes in the QUEUE OF PENDING PATCHES when I submit a patch.
I guess something is wrong with Hudson.
Could anybody please check?
--Konstantin

Re: runtime exceptions not killing job

Posted by Chris Dyer <re...@umd.edu>.
I've noticed this behavior as well in 0.16.0 with RuntimeExceptions in general.

Chris

On Mon, Mar 17, 2008 at 6:14 PM, Matt Kent <ma...@persai.com> wrote:
> I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 0.14.1,
>  if a map or reduce task threw a runtime exception such as an NPE, the
>  task, and ultimately the job, would fail in short order. I was running
>  a job on my local 0.16.1 cluster today, and when the reduce tasks
>  started throwing NPEs, the tasks just hung. Eventually they timed out
>  and were killed, but is this expected behavior in 0.16.1? I'd prefer the
>  job to fail quickly if NPEs are being thrown.
>
>  Matt
>
>  --
>  Matt Kent
>  Co-Founder
>  Persai
>  1221 40th St #113
>  Emeryville, CA 94608
>  matt@persai.com
>
>