Posted to common-user@hadoop.apache.org by Marc Sturlese <ma...@gmail.com> on 2011/03/02 11:15:32 UTC

Tasks seem to fail randomly with nonzero status of 1

Hey there,
My cluster was working fine, but suddenly lots and lots of tasks started
failing with:

java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:472)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:459)

I restarted the whole cluster, but ever since it first happened, it breaks
every time I run a job.
Any clue or advice?
Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Tasks-seem-to-fail-randomly-with-nonzero-status-of-1-tp2612433p2612433.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
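The trace only says that the child exited with status 1; by itself that does not say whether the child failed on its own or was killed. A minimal Java sketch of how such a status can be read, assuming a Unix-like JVM where `Process.exitValue()` reports 128 + N for a child killed by signal N (the same convention Unix shells use). `ExitStatus`, `describe` and `runAndDescribe` are hypothetical names, not part of Hadoop:

```java
import java.io.IOException;

// Hypothetical diagnostic helper (not a Hadoop class). It decodes a child
// process exit status using the Unix shell convention, which the JDK's
// Process.exitValue() also follows on Linux: 128 + N means "killed by signal N".
public final class ExitStatus {

    private ExitStatus() {}

    // Describes an exit status as a normal exit, an error exit, or a probable signal kill.
    public static String describe(int status) {
        if (status == 0) {
            return "exited normally (status 0)";
        }
        // Linux signals are numbered 1..64. A program can also call exit(137)
        // itself, so a value in this range is only a strong hint, not proof.
        if (status > 128 && status <= 128 + 64) {
            return "probably killed by " + signalName(status - 128)
                    + " (status " + status + ")";
        }
        return "exited on its own with error status " + status;
    }

    // Names the two signals most often used to kill a process; others are numbered.
    private static String signalName(int signal) {
        switch (signal) {
            case 9:
                return "SIGKILL";
            case 15:
                return "SIGTERM";
            default:
                return "signal " + signal;
        }
    }

    // Runs a command, waits for it to finish, and describes how it ended.
    public static String runAndDescribe(String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return describe(p.waitFor());
    }

    public static void main(String[] args) throws Exception {
        // A child that simply returns status 1, like the one in the trace.
        System.out.println(runAndDescribe("sh", "-c", "exit 1"));
        // A child that is forcibly killed with SIGKILL (kill -9 on itself).
        System.out.println(runAndDescribe("sh", "-c", "kill -9 $$"));
    }
}
```

On Linux the first line should report status 1 and the second status 137 (128 + 9). Whether Hadoop's TaskRunner sees the child JVM's own status or the status of some wrapper may depend on how the Hadoop version launches the child, so treat this as a convention to check against rather than a diagnosis.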

Re: Tasks seem to fail randomly with nonzero status of 1

Posted by Marc Sturlese <ma...@gmail.com>.
Well, I've been running these jobs for days. It has only been happening since
last night, and now the error keeps happening even after I restart. I'm the
only one using the cluster.

Re: Tasks seem to fail randomly with nonzero status of 1

Posted by Hari Sreekumar <hs...@clickable.com>.
Did this happen just once, or does it happen every time? This usually happens
when the Child processes are forcibly killed. If it was a one-off thing, it
is possible that someone else working on your machine at the same time
killed the processes. If it happens every time, it could be due to a lack
of system resources. Maybe the OS (for example, the Linux out-of-memory
killer) is killing these processes because they are using too much RAM?
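The RAM theory can be checked with simple arithmetic: if every map and reduce slot on a node runs a child JVM at its full heap at the same time, the total can exceed physical memory. A minimal sketch, assuming Hadoop 0.20-style settings where the per-node slot counts come from `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, and the child heap from the `-Xmx` flag in `mapred.child.java.opts`. The class name and the node figures in `main` are hypothetical:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical back-of-the-envelope check (not a Hadoop API): if all map and
// reduce slots on a node run a child JVM at its full -Xmx heap at once, does
// the total fit in the node's RAM? Heap only; non-heap JVM memory is ignored.
public final class ChildMemoryBudget {

    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    private ChildMemoryBudget() {}

    // Returns the first -Xmx value in a JVM options string, in megabytes,
    // or -1 if there is no -Xmx flag (a simplification for this sketch).
    public static long xmxMegabytes(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1;
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase(Locale.ROOT)) {
            case "g":
                return value * 1024;
            case "m":
                return value;
            case "k":
                return value / 1024;
            default: // no suffix means plain bytes
                return value / (1024 * 1024);
        }
    }

    // Worst-case total child heap (MB) when every slot is busy at the same time.
    public static long worstCaseHeapMb(int mapSlots, int reduceSlots, String childJavaOpts) {
        long perChild = xmxMegabytes(childJavaOpts);
        if (perChild < 0) {
            throw new IllegalArgumentException("no -Xmx in: " + childJavaOpts);
        }
        return (long) (mapSlots + reduceSlots) * perChild;
    }

    public static void main(String[] args) {
        // Hypothetical node: 8 map slots, 4 reduce slots, 1 GB child heap, 8 GB RAM.
        int mapSlots = 8;
        int reduceSlots = 4;
        String childOpts = "-Xmx1024m";
        long physicalMb = 8 * 1024;

        long needed = worstCaseHeapMb(mapSlots, reduceSlots, childOpts);
        System.out.println("worst-case child heap: " + needed + " MB of " + physicalMb + " MB RAM");
        if (needed > physicalMb) {
            System.out.println("over budget: children alone can exhaust physical memory");
        }
    }
}
```

With these hypothetical figures the children alone could ask for 12288 MB on an 8192 MB node. The DataNode and TaskTracker daemons, plus each JVM's non-heap memory, come on top of that, so a real budget should leave headroom.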
