Posted to common-user@hadoop.apache.org by Nathan Marz <na...@rapleaf.com> on 2008/12/22 19:24:17 UTC

Unusual Failure of jobs

I have been experiencing some unusual behavior from Hadoop recently.  
When trying to run a job, some of the tasks fail with:

java.io.IOException: Task process exit with nonzero status of 1.
	at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)


Not all the tasks fail, but enough tasks fail such that the job fails.  
Unfortunately, there are no further logs for these tasks. Trying to  
retrieve the logs produces:

HTTP ERROR: 410

Failed to retrieve stdout log for task:  
attempt_200811101232_0218_m_000001_0

RequestURI=/tasklog


It seems like the tasktracker isn't able to even start the tasks on  
those machines. Has anyone seen anything like this before?



Re: Unusual Failure of jobs

Posted by Karl Anderson <kr...@monkey.org>.
On 22-Dec-08, at 10:24 AM, Nathan Marz wrote:

> It seems like the tasktracker isn't able to even start the tasks on  
> those machines. Has anyone seen anything like this before?

I see this on jobs that also get the "too many open files" task  
errors, or on subsequent jobs.  I've always assumed that it's another  
manifestation of the same problem.  Once I start getting these errors,  
I keep getting them until I shut down the cluster, although I don't  
always get enough to cause a job to fail.  I haven't bothered  
restarting individual boxes or services.

I haven't been able to reproduce it consistently, but it seems to  
happen when I have many small input files; a job with one large input  
file broke after I split the input up.  I'm using Streaming.
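
If the "too many open files" errors are related, one thing worth checking is the open-file limit on the affected nodes. The following is only a minimal sketch, not something from this thread: it reports the limit for the process that runs it, so it would need to be run as the same user (and from the same kind of shell) that starts the tasktracker for the numbers to mean anything. The function names are made up for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", describe(soft))
    print("hard limit:", describe(hard))
```

A low soft limit would be consistent with the errors showing up on jobs with many small input files, but that is a guess, not a confirmed cause.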

Karl Anderson
kra@monkey.org
http://monkey.org/~kra




Re: Unusual Failure of jobs

Posted by Sagar Naik <sn...@attributor.com>.
Check the logs on disk.
On the TaskTracker node, check {HADOOP_HOME}/logs/*tasktracker*.log and the matching .out file.
Also check the task logs under
{HADOOP_HOME}/logs/userlogs/attempt_200811101232_0218_m_000001_0/[stdout, stderr, syslog]
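
The check above can be scripted when several nodes or attempts need to be inspected. This is a rough sketch under assumptions: the helper name find_task_logs is hypothetical, the tasktracker file pattern is a guess at the daemon log naming, and the per-task directory name is a parameter because it may differ between installations. It only lists files that actually exist, so an empty result for an attempt suggests the task never got far enough to write logs.

```python
import os
from pathlib import Path


def find_task_logs(hadoop_home, attempt_id, task_log_dir="userlogs"):
    """List existing tasktracker logs and per-attempt logs under hadoop_home/logs."""
    log_dir = Path(hadoop_home) / "logs"
    found = []
    # TaskTracker daemon logs: both the .log and the .out file.
    for pattern in ("*tasktracker*.log", "*tasktracker*.out"):
        found.extend(sorted(log_dir.glob(pattern)))
    # Per-attempt logs: stdout, stderr and syslog, if they were written at all.
    attempt_dir = log_dir / task_log_dir / attempt_id
    for name in ("stdout", "stderr", "syslog"):
        candidate = attempt_dir / name
        if candidate.is_file():
            found.append(candidate)
    return found


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME", ".")
    attempt = "attempt_200811101232_0218_m_000001_0"
    for path in find_task_logs(home, attempt):
        print(path, path.stat().st_size, "bytes")
```

Run it on the node where the failed attempt was scheduled, with HADOOP_HOME set to the install directory on that node.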
