Posted to common-user@hadoop.apache.org by Manhee Jo <jo...@nttdocomo.com> on 2010/06/16 09:30:46 UTC

Re: Task process exit with nonzero status of 1 - deleting userlogs helps

Hi,

I've also encountered the same "nonzero status of 1" error before.
What did you set mapred.child.ulimit and mapred.child.java.opts to?
mapred.child.ulimit must be greater than the -Xmx passed to the JVM,
or else the VM might not start. That's what the MapReduce tutorial says.
Setting a bigger ulimit solved the problem for me.
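For example (a sketch with illustrative values only -- tune them to your
heap needs): mapred.child.ulimit is given in kilobytes and should
comfortably exceed the virtual memory the child JVM needs for its -Xmx
heap, e.g. with a 512 MB heap:

```xml
<!-- mapred-site.xml: illustrative values only.
     mapred.child.ulimit is in kilobytes; it must be larger than the
     virtual memory required by the -Xmx heap set below, or the child
     JVM may fail to start. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapred.child.ulimit</name>
  <value>1572864</value> <!-- 1.5 GB, expressed in KB -->
</property>
```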
Hope this helps.


Regards,
Manhee

----- Original Message ----- 
From: "Edward Capriolo" <ed...@gmail.com>
To: <co...@hadoop.apache.org>
Sent: Tuesday, June 15, 2010 2:47 AM
Subject: Re: Task process exit with nonzero status of 1 - deleting userlogs helps


> On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann 
> <jzillmann@googlemail.com
>> wrote:
>
>> Hi,
>>
>> I have a 4-node cluster running hadoop-0.20.2. Now I suddenly ran into
>> a situation where every task scheduled on 2 of the 4 nodes fails.
>> Seems like the child jvm crashes. There are no child logs under
>> logs/userlogs. Tasktracker gives this:
>>
>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
>> JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
>> Runner jvm_201006091425_0049_m_-946174604 spawned.
>> 2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>> jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
>> 2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
>> attempt_201006091425_0049_m_003179_0 Child Error
>> java.io.IOException: Task process exit with nonzero status of 1.
>>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>>
>>
>> At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new
>> job created logs/userlogs again, and no error occurred anymore on this
>> host.
>> The permissions of userlogs and userlogsOLD are exactly the same.
>> userlogsOLD contains about 378M in 132747 files. When copying the
>> content of userlogsOLD back into userlogs, the tasks on that node start
>> failing again.
>>
>> Some questions:
>> - this seems to me like a problem with too many files in one folder - any
>> thoughts on this?
>> - is the content of logs/userlogs cleaned up by hadoop regularly?
>> - the logs/stdout files of the tasks do not exist, and the logs/out files
>> of the tasktracker don't contain any specific message (other than the one
>> posted above) - is there any other log file where an error message could
>> be found?
>>
>>
>> best regards
>> Johannes
>
>
> Most file systems have an upper limit on the number of subfiles and
> subfolders in a folder. You have probably hit the ext3 limit. If you
> launch lots and lots of jobs, you can hit the limit before any cleanup
> happens.
>
> You can experiment with cleanup and other filesystems. The following log
> related issue might be relevant.
>
> https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
>
> Regards,
> Edward
> 
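Edward's ext3 explanation can be checked directly: ext3 caps a directory's
hard-link count at 32000, and since every subdirectory's ".." entry adds a
link to its parent, a directory can hold at most 31998 subdirectories. A
small sketch (paths are illustrative; the link-count arithmetic assumes a
traditional filesystem like ext3/ext4):

```shell
# On ext3, a directory's hard-link count is capped at 32000. Each
# subdirectory contributes one link to its parent (via its ".." entry),
# plus one for the parent's own "." and one for its name entry -- so at
# most 31998 subdirectories fit before mkdir fails with "Too many links".
demo=$(mktemp -d)
mkdir "$demo/sub1" "$demo/sub2" "$demo/sub3"
echo "entries: $(ls "$demo" | wc -l)"
# Hard-link count = 2 + subdir count on ext3/ext4; a value of 32000
# on a tasktracker's logs/userlogs would mean the cap has been hit.
stat -c %h "$demo"
rm -rf "$demo"
```

The same counting applies to logs/userlogs here: 132747 task-attempt
directories is far past the 31998 ext3 allows.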



Re: Task process exit with nonzero status of 1 - deleting userlogs helps

Posted by Johannes Zillmann <jz...@googlemail.com>.
Seems like this is indeed some kind of folder restriction.
Tried:
  cp -r logs/userlogsOLD/* logs/userlogs/
and got:
  cp: cannot create directory `logs/userlogs/attempt_201006091425_0049_m_003169_0': Too many links

Johannes
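On the cleanup question from the first mail: the 0.20 tasktracker does
purge old task logs, controlled by mapred.userlog.retain.hours (default
24). A hedged sketch of lowering it when jobs create attempt directories
faster than the daily cleanup can remove them (the value is illustrative):

```xml
<!-- mapred-site.xml (illustrative): retain task logs for 6 hours
     instead of the default 24, so logs/userlogs is pruned more
     aggressively and stays under the ext3 subdirectory limit. -->
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>6</value>
</property>
```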
