You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-user@hadoop.apache.org by "Stevens, Keith D." <st...@llnl.gov> on 2011/08/01 21:21:23 UTC

Mappers fail to initialize and are killed after 600 seconds

Hi all,

I'm running a simple mapreduce job that connects to an hbase table, reads each row, counts some co-occurrence frequencies, and writes everything out to hdfs at the end.  Everything seems to be going smoothly until the last 5, out of 108, tasks run.  The last 5 tasks seem to be stuck initializing.  As far as I can tell, setup is never called, and eventually, after 600 seconds, the task is killed.  The task jumps around different nodes to try and run but regardless of the node, it fails to initialize and is killed.

My first guess is that it's trying to connect to an hbase region server and failing, but I don't see anything like this in the task tracker nodes.  Here are the log lines related to one of the failed tasks from the task trackers logs:

2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:UNASSIGNED
2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
2011-08-01 12:01:12,243 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1189914759 given task: attempt_201107281508_0028_m_000027_0
2011-08-01 12:11:09,462 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0: Task attempt_201107281508_0028_m_000027_0 failed to report status for 600 seconds. Killing!
2011-08-01 12:11:09,467 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201107281508_0028_m_000027_0
2011-08-01 12:11:14,488 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
2011-08-01 12:11:14,489 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201107281508_0028_m_000027_0 not found in cache
2011-08-01 12:11:14,495 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:FAILED_UNCLEAN
2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
2011-08-01 12:11:15,045 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1869983962 given task: attempt_201107281508_0028_m_000027_0
2011-08-01 12:11:15,346 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0% 
2011-08-01 12:11:15,348 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0% cleanup
2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201107281508_0028_m_000027_0 is done.
2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201107281508_0028_m_000027_0  was -1
2011-08-01 12:11:15,354 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
2011-08-01 12:11:17,495 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.

And here are the syslog lines:
In my job, I set the stats when i enter and exit setup, and I set counters in map.  None of these are triggered for this task.  Nothing is written to stderr or stdout, and the syslogs for the task have nothing beyond the zookeeper client connection lines.

Any thoughts as to what might be causing this issue?  Is there another log that indicates which region server this task is trying to connect to?

Thanks!
--Keith Stevens

Re: Mappers fail to initialize and are killed after 600 seconds

Posted by Mauricio Morales <ma...@careerbuilder.com>.

Stevens, Keith D. <st...@...> writes:

> 
> In short, there are no userlogs.  stderr and stdout are both empty.  I copied 
the output from syslog to the
> following pastebin: http://pastebin.com/0XXE9Jze.  The first 22 lines look to 
be exactly the same as the
> syslogs for other, non-dying, tasks.   The main departure is on line 23 where 
the loader can't seem to load
> native-hadoop libraries, and this happens about 10 minutes after starting up.
> 
> --Keith
> 


Hi Keith, Did you ever find the solution for this? I'm having this very same 
issue and can't figure out what's the deal!

Re: Mappers fail to initialize and are killed after 600 seconds

Posted by Mauricio Morales <ma...@careerbuilder.com>.

Stevens, Keith D. <st...@...> writes:

> 
> In short, there are no userlogs.  stderr and stdout are both empty.  I copied 
the output from syslog to the
> following pastebin: http://pastebin.com/0XXE9Jze.  The first 22 lines look to 
be exactly the same as the
> syslogs for other, non-dying, tasks.   The main departure is on line 23 where 
the loader can't seem to load
> native-hadoop libraries, and this happens about 10 minutes after starting up.
> 
> --Keith
> 


Hi Keith, Did you ever find the solution for this? I'm having this very same 
issue and can't figure out what's the deal!

Re: Mappers fail to initialize and are killed after 600 seconds

Posted by Mauricio Morales <ma...@careerbuilder.com>.

Stevens, Keith D. <st...@...> writes:

> 
> In short, there are no userlogs.  stderr and stdout are both empty.  I copied 
the output from syslog to the
> following pastebin: http://pastebin.com/0XXE9Jze.  The first 22 lines look to 
be exactly the same as the
> syslogs for other, non-dying, tasks.   The main departure is on line 23 where 
the loader can't seem to load
> native-hadoop libraries, and this happens about 10 minutes after starting up.
> 
> --Keith
> 


Hi Keith, Did you ever find the solution for this? I'm having this very same 
issue and can't figure out what's the deal!

Re: Mappers fail to initialize and are killed after 600 seconds

Posted by Mauricio Morales <ma...@careerbuilder.com>.

Stevens, Keith D. <st...@...> writes:

> 
> In short, there are no userlogs.  stderr and stdout are both empty.  I copied 
the output from syslog to the
> following pastebin: http://pastebin.com/0XXE9Jze.  The first 22 lines look to 
be exactly the same as the
> syslogs for other, non-dying, tasks.   The main departure is on line 23 where 
the loader can't seem to load
> native-hadoop libraries, and this happens about 10 minutes after starting up.
> 
> --Keith
> 


Hi Keith, Did you ever find the solution for this? I'm having this very same 
issue and can't figure out what's the deal!

Re: Mappers fail to initialize and are killed after 600 seconds

Posted by "Stevens, Keith D." <st...@llnl.gov>.

In short, there are no userlogs.  stderr and stdout are both empty.  I copied the output from syslog to the following pastebin: http://pastebin.com/0XXE9Jze.  The first 22 lines look to be exactly the same as the syslogs for other, non-dying, tasks.   The main departure is on line 23 where the loader can't seem to load native-hadoop libraries, and this happens about 10 minutes after starting up.

--Keith

On Aug 1, 2011, at 1:00 PM, Harsh J wrote:

> Are there no userlogs from the failed tasks? TaskTracker logs won't
> carry user-code (task) logs. Could you paste those syslog lines (from
> the task) to pastebin/etc. since the lists may not be accepting
> attachments?
> 
> On Tue, Aug 2, 2011 at 12:51 AM, Stevens, Keith D. <st...@llnl.gov> wrote:
>> Hi all,
>> 
>> I'm running a simple mapreduce job that connects to an hbase table, reads each row, counts some co-occurrence frequencies, and writes everything out to hdfs at the end.  Everything seems to be going smoothly until the last 5, out of 108, tasks run.  The last 5 tasks seem to be stuck initializing.  As far as I can tell, setup is never called, and eventually, after 600 seconds, the task is killed.  The task jumps around different nodes to try and run but regardless of the node, it fails to initialize and is killed.
>> 
>> My first guess is that it's trying to connect to an hbase region server and failing, but I don't see anything like this in the task tracker nodes.  Here are the log lines related to one of the failed tasks from the task trackers logs:
>> 
>> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:UNASSIGNED
>> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:01:12,243 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1189914759 given task: attempt_201107281508_0028_m_000027_0
>> 2011-08-01 12:11:09,462 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0: Task attempt_201107281508_0028_m_000027_0 failed to report status for 600 seconds. Killing!
>> 2011-08-01 12:11:09,467 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201107281508_0028_m_000027_0
>> 2011-08-01 12:11:14,488 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
>> 2011-08-01 12:11:14,489 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201107281508_0028_m_000027_0 not found in cache
>> 2011-08-01 12:11:14,495 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:FAILED_UNCLEAN
>> 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
>> 2011-08-01 12:11:15,045 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1869983962 given task: attempt_201107281508_0028_m_000027_0
>> 2011-08-01 12:11:15,346 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0%
>> 2011-08-01 12:11:15,348 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0% cleanup
>> 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201107281508_0028_m_000027_0 is done.
>> 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201107281508_0028_m_000027_0  was -1
>> 2011-08-01 12:11:15,354 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
>> 2011-08-01 12:11:17,495 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
>> 
>> And here are the syslog lines:
>> In my job, I set the stats when i enter and exit setup, and I set counters in map.  None of these are triggered for this task.  Nothing is written to stderr or stdout, and the syslogs for the task have nothing beyond the zookeeper client connection lines.
>> 
>> Any thoughts as to what might be causing this issue?  Is there another log that indicates which region server this task is trying to connect to?
>> 
>> Thanks!
>> --Keith Stevens
> 
> 
> 
> -- 
> Harsh J

Re: Mappers fail to initialize and are killed after 600 seconds

Posted by Harsh J <ha...@cloudera.com>.

Are there no userlogs from the failed tasks? TaskTracker logs won't
carry user-code (task) logs. Could you paste those syslog lines (from
the task) to pastebin/etc. since the lists may not be accepting
attachments?

On Tue, Aug 2, 2011 at 12:51 AM, Stevens, Keith D. <st...@llnl.gov> wrote:
> Hi all,
>
> I'm running a simple mapreduce job that connects to an hbase table, reads each row, counts some co-occurrence frequencies, and writes everything out to hdfs at the end.  Everything seems to be going smoothly until the last 5, out of 108, tasks run.  The last 5 tasks seem to be stuck initializing.  As far as I can tell, setup is never called, and eventually, after 600 seconds, the task is killed.  The task jumps around different nodes to try and run but regardless of the node, it fails to initialize and is killed.
>
> My first guess is that it's trying to connect to an hbase region server and failing, but I don't see anything like this in the task tracker nodes.  Here are the log lines related to one of the failed tasks from the task trackers logs:
>
> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:UNASSIGNED
> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
> 2011-08-01 12:01:08,889 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
> 2011-08-01 12:01:12,243 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1189914759 given task: attempt_201107281508_0028_m_000027_0
> 2011-08-01 12:11:09,462 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0: Task attempt_201107281508_0028_m_000027_0 failed to report status for 600 seconds. Killing!
> 2011-08-01 12:11:09,467 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201107281508_0028_m_000027_0
> 2011-08-01 12:11:14,488 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
> 2011-08-01 12:11:14,489 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_201107281508_0028_m_000027_0 not found in cache
> 2011-08-01 12:11:14,495 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201107281508_0028_m_000027_0 task's state:FAILED_UNCLEAN
> 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201107281508_0028_m_000027_0 which needs 1 slots
> 2011-08-01 12:11:14,496 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201107281508_0028_m_000027_0 which needs 1 slots
> 2011-08-01 12:11:15,045 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201107281508_0028_m_-1869983962 given task: attempt_201107281508_0028_m_000027_0
> 2011-08-01 12:11:15,346 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0%
> 2011-08-01 12:11:15,348 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201107281508_0028_m_000027_0 0.0% cleanup
> 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201107281508_0028_m_000027_0 is done.
> 2011-08-01 12:11:15,349 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201107281508_0028_m_000027_0  was -1
> 2011-08-01 12:11:15,354 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
> 2011-08-01 12:11:17,495 INFO org.apache.hadoop.mapred.TaskRunner: attempt_201107281508_0028_m_000027_0 done; removing files.
>
> And here are the syslog lines:
> In my job, I set the stats when i enter and exit setup, and I set counters in map.  None of these are triggered for this task.  Nothing is written to stderr or stdout, and the syslogs for the task have nothing beyond the zookeeper client connection lines.
>
> Any thoughts as to what might be causing this issue?  Is there another log that indicates which region server this task is trying to connect to?
>
> Thanks!
> --Keith Stevens



-- 
Harsh J