
Posted to mapreduce-user@hadoop.apache.org by Sudharsan Sampath <su...@gmail.com> on 2011/06/22 08:21:16 UTC

Map job hangs indefinitely

Hi,

I am starting a job from the map task of another job. Below is a quick
mock-up of the code that I use. The 2nd job hangs indefinitely after its
1st task attempt fails; there is not even a 2nd attempt. This runs fine on a
single-node cluster but fails on a two-node cluster.

Can someone help me understand why the failed attempt was not rescheduled,
leaving the job hung?

Thanks
Sudhan S

Re: Map job hangs indefinitely

Posted by Sudharsan Sampath <su...@gmail.com>.

Hi Devraj,

I attached the files so that it is easier for anyone to run them and
reproduce the issue. No other files are required.

Following are the logs from the JobTracker and the TaskTracker:

*JobTracker*

2011-06-23 12:46:48,781 DEBUG org.apache.hadoop.mapred.JobTracker: Per-Task
memory configuration is not set on JT. Not checking the job for invalid
memory requirements.
2011-06-23 12:46:48,783 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201106231235_0001
2011-06-23 12:46:48,783 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201106231235_0001
2011-06-23 12:46:48,872 INFO org.apache.hadoop.mapred.JobInProgress: Input
size for job job_201106231235_0001 = 0. Number of splits = 1
2011-06-23 12:46:49,132 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER2 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 183
2011-06-23 12:46:49,157 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_201106231235_0001_m_000002_0' to tip
task_201106231235_0001_m_000002, for tracker 'TASK_TRACKER2'
2011-06-23 12:46:49,158 DEBUG org.apache.hadoop.mapred.JobTracker:
TASK_TRACKER2 -> LaunchTask: attempt_201106231235_0001_m_000002_0
2011-06-23 12:46:50,943 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER1 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 159
2011-06-23 12:46:52,203 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER2 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 184
2011-06-23 12:46:52,204 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_201106231235_0001_m_000002_0' has completed
task_201106231235_0001_m_000002 successfully.
2011-06-23 12:46:52,208 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_201106231235_0001_m_000000_0' to tip
task_201106231235_0001_m_000000, for tracker 'TASK_TRACKER2'
2011-06-23 12:46:52,210 DEBUG org.apache.hadoop.mapred.JobTracker:
TASK_TRACKER2 -> LaunchTask: attempt_201106231235_0001_m_000000_0
2011-06-23 12:46:52,211 DEBUG org.apache.hadoop.mapred.JobTracker:
TASK_TRACKER2 -> KillTaskAction: attempt_201106231235_0001_m_000002_0

2011-06-23 12:46:53,308 DEBUG org.apache.hadoop.mapred.JobTracker: Per-Task
memory configuration is not set on JT. Not checking the job for invalid
memory requirements.
2011-06-23 12:46:53,309 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201106231235_0002
2011-06-23 12:46:53,309 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201106231235_0002
2011-06-23 12:46:53,380 INFO org.apache.hadoop.mapred.JobInProgress: Input
size for job job_201106231235_0002 = 0. Number of splits = 1
2011-06-23 12:46:53,946 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER1 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 160
2011-06-23 12:46:53,947 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_201106231235_0002_m_000002_0' to tip
task_201106231235_0002_m_000002, for tracker 'TASK_TRACKER1'
2011-06-23 12:46:53,947 DEBUG org.apache.hadoop.mapred.JobTracker:
TASK_TRACKER2 -> LaunchTask: attempt_201106231235_0002_m_000002_0
2011-06-23 12:46:55,215 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER2 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 185
2011-06-23 12:46:56,989 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER1 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 161
2011-06-23 12:46:57,042 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_201106231235_0002_m_000002_0' has completed
task_201106231235_0002_m_000002 successfully.
2011-06-23 12:46:57,044 INFO org.apache.hadoop.mapred.JobTracker: Adding
task 'attempt_201106231235_0002_m_000000_0' to tip
task_201106231235_0002_m_000000, for tracker 'TASK_TRACKER1'
2011-06-23 12:46:57,044 DEBUG org.apache.hadoop.mapred.JobTracker:
TASK_TRACKER1 -> LaunchTask: attempt_201106231235_0002_m_000000_0
2011-06-23 12:46:57,044 DEBUG org.apache.hadoop.mapred.JobTracker:
TASK_TRACKER1 -> KillTaskAction: attempt_201106231235_0002_m_000002_0
2011-06-23 12:46:58,219 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER2 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 186
2011-06-23 12:47:00,049 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER1 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 162
2011-06-23 12:47:00,049 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201106231235_0002_m_000000_0: java.lang.RuntimeException:
Throwing own exception
    at com.test.MyMapper.map(MyMapper.java:26)
    at com.test.MyMapper.map(MyMapper.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

2011-06-23 12:47:01,222 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER2 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 187
2011-06-23 12:47:03,052 DEBUG org.apache.hadoop.mapred.JobTracker: Got
heartbeat from: TASK_TRACKER1 (restarted: false initialContact: false
acceptNewTasks: true) with responseId: 163
2011-06-23 12:47:03,053 DEBUG org.apache.hadoop.mapred.JobTracker: Marked
'attempt_201106231235_0002_m_000000_0' from 'TASK_TRACKER1'
2011-06-23 12:47:03,054 DEBUG org.apache.hadoop.mapred.JobTracker: Removing
task 'attempt_201106231235_0002_m_000000_0'
2011-06-23 12:47:03,054 INFO org.apache.hadoop.mapred.JobTracker: Removed
completed task 'attempt_201106231235_0002_m_000000_0' from 'TASK_TRACKER1'
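
For anyone who wants to follow a single attempt through a log like the one
above, here is a minimal sketch that groups JobTracker log records by attempt
ID. It is not part of the attached files; the class name AttemptTimeline and
its methods are made up for illustration, and it assumes records start with a
log4j-style timestamp and that wrapped lines belong to the preceding record.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop: groups log records by attempt ID.
class AttemptTimeline {
    // Assumption: a record starts with "yyyy-MM-dd HH:mm:ss,SSS "; the date
    // part is optional in case it was lost when the log was pasted.
    private static final Pattern RECORD_START =
        Pattern.compile("^(?:\\d{4}-\\d{2}-\\d{2} )?\\d{2}:\\d{2}:\\d{2},\\d{3} ");

    // Matches IDs such as attempt_201106231235_0002_m_000000_0.
    private static final Pattern ATTEMPT_ID =
        Pattern.compile("attempt_\\d+_\\d+_[mr]_\\d+_\\d+");

    // Re-joins records that were wrapped across several lines (including
    // stack trace lines) into one string per record.
    static List<String> joinWrappedRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (RECORD_START.matcher(trimmed).find()) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(trimmed);
            } else if (current != null) {
                current.append(' ').append(trimmed);
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    // Maps each attempt ID to the records that mention it, in log order.
    // Records without an attempt ID (e.g. heartbeats) are dropped.
    static Map<String, List<String>> eventsByAttempt(List<String> lines) {
        Map<String, List<String>> timeline = new LinkedHashMap<>();
        for (String record : joinWrappedRecords(lines)) {
            Matcher m = ATTEMPT_ID.matcher(record);
            if (m.find()) {
                timeline.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(record);
            }
        }
        return timeline;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a saved log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Map.Entry<String, List<String>> e : eventsByAttempt(lines).entrySet()) {
            System.out.println(e.getKey());
            for (String record : e.getValue()) {
                System.out.println("    " + record);
            }
        }
    }
}
```

Run it as "java AttemptTimeline jobtracker.log" (the file name is only an
example). In the excerpt above, the records for
attempt_201106231235_0002_m_000000_0 end with the attempt being removed from
TASK_TRACKER1, and no attempt with suffix _1 shows up for that task.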

Thanks
Sudhan S

On Wed, Jun 22, 2011 at 12:13 PM, Devaraj K <de...@huawei.com> wrote:

> With this info it is difficult to find out where the problem is coming.
> Can you check the job tracker and task tracker logs related to these jobs?
>
> Devaraj K
> ------------------------------
>
> *From:* Sudharsan Sampath [mailto:sudhan65@gmail.com]
> *Sent:* Wednesday, June 22, 2011 11:51 AM
> *To:* mapreduce-user@hadoop.apache.org
> *Subject:* Map job hangs indefinitely
>
> Hi,
>
> I am starting a job from the map of another job. Following are quick mock
> of the code snippets that I use. But the 2nd job hangs indefinitely after
> the 1st task attempt fails. There is not even a 2nd attempt. This runs fine
> on a cluster with one node but fails on a two node cluster.
>
> Can someone help me in understanding why the failed attempt was unable to
> be rescheduled and thereby hangs the job.
>
> Thanks
> Sudhan S

RE: Map job hangs indefinitely

Posted by Devaraj K <de...@huawei.com>.

With this info it is difficult to find out where the problem is coming from.
Can you check the JobTracker and TaskTracker logs related to these jobs?

Devaraj K
