You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/01/25 21:01:49 UTC

[jira] Created: (HADOOP-934) TaskTracker sends duplicate status when updating task metrics throws exception

TaskTracker sends duplicate status when updating task metrics throws exception
------------------------------------------------------------------------------

                 Key: HADOOP-934
                 URL: https://issues.apache.org/jira/browse/HADOOP-934
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.10.1
            Reporter: Nigel Daley


When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker.  This seems to get the JobTracker into a funny state.

See annotated logs below

TaskTracker log:
2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
        at foo.Bar.setMetric(Unknown Source)
        at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
        at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
        at foo.Bar.setMetric(Unknown Source)
        at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
        at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
        at foo.Bar.setMetric(Unknown Source)
        at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
        at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
...
[these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]

JobTracker log:
2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
... [there are hundreds of these identical messages]
2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
...
[these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-934) TaskTracker sends duplicate status when updating task metrics throws exception

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468359 ] 

Arun C Murthy commented on HADOOP-934:
--------------------------------------

Ok, this entails a simple fix; however what are everyone's view on how we handle the MetricsException? Perhaps log and ignore it? 
Thoughts?

> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-934
>                 URL: https://issues.apache.org/jira/browse/HADOOP-934
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Nigel Daley
>         Assigned To: Arun C Murthy
>
> When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker.  This seems to get the JobTracker into a funny state.
> See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
> 2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
> 2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
> 2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
> 2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
> 2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
> 2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
> 2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
> 2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
> 2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]
> JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
> 2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
> 2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
> 2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
> 2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
> 2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
> 2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
> 2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
> 2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-934) TaskTracker sends duplicate status when updating task metrics throws exception

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468944 ] 

Hadoop QA commented on HADOOP-934:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12349842/HADOOP-934_20070129_1.patch applied and successfully tested against trunk revision r501616.

> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-934
>                 URL: https://issues.apache.org/jira/browse/HADOOP-934
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Nigel Daley
>         Assigned To: Arun C Murthy
>             Fix For: 0.11.0
>
>         Attachments: HADOOP-934_20070129_1.patch
>
>
> When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker.  This seems to get the JobTracker into a funny state.
> See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
> 2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
> 2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
> 2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
> 2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
> 2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
> 2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
> 2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
> 2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
> 2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]
> JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
> 2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
> 2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
> 2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
> 2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
> 2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
> 2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
> 2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
> 2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-934) TaskTracker sends duplicate status when updating task metrics throws exception

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned HADOOP-934:
------------------------------------

    Assignee: Arun C Murthy

> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-934
>                 URL: https://issues.apache.org/jira/browse/HADOOP-934
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Nigel Daley
>         Assigned To: Arun C Murthy
>
> When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker.  This seems to get the JobTracker into a funny state.
> See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
> 2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
> 2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
> 2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
> 2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
> 2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
> 2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
> 2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
> 2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
> 2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]
> JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
> 2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
> 2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
> 2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
> 2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
> 2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
> 2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
> 2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
> 2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-934) TaskTracker sends duplicate status when updating task metrics throws exception

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-934:
---------------------------------

    Attachment: HADOOP-934_20070129_1.patch

Simple fix which catches the MetricsException and just logs it... I'll rework it if someone has objections.

> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-934
>                 URL: https://issues.apache.org/jira/browse/HADOOP-934
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Nigel Daley
>         Assigned To: Arun C Murthy
>         Attachments: HADOOP-934_20070129_1.patch
>
>
> When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker.  This seems to get the JobTracker into a funny state.
> See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
> 2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
> 2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
> 2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
> 2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
> 2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
> 2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
> 2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
> 2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
> 2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]
> JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
> 2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
> 2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
> 2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
> 2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
> 2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
> 2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
> 2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
> 2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-934) TaskTracker sends duplicate status when updating task metrics throws exception

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-934:
---------------------------------

    Fix Version/s: 0.11.0
           Status: Patch Available  (was: Open)

> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-934
>                 URL: https://issues.apache.org/jira/browse/HADOOP-934
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Nigel Daley
>         Assigned To: Arun C Murthy
>             Fix For: 0.11.0
>
>         Attachments: HADOOP-934_20070129_1.patch
>
>
> When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker.  This seems to get the JobTracker into a funny state.
> See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
> 2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
> 2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
> 2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
> 2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
> 2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
> 2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
> 2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
> 2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
> 2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]
> JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
> 2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
> 2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
> 2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
> 2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
> 2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
> 2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
> 2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
> 2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-934) TaskTracker sends duplicate status when updating task metrics throws exception

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-934:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Arun!

We should also probably figure out why that exception is thrown by metrics, since that looks like a bug, but we certainly shouldn't let a metrics exception mess things up this way, so I think this is a good patch.

> TaskTracker sends duplicate status when updating task metrics throws exception
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-934
>                 URL: https://issues.apache.org/jira/browse/HADOOP-934
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Nigel Daley
>         Assigned To: Arun C Murthy
>             Fix For: 0.11.0
>
>         Attachments: HADOOP-934_20070129_1.patch
>
>
> When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker.  This seems to get the JobTracker into a funny state.
> See annotated logs below
> TaskTracker log:
> 2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
> 2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
> 2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
> 2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
> 2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
> 2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
> 2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
> 2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
> 2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> 2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
> 2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
>         at foo.Bar.setMetric(Unknown Source)
>         at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
>         at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
> ...
> [these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]
> JobTracker log:
> 2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
> 2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
> 2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
> 2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
> 2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
> 2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
> 2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> ... [there are hundreds of these identical messages]
> 2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
> 2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
> 2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
> 2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
> 2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
> 2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
> 2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> 2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem.  While updating status, cannot find taskid task_0001_m_000000_0
> ...
> [these messages continue forever]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.