Posted to common-dev@hadoop.apache.org by "Christian Kunz (JIRA)" <ji...@apache.org> on 2007/06/08 01:40:26 UTC

[jira] Updated: (HADOOP-1475) local filecache disappears

     [ https://issues.apache.org/jira/browse/HADOOP-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-1475:
-----------------------------------

    Component/s: mapred

> local filecache disappears
> --------------------------
>
>                 Key: HADOOP-1475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1475
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Christian Kunz
>            Priority: Blocker
>
> All our jobs on a 600-node cluster fail. The symptom is that the local filecache disappears.
> This might be because lost tasktrackers get re-initialized when they send a heartbeat again, and purge the local directory completely without updating the filecache.
> A side issue:
> why do we get so many lost tasktrackers which then resume the heartbeat (a kind of 'bogus' lost tasktracker)? We lost tasktrackers:
> 13 in the 1st hour of the job
> 18 in the 2nd hour
> 33 in the 3rd hour
> Then the job failed.
> For example, all the tasktrackers lost in the first 2 hours of the job were logged some time later with a 'Status from unknown Tracker' message in the jobtracker log and were reinitialized.
> I attach some jobtracker log messages showing how the heartbeats of the lost tasktrackers come in late, sometimes less than 1 minute late, sometimes up to 16 minutes. What could be the reason? Do the heartbeats get lost?
> 2007-06-07 13:09:08,518 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_070
> 2007-06-07 13:09:48,919 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_070
> 2007-06-07 13:39:08,740 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_075
> 2007-06-07 13:41:50,810 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_075
> 2007-06-07 14:32:29,093 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_082
> 2007-06-07 14:35:34,217 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_082
> 2007-06-07 14:15:48,856 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_085
> 2007-06-07 14:20:21,337 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_085
> 2007-06-07 15:25:49,524 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_098
> 2007-06-07 15:33:56,732 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_098
> 2007-06-07 14:49:09,203 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_106
> 2007-06-07 14:54:25,538 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_106
> 2007-06-07 15:02:29,337 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_108
> 2007-06-07 15:02:57,558 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_108
> 2007-06-07 14:19:09,022 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_112
> 2007-06-07 14:19:15,273 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_112
> 2007-06-07 14:19:08,881 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_114
> 2007-06-07 14:30:03,354 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_114
> 2007-06-07 15:42:29,579 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_116
> 2007-06-07 15:43:06,422 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_116
> 2007-06-07 14:55:49,280 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_117
> 2007-06-07 14:56:38,452 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_117
> 2007-06-07 15:15:49,461 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_120
> 2007-06-07 15:31:37,028 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_120
> 2007-06-07 15:09:09,435 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_174
> 2007-06-07 15:18:31,254 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_174
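
The gap between each 'Lost tracker' line and the matching 'Status_from_unknown_Tracker' line in the excerpt above can be measured with a short script. The following is only a minimal sketch in Python, assuming the log layout shown in the excerpt; the function name reappearance_delays and the regular expression are illustrative and are not part of Hadoop.

```python
import re
from datetime import datetime

# Assumed line layout, inferred from the excerpt above:
#   <date> <time,millis> <LEVEL> <logger>: <message> <tracker name>
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ \S+: "
    r"(Lost tracker|Status_from_unknown_Tracker :) (\S+)"
)


def reappearance_delays(lines):
    """Map each tracker to the seconds between being declared lost
    and its next 'Status_from_unknown_Tracker' message."""
    lost_at = {}
    delays = {}
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        stamp, event, tracker = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        if event == "Lost tracker":
            lost_at[tracker] = when
        elif tracker in lost_at:
            # Pair the status message with the most recent 'lost' entry.
            delays[tracker] = (when - lost_at.pop(tracker)).total_seconds()
    return delays


if __name__ == "__main__":
    sample = [
        "2007-06-07 13:09:08,518 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_070",
        "2007-06-07 13:09:48,919 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_070",
        "2007-06-07 15:15:49,461 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_120",
        "2007-06-07 15:31:37,028 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_120",
    ]
    for tracker, seconds in sorted(reappearance_delays(sample).items()):
        print(f"{tracker}: {seconds:.1f} s ({seconds / 60:.1f} min)")
```

The two sample pairs are copied from the excerpt. Because the entries in the excerpt are not in chronological order, the sketch pairs lines by tracker name rather than by position.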

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.