Posted to common-dev@hadoop.apache.org by "Christian Kunz (JIRA)" <ji...@apache.org> on 2007/09/20 22:28:50 UTC

[jira] Created: (HADOOP-1930) Too many fetch-failures issue

Too many fetch-failures issue
-----------------------------

                 Key: HADOOP-1930
                 URL: https://issues.apache.org/jira/browse/HADOOP-1930
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.15.0
            Reporter: Christian Kunz


A job with 4000 maps on a 1400-node cluster (3 tasks per node allowed) had many (150) map failures due to 'Too many fetch-failures'.

From the jobtracker log, it looks as if the jobtracker got confused about which tasktracker actually ran the task:

(In the following log output, I replaced the corresponding tasktracker nodes with ***node_assigned*** and ***node_fetch_attempt***; the two nodes are different.)

grep task_200709170247_0018_m_000009_0 hadoop-xxx-jobtracker-node.log.2007-09-19:

2007-09-19 15:52:26,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200709170247_0018_m_000009_0' to tip tip_200709170247_0018_m_000009, for tracker 'tracker_***node_assigned_***:/127.0.0.1:54523'
2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200709170247_0018_m_000009_0' to hdfs://location
2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200709170247_0018_m_000009_0' has completed tip_200709170247_0018_m_000009 successfully.
2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_200709170247_0018_m_000009_0' has completed succesfully
2007-09-19 16:21:07,825 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200709170247_0018_m_000009_0
2007-09-19 16:23:23,483 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200709170247_0018_m_000009_0
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200709170247_0018_m_000009_0
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_200709170247_0018_m_000009_0 ... killing it
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200709170247_0018_m_000009_0: Too many fetch-failures
2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_200709170247_0018_m_000009_0' has been lost.
2007-09-19 16:25:07,184 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200709170247_0018_m_000009_0' from 'tracker_***node_fetch_attempt***:/127.0.0.1:48818'
2007-09-19 21:40:00,235 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200709170247_0018_m_000009_0' from 'tracker_***node_fetch_attempt***:/127.0.0.1:48818'
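
The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming only the two message formats visible in the log excerpt ("Adding task ... for tracker '...'" and "Removed completed task ... from '...'"). The class name, the method name, and the sample task and tracker names are hypothetical and are not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrackerMismatchCheck {
    // Message formats copied from the log excerpt; group 1 = task id, group 2 = tracker name.
    private static final Pattern ASSIGNED = Pattern.compile(
        "Adding task '([^']+)' to tip [^,]+, for tracker '([^']+)'");
    private static final Pattern REMOVED = Pattern.compile(
        "Removed completed task '([^']+)' from '([^']+)'");

    // Returns true if the task was later removed from a tracker other than
    // the one it was originally assigned to.
    static boolean hasTrackerMismatch(List<String> logLines, String taskId) {
        String assigned = null;
        for (String line : logLines) {
            Matcher a = ASSIGNED.matcher(line);
            if (a.find() && a.group(1).equals(taskId)) {
                assigned = a.group(2);
                continue;
            }
            Matcher r = REMOVED.matcher(line);
            if (r.find() && r.group(1).equals(taskId)
                    && assigned != null && !assigned.equals(r.group(2))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in the same shape as the jobtracker log.
        List<String> lines = Arrays.asList(
            "2007-09-19 15:52:26,907 INFO org.apache.hadoop.mapred.JobTracker: "
                + "Adding task 'task_x_m_000009_0' to tip tip_x_m_000009, "
                + "for tracker 'tracker_nodeA:/127.0.0.1:54523'",
            "2007-09-19 16:25:07,184 INFO org.apache.hadoop.mapred.JobTracker: "
                + "Removed completed task 'task_x_m_000009_0' "
                + "from 'tracker_nodeB:/127.0.0.1:48818'");
        System.out.println(hasTrackerMismatch(lines, "task_x_m_000009_0")); // prints true
    }
}
```

Run over the grep output for one task, such a check would flag exactly the situation reported here: the task is added for one tracker and removed from another.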



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1930) Too many fetch-failures issue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529519 ] 

Arun C Murthy commented on HADOOP-1930:
---------------------------------------

So, thankfully, this just looks like a bug in the log message. I'll rustle up a patch to fix this one...


[jira] Updated: (HADOOP-1930) Too many fetch-failures issue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1930:
----------------------------------

    Attachment: HADOOP-1930_1_20070922.patch

OK, I take my previous comment back.

It is a tad more involved, since we are calling {{JobInProgress.failedTask}} from {{JobInProgress.fetchFailureNotification}} with the wrong {{trackerName}}. However, at worst it leads to a couple of trackers being wrongly blacklisted, i.e. penalized for failed tasks.

The attached patch should fix it, but I need to test this extensively...
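
To illustrate the effect of passing the wrong {{trackerName}}, here is a toy model. It is only a sketch and not the Hadoop code: the class, its maps, and both method variants are hypothetical. It also assumes that the wrong name was that of the tracker reporting the fetch failure, and that the fix looks up the tracker the task was assigned to; the thread confirms neither detail directly.

```java
import java.util.HashMap;
import java.util.Map;

class FetchFailureSketch {
    // taskId -> tracker that actually ran the map task (hypothetical bookkeeping).
    private final Map<String, String> assignedTracker = new HashMap<>();
    // trackerName -> number of failed tasks charged to that tracker.
    private final Map<String, Integer> failuresByTracker = new HashMap<>();

    void assign(String taskId, String trackerName) {
        assignedTracker.put(taskId, trackerName);
    }

    // Stand-in for failing a task: the named tracker is penalized.
    private void failedTask(String taskId, String trackerName) {
        failuresByTracker.merge(trackerName, 1, Integer::sum);
    }

    // Assumed shape of the bug: the tracker name supplied by the caller is charged.
    void fetchFailureBuggy(String taskId, String reportingTracker) {
        failedTask(taskId, reportingTracker);
    }

    // Assumed shape of the fix: charge the tracker the task was assigned to.
    // If no tracker is known for the task, nobody is charged in this sketch.
    void fetchFailureFixed(String taskId, String reportingTracker) {
        String owner = assignedTracker.get(taskId);
        if (owner != null) {
            failedTask(taskId, owner);
        }
    }

    int failures(String trackerName) {
        return failuresByTracker.getOrDefault(trackerName, 0);
    }

    public static void main(String[] args) {
        FetchFailureSketch buggy = new FetchFailureSketch();
        buggy.assign("map_9", "tracker_assigned");
        buggy.fetchFailureBuggy("map_9", "tracker_reporter");
        System.out.println(buggy.failures("tracker_reporter")); // prints 1 (wrongly penalized)

        FetchFailureSketch fixed = new FetchFailureSketch();
        fixed.assign("map_9", "tracker_assigned");
        fixed.fetchFailureFixed("map_9", "tracker_reporter");
        System.out.println(fixed.failures("tracker_assigned")); // prints 1
        System.out.println(fixed.failures("tracker_reporter")); // prints 0
    }
}
```

In the buggy variant the penalty lands on a tracker that never ran the map, which is how a few trackers could end up wrongly blacklisted.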


[jira] Updated: (HADOOP-1930) Too many fetch-failures issue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1930:
----------------------------------

    Fix Version/s: 0.15.0
           Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-1930) Too many fetch-failures issue

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1930:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Arun!


[jira] Commented: (HADOOP-1930) Too many fetch-failures issue

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529851 ] 

Devaraj Das commented on HADOOP-1930:
-------------------------------------

If a task is not found on any of the tasktrackers (getAssignedTracker returns null), the patch reports the tasktracker as "unknown" in the message. For readability, it might make sense to report the tasktracker as "lost" instead: once the JT has declared the task with that taskid successful, a lost tracker is the only case in which getAssignedTracker would return null.
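
The suggestion amounts to a null check on the looked-up tracker name. A minimal sketch, assuming the lookup yields either a tracker name or null; the class and method below are hypothetical helpers and are not taken from the patch:

```java
class TrackerLabel {
    // Returns the tracker name to show in a message, or "lost" when the
    // lookup found no tracker for a task that had already succeeded.
    static String describe(String assignedTracker) {
        return assignedTracker != null ? assignedTracker : "lost";
    }

    public static void main(String[] args) {
        System.out.println(describe("tracker_nodeA")); // prints tracker_nodeA
        System.out.println(describe(null));            // prints lost
    }
}
```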


[jira] Commented: (HADOOP-1930) Too many fetch-failures issue

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530810 ] 

Hudson commented on HADOOP-1930:
--------------------------------

Integrated in Hadoop-Nightly #252 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/252/])


[jira] Updated: (HADOOP-1930) Too many fetch-failures issue

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1930:
----------------------------------

    Attachment: HADOOP-1930_2_20070925.patch

Thanks for the review, Devaraj. In fact, as you pointed out, I've gone ahead and removed the {{hostname}} parameter completely since it wasn't being used anywhere...


[jira] Commented: (HADOOP-1930) Too many fetch-failures issue

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530190 ] 

Hadoop QA commented on HADOOP-1930:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12366521/HADOOP-1930_2_20070925.patch
against trunk revision r579084.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/824/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/824/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/824/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/824/console

This message is automatically generated.


[jira] Updated: (HADOOP-1930) Too many fetch-failures issue

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sameer Paranjpye updated HADOOP-1930:
-------------------------------------

    Assignee: Arun C Murthy
    Priority: Blocker  (was: Major)
