Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/05/14 09:03:55 UTC

[jira] Created: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

the job directory of a failed task may stay forever on a tasktracker node
-------------------------------------------------------------------------

                 Key: HADOOP-3386
                 URL: https://issues.apache.org/jira/browse/HADOOP-3386
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Zheng Shao


See https://issues.apache.org/jira/browse/HADOOP-3370 for details of the problem.

A tasktracker only cleans out the job dir when the job tracker sends a "KILLJOB" action in the heartbeat response message.

However, in one corner case the job tracker will NOT send the "KILLJOB" action to the task tracker: when there are only failed tasks of this job on this task tracker, i.e. no task of this job succeeded on this task tracker.

In this case, jobtracker.trackerToTaskMap will not map this task tracker to any task of this job. As a result, the job tracker will not send a KILLJOB action to the task tracker.
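To make the corner case concrete, here is a minimal Java sketch of the bookkeeping as described above. It is a hypothetical simplification, not Hadoop's JobTracker code: the class TrackerToTaskModel and its methods are invented for illustration, and it assumes that a failed task is no longer associated with its tracker in trackerToTaskMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the JobTracker bookkeeping described in
// this issue. Class and method names are illustrative, not Hadoop's API.
class TrackerToTaskModel {
    // tracker name -> ids of tasks the JobTracker still associates with it
    private final Map<String, Set<String>> trackerToTaskMap = new HashMap<>();
    // task id -> job id
    private final Map<String, String> taskToJob = new HashMap<>();

    void launchTask(String tracker, String jobId, String taskId) {
        trackerToTaskMap.computeIfAbsent(tracker, t -> new HashSet<>()).add(taskId);
        taskToJob.put(taskId, jobId);
    }

    // Assumption: a failed task is dropped from trackerToTaskMap, so a
    // tracker whose only task of the job failed loses its link to the job.
    void failTask(String tracker, String taskId) {
        Set<String> tasks = trackerToTaskMap.get(tracker);
        if (tasks != null) {
            tasks.remove(taskId);
        }
    }

    // Trackers that would receive a KILLJOB action when jobId finishes:
    // only those that still map to at least one task of the job.
    List<String> killJobTargets(String jobId) {
        List<String> targets = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : trackerToTaskMap.entrySet()) {
            for (String taskId : e.getValue()) {
                if (jobId.equals(taskToJob.get(taskId))) {
                    targets.add(e.getKey());
                    break;
                }
            }
        }
        return targets;
    }
}
```

With one task launched on tt1 and another launched on tt2 and then failed, killJobTargets("job_1") returns only tt1, so in this model tt2 never hears about the finished job and keeps its job directory.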





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-3386:
-------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-3386:
-------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596669#action_12596669 ] 

Arun C Murthy commented on HADOOP-3386:
---------------------------------------

> When the job finished, JobTracker will send "KILL" job command to the TaskTrackers, based on jobsToTracker data structure.

+1



[jira] Updated: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-3386:
----------------------------------

    Comment: was deleted



[jira] Commented: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596654#action_12596654 ] 

dhruba borthakur commented on HADOOP-3386:
------------------------------------------

Do you have a proposal in mind to fix this behaviour?



[jira] Commented: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596668#action_12596668 ] 

Arun C Murthy commented on HADOOP-3386:
---------------------------------------

Zheng, I'm not so sure about the problem you are trying to fix, though I'd readily admit it's too hairy to keep all code-paths coherently in my head.

Just some pointers: the TaskTracker.TaskInProgress.cleanup method is the one that deletes local dirs. That call does not depend solely on KillJobAction, as this jira seems to assume. It is also called from TaskTracker.purgeTask and TaskTracker.TaskInProgress.taskFinished, so please be aware of those.

Overall, as I pointed out in HADOOP-3370, I'd be very happy to have you implement what you propose - it is a very useful feature; I'm only asking you to be clear about the direction of this jira...



[jira] Commented: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596658#action_12596658 ] 

Zheng Shao commented on HADOOP-3386:
------------------------------------

I am thinking about implementing steps 2 and 3 of the solution proposed in HADOOP-3370. (Note: in the HADOOP-3370 patch I only did step 1, to make sure the critical fix could get out as soon as possible.)

Proposed solution from 3370:
1. When a task fails, remove the task from runningJobs, but do not delete the job's entry in runningJobs even if it was the job's only task (which means we should NOT call TaskTracker.removeTaskFromJob);

2. JobTracker should keep another data structure, jobsToTracker, that records all the TaskTrackers on which a job has started a task.

3. When the job finishes, JobTracker will send a "KILL" job command to every TaskTracker recorded for that job in the jobsToTracker data structure.
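The three proposed steps can be sketched in Java as follows. This is a hypothetical model under stated assumptions, not the patch for this issue: TaskTrackerModel, JobTrackerModel and their methods are invented names, the on-disk job directory is represented by a set of job ids, and the "KILL" job command is represented by the list of tracker names returned when a job finishes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the three proposed steps. None of these class or
// method names are Hadoop's actual API.

// Step 1 (TaskTracker side): a failed task is removed from its job's entry,
// but the job entry itself stays until a KILL job command arrives.
class TaskTrackerModel {
    final Map<String, Set<String>> runningJobs = new HashMap<>();
    final Set<String> jobDirsOnDisk = new HashSet<>();

    void startTask(String jobId, String taskId) {
        runningJobs.computeIfAbsent(jobId, j -> new HashSet<>()).add(taskId);
        jobDirsOnDisk.add(jobId);
    }

    void taskFailed(String jobId, String taskId) {
        Set<String> tasks = runningJobs.get(jobId);
        if (tasks != null) {
            tasks.remove(taskId); // keep the (possibly empty) job entry
        }
    }

    void killJob(String jobId) {
        runningJobs.remove(jobId);
        jobDirsOnDisk.remove(jobId); // stands in for deleting the job dir
    }
}

// Steps 2 and 3 (JobTracker side): remember every tracker a job ever
// started a task on, and target all of them when the job finishes.
class JobTrackerModel {
    private final Map<String, Set<String>> jobsToTracker = new HashMap<>();

    void taskLaunched(String jobId, String tracker) {
        jobsToTracker.computeIfAbsent(jobId, j -> new TreeSet<>()).add(tracker);
    }

    // Returns the trackers that should receive a KILL job command; the
    // job's entry is dropped so the map does not grow without bound.
    List<String> jobFinished(String jobId) {
        Set<String> trackers = jobsToTracker.remove(jobId);
        if (trackers == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(trackers);
    }
}
```

Recording trackers per job at launch time, rather than deriving them from the tasks that are still tracked, is what lets a tracker whose only task of the job failed still receive the KILL job command in this model.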





[jira] Commented: (HADOOP-3386) the job directory of a failed task may stay forever on a tasktracker node

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596659#action_12596659 ] 

Hadoop QA commented on HADOOP-3386:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org
  against trunk revision 656122.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2461/console

This message is automatically generated.
