Posted to common-dev@hadoop.apache.org by "Rajiv Chittajallu (JIRA)" <ji...@apache.org> on 2008/07/08 15:48:31 UTC

[jira] Created: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

broken symlinks in jobcache when local tasks are done but job is in progress
----------------------------------------------------------------------------

                 Key: HADOOP-3713
                 URL: https://issues.apache.org/jira/browse/HADOOP-3713
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.17.0
            Reporter: Rajiv Chittajallu


When all running tasks on a tasktracker are done, not all links for /<mapred.local.dir>/taskTracker/jobcache/<job>/work are deleted. As a result, new tasks from the same job that are scheduled on this node fail with:

 2008-07-07 17:44:49,756 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200807071715_0022_r_000295_0
 2008-07-07 17:44:49,773 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing task_200807071715_0022_r_000295_0:
 java.io.IOException: Mkdirs failed to create /tmp3/taskTracker/jobcache/job_200807071715_0022/work
         at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:680)
         at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1274)
         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:915)
         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1310)
         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2251)

$  ls -lt /tmp3/taskTracker/jobcache/job_200807071715_0022/work
lrwxrwxrwx 1 user users 135 Jul  7 17:44 /tmp3/taskTracker/jobcache/job_200807071715_0022/work -> /tmp0/taskTracker/jobcache/job_200807071715_0022/work
$  ls -lt /tmp0/mapred-local/taskTracker/jobcache/job_200807071715_0022/work
ls: /tmp0/taskTracker/jobcache/job_200807071715_0022/work: No such file or directory

Earlier tasks scheduled on this tasktracker completed successfully:

2008-07-07 17:44:44,926 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000004_0 done; removing files.
2008-07-07 17:44:44,931 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000176_0 done; removing files.
2008-07-07 17:44:44,958 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000210_0 done; removing files.
2008-07-07 17:44:49,486 INFO org.apache.hadoop.mapred.TaskRunner: task_200807071715_0022_r_000153_0 done; removing files.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-3713:
--------------------------------

    Fix Version/s:     (was: 0.17.2)

"resolved as duplicate" means that "fixed version" should not be set.



[jira] Updated: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3713:
--------------------------------

    Priority: Blocker  (was: Major)
    Assignee: Amareshwari Sriramadasu



[jira] Commented: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611901#action_12611901 ] 

Devaraj Das commented on HADOOP-3713:
-------------------------------------

+1 to applying HADOOP-3370 on 0.17.



[jira] Updated: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3713:
--------------------------------

    Fix Version/s:     (was: 0.18.0)
                   0.17.2

Marking this for 0.17.2 at Rajiv's request.



[jira] Updated: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3713:
--------------------------------

    Fix Version/s: 0.18.0



[jira] Resolved: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved HADOOP-3713.
-----------------------------------

    Resolution: Duplicate
      Assignee:     (was: Amareshwari Sriramadasu)

Fixed by HADOOP-3370.



[jira] Commented: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611983#action_12611983 ] 

Amareshwari Sriramadasu commented on HADOOP-3713:
-------------------------------------------------

> because the job directory is not cleaned up properly

The job directory should not be cleaned up before the job completes, because the shared scratch space would be lost.

I have uploaded a patch for HADOOP-3370 for branch 0.17, and the patch has been tested.



[jira] Commented: (HADOOP-3713) broken symlinks in jobcache when local tasks are done but job is in progress

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611900#action_12611900 ] 

Amareshwari Sriramadasu commented on HADOOP-3713:
-------------------------------------------------

On branch 0.17, when a task gets a KILLTASKACTION and the tasktracker has no more tasks running for that job, the job is removed from the runningJobs data structure. When the tasktracker later gets another task for that job, it localizes the job again, and the Mkdirs call fails because the job directory was not cleaned up properly. HADOOP-3370 has fixed the problem on 0.18.

I think HADOOP-3370 should also be applied to 0.17. That will solve this issue.
Even then, garbage could still be left in the jobcache; that is addressed by HADOOP-3386.
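
For illustration only, here is a minimal, self-contained Java sketch of why a leftover symlink makes a directory-creation call fail. It is not the TaskTracker code and not the HADOOP-3370 patch. The class name DanglingSymlinkDemo, the helper names, and the disk0/disk3 directories are hypothetical stand-ins for the /tmp0 and /tmp3 local dirs in the report. It assumes a POSIX filesystem that permits symlinks.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; names are illustrative, not taken from Hadoop.
class DanglingSymlinkDemo {

    // True if 'path' is a symlink whose target no longer exists.
    // Files.exists follows links, so it reports false for a dangling link.
    static boolean isDanglingSymlink(Path path) {
        return Files.isSymbolicLink(path) && !Files.exists(path);
    }

    // Creates 'dir', first removing a stale link if one is in the way.
    // Files.delete removes the link itself, never the (missing) target.
    static boolean mkdirsReplacingDanglingLink(Path dir) throws IOException {
        if (isDanglingSymlink(dir)) {
            Files.delete(dir);
        }
        File f = dir.toFile();
        return f.mkdirs() || f.isDirectory();
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("jobcache-demo");
        Path target = base.resolve("disk0").resolve("work"); // real work dir
        Path link = base.resolve("disk3").resolve("work");   // link to it

        Files.createDirectories(target);
        Files.createDirectories(link.getParent());
        Files.createSymbolicLink(link, target);

        // Simulate a cleanup that removes the target but leaves the link.
        Files.delete(target);

        System.out.println("dangling link:      " + isDanglingSymlink(link));
        // The link name is still occupied, so a plain mkdirs is expected to fail.
        System.out.println("plain mkdirs:       " + link.toFile().mkdirs());
        System.out.println("mkdirs after clean: " + mkdirsReplacingDanglingLink(link));
    }
}
```

In this sketch the plain mkdirs call is the analogue of the failing Mkdirs in the stack trace: the path name is taken by the link, but nothing exists behind it. Deleting the stale link first is only one possible workaround. As described above, the fix in HADOOP-3370 addresses the problem on the tasktracker side.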

