Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/06/14 20:34:49 UTC

[jira] [Created] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

TT should fail task immediately if userlog dir cannot be created
----------------------------------------------------------------

                 Key: MAPREDUCE-2592
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2592
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
    Affects Versions: 0.23.0
            Reporter: Todd Lipcon
             Fix For: 0.23.0


Currently, TaskRunner logs the message "mkdirs failed. Ignoring" if it fails to create the userlog directory for a task. It then goes on to spawn taskjvm.sh, which tries to redirect output into the userlogs dir and so fails with exit code 1. This leads to error messages that are very hard to diagnose ("task failed with exit status 1") in cases where the userlog directory has either become inaccessible or has reached the maximum number of subdirectories (about 32000 in ext3).
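
For anyone trying to tell the two causes named above apart on an affected node, a small check along these lines may help. It is only a sketch: the class UserLogDirCheck and its diagnose method are hypothetical and not part of Hadoop, the limit is passed in by the caller (32000 is the figure quoted in this report), and it relies only on standard java.io.File calls.

```java
import java.io.File;

/** Hypothetical helper, not part of Hadoop: guesses why a userlog dir cannot be created. */
final class UserLogDirCheck {
    // Figure quoted in the report for ext3; treated here as an assumption.
    static final int EXT3_LIMIT_FROM_REPORT = 32000;

    /** Returns a short human-readable diagnosis for the given parent directory. */
    static String diagnose(File parent, int maxEntries) {
        if (!parent.isDirectory()) {
            return "parent is missing or not a directory: " + parent;
        }
        if (!parent.canWrite() || !parent.canExecute()) {
            return "parent is not writable/searchable: " + parent;
        }
        String[] entries = parent.list(); // null if the directory cannot be read
        if (entries == null) {
            return "parent cannot be listed: " + parent;
        }
        if (entries.length >= maxEntries) {
            return "parent holds " + entries.length
                + " entries, at or above the limit of " + maxEntries;
        }
        return "no obvious problem found";
    }
}
```

Calling UserLogDirCheck.diagnose(new File(userlogsPath), UserLogDirCheck.EXT3_LIMIT_FROM_REPORT) with the node's actual userlogs path would report which of the two conditions, if either, applies. Note that a process running as root may see the directory as writable even when the TaskTracker user does not.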

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Harsh J (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2592:
-------------------------------

       Resolution: Won't Fix
    Fix Version/s:     (was: 0.22.1)
           Status: Resolved  (was: Patch Available)

Not an issue on 0.23/trunk.
                


        

[jira] [Commented] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Esteban Gutierrez (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049336#comment-13049336 ] 

Esteban Gutierrez commented on MAPREDUCE-2592:
----------------------------------------------

The problem propagates very quickly to all the nodes once a single TaskTracker has reached that state and more jobs are submitted. It can bring down the whole cluster, since all the TaskTrackers (TTs) end up blacklisted.

A sample stacktrace:

11/02/05 10:00:01 WARN mapred.JobClient: Error reading task outputhttp://dn:50060/tasklog?plaintext=true&taskid=attempt_201102050901_1000_m_000001_0&filter=stderr 
11/02/05 10:00:02 INFO mapred.JobClient: Task Id : attempt_201102050901_1000_m_000001_0, Status : FAILED 
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:471)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:458)





        

[jira] [Updated] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2592:
-----------------------------------------------

    Affects Version/s:     (was: 0.23.0)
                       0.22.0
    


        

[jira] [Assigned] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J reassigned MAPREDUCE-2592:
----------------------------------

    Assignee: Harsh J



        

[jira] [Updated] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2592:
-----------------------------------------------

    Fix Version/s:     (was: 0.24.0)
                   0.22.1

Setting the fix version to 0.22.1, as the JobTracker and TaskTracker (JT/TT) are not supported in 0.23.*. Please revert if you disagree.
                


        

[jira] [Updated] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Harsh J (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2592:
-------------------------------

    Status: Patch Available  (was: Open)
    


        

[jira] [Commented] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171859#comment-13171859 ] 

Hadoop QA commented on MAPREDUCE-2592:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483859/MAPREDUCE-2592.r1.diff
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1474//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1474//console

This message is automatically generated.
                


        

[jira] [Updated] (MAPREDUCE-2592) TT should fail task immediately if userlog dir cannot be created

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2592:
-------------------------------

    Attachment: MAPREDUCE-2592.r1.diff

A work-in-progress patch that makes the log directory creation method throw an IOException instead of simply logging a WARN.

No tests added yet. Would appreciate some help on writing one!
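
The shape of that change can be sketched roughly as follows. This is not the attached MAPREDUCE-2592.r1.diff: the class FailFastLogDir and both method names are hypothetical, the extra isDirectory() check (so that an already existing directory is not treated as a failure) is an assumption of this sketch, and the lenient variant only approximates the behaviour described in the report.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical illustration, not Hadoop code: lenient vs. fail-fast log dir creation. */
final class FailFastLogDir {

    /** Approximates the reported behaviour: warn and carry on. */
    static void createLenient(File logDir) {
        // mkdirs() returns false both on failure and when the directory already exists.
        if (!logDir.mkdirs() && !logDir.isDirectory()) {
            System.err.println("WARN mkdirs failed. Ignoring: " + logDir);
        }
    }

    /** Fail-fast variant: the caller sees the real cause before any child process is spawned. */
    static void createStrict(File logDir) throws IOException {
        if (!logDir.mkdirs() && !logDir.isDirectory()) {
            throw new IOException("Could not create userlog dir: " + logDir);
        }
    }
}
```

With the strict variant, the failure would surface as an IOException naming the directory rather than as a later "exit status 1" from the child process. A unit test could provoke the failure without special permissions by pointing the method at a path whose parent is a regular file.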

