Posted to mapreduce-issues@hadoop.apache.org by "Sharad Agarwal (JIRA)" <ji...@apache.org> on 2011/07/19 06:47:58 UTC

[jira] [Created] (MAPREDUCE-2708) Design and implement MR Application Master recovery

Design and implement MR Application Master recovery
---------------------------------------------------

                 Key: MAPREDUCE-2708
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Sharad Agarwal
            Assignee: Sharad Agarwal


Design recovery of the MR AM from crashes and node failures. A running job should resume from the state where it left off.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-2708:
-------------------------------------

    Priority: Blocker  (was: Major)

> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>         Attachments: mr2708_v1.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated MAPREDUCE-2708:
--------------------------------------

    Attachment: mr2708_v1.patch

Apply this patch on top of the latest patch from MAPREDUCE-2702.

> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>         Attachments: mr2708_v1.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated MAPREDUCE-2708:
--------------------------------------

    Status: Patch Available  (was: Open)
    
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134051#comment-13134051 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #870 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/870/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188043
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133917#comment-13133917 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1156 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1156/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188043
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2708:
-----------------------------------------------

    Issue Type: Sub-task  (was: New Feature)
        Parent: MAPREDUCE-2692
    
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: mr2708_v1.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134029#comment-13134029 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #49 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/49/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.
svn merge -c r1188043 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188044
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133342#comment-13133342 ] 

Hadoop QA commented on MAPREDUCE-2708:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12500319/MAPREDUCE-2708-20111022.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 1718 javac compiler warnings (more than the trunk's current 1705 warnings).

    -1 findbugs.  The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1114//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1114//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1114//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1114//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1114//console

This message is automatically generated.
                


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated MAPREDUCE-2708:
--------------------------------------

    Summary: [MR-279] Design and implement MR Application Master recovery  (was: Design and implement MR Application Master recovery)

> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133902#comment-13133902 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1219 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1219/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188043
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133904#comment-13133904 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #1141 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1141/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188043
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Arun C Murthy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127310#comment-13127310 ] 

Arun C Murthy commented on MAPREDUCE-2708:
------------------------------------------

Thanks Sharad!
                


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Mahadev konar (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-2708:
-------------------------------------

    Status: Open  (was: Patch Available)

Looks like the patch doesn't apply anymore. Sharad, could you please update the patch? Sorry for the delay in review/commit.
                


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133507#comment-13133507 ] 

Mahadev konar commented on MAPREDUCE-2708:
------------------------------------------

I tried the latest patch out on 0.23 and it worked for me. I killed a randomwriter job at 83% map and 0% reduce and the AM restarted with 83% map and 0% reduce. Nicholas tells me that the error above could just be a bug in trunk due to HDFS-2181 or such.

Vinod, are you using trunk or 0.23? If trunk, can you try it out with 0.23?

                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127307#comment-13127307 ] 

Sharad Agarwal commented on MAPREDUCE-2708:
-------------------------------------------

Lots of conflicts while merging. I will try to get this done in the next couple of days. Thanks!
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: mr2708_v1.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133856#comment-13133856 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2708:
----------------------------------------------------

I have already tried with 0.23 a couple of times now. The exception trace changed a bit, but the outcome is the same. Digging my way through DFS.

bq. btw Vinod, with recovery module unable to parse the history file (due to hdfs bug), it should fall back to restarting the job. just curious, did you notice that? 
Yes, that code path is working. With the above errors, I am not able to see recovery in action, but restart is working fine now.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133901#comment-13133901 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #44 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/44/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.
svn merge -c r1188043 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188044
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134062#comment-13134062 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #841 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/841/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188043
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Amol Kekre (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120927#comment-13120927 ] 

Amol Kekre commented on MAPREDUCE-2708:
---------------------------------------

Can this jira be closed now?
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>         Attachments: mr2708_v1.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated MAPREDUCE-2708:
--------------------------------------

    Attachment: mr2708_v2.patch

Rebased to the latest 0.23 branch. All hadoop-mapreduce-client passing.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132806#comment-13132806 ] 

Sharad Agarwal commented on MAPREDUCE-2708:
-------------------------------------------

bq. You write extremely beautiful tests
Smile. Thanks, Vinod, for taking this up. I hope testing on the cluster goes smoothly with this.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133919#comment-13133919 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2708:
----------------------------------------------------

Oh, I realized I committed this without running it through Jenkins. Apologies. I had already run the tests locally by that time, and I just ran them once again to be sure. They all pass.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133597#comment-13133597 ] 

Sharad Agarwal commented on MAPREDUCE-2708:
-------------------------------------------

bq. and the AM restarted with 83% map and 0% reduce

great! it worked.

btw Vinod, with recovery module unable to parse the history file (due to hdfs bug), it should fall back to restarting the job. just curious, did you notice that? 
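
To make the fall-back behaviour described above concrete, here is a minimal, self-contained sketch. It is not the RecoveryService code from the patch: the class, the methods, and the "TASK_FINISHED" event format are all hypothetical, and the history file is modelled as a plain list of strings. It only illustrates the idea that a history file which cannot be parsed leads to a clean restart of the job rather than a failed AM.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these names come from the MAPREDUCE-2708 patch.
class RecoveryFallbackSketch {

    /** Result of deciding how a restarted AM should start the job. */
    static final class StartPlan {
        final boolean recovered;            // true if the history was parsed successfully
        final List<String> completedTasks;  // tasks that need not be re-run

        StartPlan(boolean recovered, List<String> completedTasks) {
            this.recovered = recovered;
            this.completedTasks = completedTasks;
        }
    }

    // Assumed, made-up event format: "TASK_FINISHED <task id>".
    private static final String PREFIX = "TASK_FINISHED ";

    /** Stand-in for history parsing: fails on any event it cannot read. */
    static List<String> parseHistory(List<String> events) throws IOException {
        List<String> done = new ArrayList<>();
        for (String event : events) {
            if (!event.startsWith(PREFIX)) {
                throw new IOException("Unparseable history event: " + event);
            }
            done.add(event.substring(PREFIX.length()));
        }
        return done;
    }

    /** Try to recover; if the history cannot be parsed, restart from scratch. */
    static StartPlan planStart(List<String> events) {
        try {
            return new StartPlan(true, parseHistory(events));
        } catch (IOException e) {
            // Fall back: nothing is treated as completed, so every task runs again.
            return new StartPlan(false, Collections.<String>emptyList());
        }
    }

    public static void main(String[] args) {
        StartPlan ok = planStart(Arrays.asList("TASK_FINISHED m_0", "TASK_FINISHED m_1"));
        System.out.println(ok.recovered + " " + ok.completedTasks);

        // A truncated last event stands in for a history file that cannot be read fully.
        StartPlan bad = planStart(Arrays.asList("TASK_FINISHED m_0", "TASK_FIN"));
        System.out.println(bad.recovered + " " + bad.completedTasks);
    }
}
```

In this sketch a partially readable history is discarded as a whole; whether a real implementation should instead keep the events read before the failure is a separate design question.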
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133508#comment-13133508 ] 

Mahadev konar commented on MAPREDUCE-2708:
------------------------------------------

Also, note that it was a secure setup.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2708:
-----------------------------------------------

    Status: Patch Available  (was: Open)
    
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Tsz Wo (Nicholas), SZE (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133504#comment-13133504 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-2708:
---------------------------------------------------

> Now, if I kill the AM after a couple of tasks, NN shows the #bytes to be zero: ...

Since the file is not closed, the size in NN won't be updated.  We have to call getVisibleLength() to get the length from one of the datanodes, i.e.
{code}
final DFSDataInputStream in = (DFSDataInputStream)fs.open(p);
final long currentFileSize = in.getVisibleLength();
{code}

                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120994#comment-13120994 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2708:
----------------------------------------------------

bq. Can this jira be closed now?
No, the patch uploaded by Sharad on this ticket still needs to be updated to the latest code and tested.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>         Attachments: mr2708_v1.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128689#comment-13128689 ] 

Sharad Agarwal commented on MAPREDUCE-2708:
-------------------------------------------

bq. All hadoop-mapreduce-client passing.

correction: All hadoop-mapreduce-client tests passing.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129550#comment-13129550 ] 

Hadoop QA commented on MAPREDUCE-2708:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12499261/mr2708_v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1052//console

This message is automatically generated.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133900#comment-13133900 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #45 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/45/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.
svn merge -c r1188043 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188044
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/recover/RecoveryService.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/MRApp.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRecovery.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventWriter.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.java

                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, MAPREDUCE-2708-20111022.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2708:
-----------------------------------------------

          Component/s: applicationmaster
    Affects Version/s: 0.23.0
        Fix Version/s: 0.23.0
    
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: mr2708_v1.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133426#comment-13133426 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2708:
----------------------------------------------------

I hit a blocker here:

The job-history file of a normally finishing small job (6 maps, each sleeping for 1 minute) is about 60KB:
bq. -rw-rw----   3 nobody rm      60207 2011-10-22 21:20 /job-history-root/history/done/2011/10/22/000000/job_1319280146725_0003-1319298340296-nobody-Sleep+job-1319298659124-6-1-SUCCEEDED.jhist

Now, if I kill the AM after a couple of tasks, the NN shows the file size as zero bytes:
bq. -rw-r--r--   3 nobody supergroup          0 2011-10-22 21:15 /user/nobody/staging1234/nobody/.staging/job_1319280146725_0003_1.jhist

When either the new-generation AM tries to read this file for recovery or I try to read it manually with the dfs command, the read fails:
{quote}
11/10/22 21:30:31 DEBUG ipc.Client: closing ipc connection to /127.0.0.1:50020: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:535)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
        at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:499)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
        at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
        at org.apache.hadoop.ipc.Client.call(Client.java:1065)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:244)
        at $Proxy10.getReplicaVisibleLength(Unknown Source)
        at org.apache.hadoop.hdfs.protocolR23Compatible.ClientDatanodeProtocolTranslatorR23.getReplicaVisibleLength(ClientDatanodeProtocolTranslatorR23.java:121)
        at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:163)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:140)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:111)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:569)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:235)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:585)
        at org.apache.hadoop.fs.shell.Display$Cat.getInputStream(Display.java:93)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:81)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:300)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:272)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:239)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:185)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:149)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:254)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:296)
....
11/10/22 21:30:31 ERROR ipc.RPC: Tried to call RPC.stopProxy on an object that is not a proxy.
java.lang.IllegalArgumentException: not a proxy instance
        at java.lang.reflect.Proxy.getInvocationHandler(Proxy.java:637)
        at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:479)
        at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:183)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:140)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:111)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:569)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:235)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:585)
        at org.apache.hadoop.fs.shell.Display$Cat.getInputStream(Display.java:93)
        at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:81)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:300)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:272)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:255)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:239)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:185)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:149)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:254)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:296)
11/10/22 21:30:31 ERROR ipc.RPC: Could not get invocation handler null for proxy class class org.apache.hadoop.hdfs.protocolR23Compatible.ClientDatanodeProtocolTranslatorR23, or invocation handler is not closeable.
cat: Cannot obtain block length for LocatedBlock[BP-995821427-127.0.0.1-1318832709756:blk_-7812123742502704244_1249; getBlockSize()=0; corrupt=false; offset=0; locs=[127.0.0.1:999]
{quote}

So it looks like we are stuck if the job-history file fits in a single block and that block isn't complete yet. I could try a small block size, say 25-30K, for the job-history file, but is that okay for running on clusters?

Sharad?
                

        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2708:
-----------------------------------------------

    Attachment: MAPREDUCE-2708-20111021.1.txt

Compilation passes with this patch. Still fixing tests.
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.1.txt, MAPREDUCE-2708-20111021.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2708:
-----------------------------------------------

    Status: Open  (was: Patch Available)
    

        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2708:
-----------------------------------------------

    Attachment: MAPREDUCE-2708-20111021.txt

*sigh* This is the downside of the great velocity of 0.23: the patch needed a *lot* of effort to upmerge again. I did it anyway, given that you are finding it hard to get some time. I also fixed a couple of trivial bugs along the way.

Sharad, I can take it from here: testing on a cluster etc. before commit.

The patch looks good overall. +1. We just need to replace the Application_AttemptID_Env with ContainerId_Env to be consistent with AMContainer, but I'd like to do that separately. I don't want this patch to go stale again, given my experience with the update.

Oh and you sir! You write extremely beautiful tests. Elated, thanks!
                
> [MR-279] Design and implement MR Application Master recovery
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2708-20111021.txt, mr2708_v1.patch, mr2708_v2.patch
>
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Resolved] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-2708.
------------------------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed

I just committed this to trunk and branch-0.23. Thanks Sharad!
                

        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134055#comment-13134055 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #61 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/61/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.
svn merge -c r1188043 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188044

                

        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133893#comment-13133893 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2708:
----------------------------------------------------

Okay, I've finally cornered this. That was a wild-goose chase, and tiresome work.

It is actually not related to trunk or 0.23. By default, {{dfs.block.access.token.enable}} is set to false, so clients weren't able to contact datanodes when they needed something like an incomplete block's length. The error goes away when I set it to true explicitly. But I don't think this should be needed if {{hadoop.security.authentication}} is already enabled; I will file a DFS ticket and see whether there is a reason for this other than having a quick flag to disable the feature.

Apart from that, there are a couple of other bugs related to recovery: the client keeps reconnecting to the new AM again and again, jobs with reduces fail their reduces after restart, etc. I will fix them separately.
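
As a rough illustration of the setting described above, the two properties might be configured along these lines. This is only a sketch: the file placement (hdfs-site.xml and core-site.xml) and the value kerberos are assumptions and are not taken from this thread.

```xml
<!-- hdfs-site.xml: enable block access tokens explicitly
     (file placement is an assumption) -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>

<!-- core-site.xml: security already switched on
     (the value "kerberos" is an assumption) -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```

With the first property left at its default of false, reading a file whose last block is still incomplete may fail as in the stack trace quoted earlier in this thread.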
                

        

[jira] [Updated] (MAPREDUCE-2708) Design and implement MR Application Master recovery

Posted by "Sharad Agarwal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated MAPREDUCE-2708:
--------------------------------------

    Component/s: mrv2

> Design and implement MR Application Master recovery
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-2708
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>
> Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.


        

[jira] [Commented] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133914#comment-13133914 ] 

Hudson commented on MAPREDUCE-2708:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #44 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/44/])
    MAPREDUCE-2708. Designed and implemented MR Application Master recovery to make MR AMs resume their progress after restart. Contributed by Sharad Agarwal.
svn merge -c r1188043 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1188044

                

        

[jira] [Updated] (MAPREDUCE-2708) [MR-279] Design and implement MR Application Master recovery

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2708:
-----------------------------------------------

    Attachment: MAPREDUCE-2708-20111022.txt

MAPREDUCE-3233 is in. Updating patch to fix tests.

Validation on single node secure setup next.
                