Posted to mapreduce-issues@hadoop.apache.org by "Mayank Bansal (Created) (JIRA)" <ji...@apache.org> on 2012/02/08 00:40:59 UTC

[jira] [Created] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
--------------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-3837
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.22.0
            Reporter: Mayank Bansal
            Assignee: Mayank Bansal


If the JobTracker crashes while some jobs are running, and the JobTracker property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs when it restarts.

However, the current behavior is as follows:
the JobTracker tries to restore the jobs but cannot. After that, the JobTracker closes its handle to HDFS and nobody else can submit a job.

Thanks,
Mayank
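
The reported symptom is that one failed recovery leaves the JobTracker unable to accept any new submission. That is what you would expect if a shared filesystem handle were closed on the failure path. The toy model below illustrates that failure shape under that assumption. SharedFs and RecoverySketch are hypothetical names, not JobTracker code, and the thread does not confirm that this is the exact mechanism.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a cached filesystem client: every caller shares
// this one instance, so closing it breaks all other users of the handle.
class SharedFs implements Closeable {
    private boolean open = true;

    void createJobDir(String jobId) throws IOException {
        if (!open) {
            throw new IOException("Filesystem closed");
        }
        // A real client would create the job's staging directory here.
    }

    @Override
    public void close() {
        open = false;
    }
}

class RecoverySketch {
    // Buggy shape: try-with-resources closes the shared handle on exit,
    // whether or not any job could be restored.
    static List<String> recoverAndClose(SharedFs fs, List<String> jobs,
                                        Predicate<String> canRestore) {
        List<String> restored = new ArrayList<>();
        try (SharedFs handle = fs) {
            for (String job : jobs) {
                if (canRestore.test(job)) {
                    restored.add(job);
                }
            }
        }
        return restored;
    }

    // Safer shape: skip jobs that cannot be restored and never close the
    // shared handle, leaving it usable for later submissions.
    static List<String> recoverKeepOpen(List<String> jobs,
                                        Predicate<String> canRestore) {
        List<String> restored = new ArrayList<>();
        for (String job : jobs) {
            if (canRestore.test(job)) {
                restored.add(job);
            }
        }
        return restored;
    }

    // True if a new job can still be submitted through the shared handle.
    static boolean canSubmit(SharedFs fs, String jobId) {
        try {
            fs.createJobDir(jobId);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Hadoop's FileSystem.get normally hands out a cached instance that is shared within the JVM. That is why closing it in one code path can break unrelated callers.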

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Attachment: PATCH-HADOOP-1-MAPREDUCE-3837-2.patch

Incorporating review comments
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Resolved] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White resolved MAPREDUCE-3837.
----------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 1.1.1)
                   1.2.0

I just committed this to branch-1. Thanks Mayank!
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.2.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207663#comment-13207663 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/955/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243695)

     Result = FAILURE
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243695
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207225#comment-13207225 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #534 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/534/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243698)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243698
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Affects Version/s: 1.1.1
        Fix Version/s: 1.1.1
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.1.1, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404251#comment-13404251 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Thanks Tom for your comments. I incorporated everything except the point below:
bq. If there is no need for restart count anymore - since jobs are re-run from the beginning each time - then would it be cleaner to remove it entirely?
Yes, you are right, and we should clean up the restart count. However, it looks to me like it needs a closer look and more testing. Do you mind if I open a separate JIRA and work on that separately from this one?

The rest of the comments are incorporated in my latest patch.

Thanks,
Mayank

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Alejandro Abdelnur (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221053#comment-13221053 ] 

Alejandro Abdelnur commented on MAPREDUCE-3837:
-----------------------------------------------

Mayank,

* Built branch-1 with your patch
* Configured the cluster and ran a test job; it was OK
* Configured the mapred-site.xml with 'mapred.jobtracker.restart.recover=true'
* Restarted the JT
* Created an IN data file in my HDFS home dir
* Submitted 5 wordcount jobs

{code}
bin/hadoop jar hadoop-*examples*jar wordcount IN OUT0 &
bin/hadoop jar hadoop-*examples*jar wordcount IN OUT1 &
bin/hadoop jar hadoop-*examples*jar wordcount IN OUT2 &
bin/hadoop jar hadoop-*examples*jar wordcount IN OUT3 &
bin/hadoop jar hadoop-*examples*jar wordcount IN OUT4 &
{code}

* Waited until they were all running
* Killed the JT
* Restarted the JT

The jobs are not recovered, and what I see in the logs is:

{code}
2012-03-02 08:55:22,164 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_201203020852_0001. Deleting it!!
2012-03-02 08:55:22,194 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_201203020852_0002. Deleting it!!
2012-03-02 08:55:22,204 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_201203020852_0003. Deleting it!!
2012-03-02 08:55:22,224 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_201203020852_0004. Deleting it!!
2012-03-02 08:55:22,236 INFO org.apache.hadoop.mapred.JobTracker: Found an incomplete job directory job_201203020852_0005. Deleting it!!
{code}
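
To see after a restart which jobs the JobTracker discarded rather than recovered, the log can be scanned for the message above. This is a minimal sketch. IncompleteJobDirScanner is a hypothetical helper, and the pattern assumes the exact wording shown in this log, which may differ in other versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IncompleteJobDirScanner {
    // Assumes the exact message wording seen in this JobTracker log and
    // captures the job ID (job_<start timestamp>_<sequence>).
    private static final Pattern DELETED = Pattern.compile(
        "Found an incomplete job directory (job_\\d+_\\d+)\\. Deleting it");

    // Returns, in log order, the job IDs whose directories the JobTracker
    // reported deleting instead of recovering.
    static List<String> deletedJobs(List<String> logLines) {
        List<String> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = DELETED.matcher(line);
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }
}
```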

Am I missing some additional configuration?
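
The steps above set mapred.jobtracker.restart.recover, while the issue description names mapreduce.jobtracker.restart.recover. It may be worth confirming which of the two names a given site file actually sets. This is a minimal sketch using only the JDK XML parser. RecoveryConfigCheck is hypothetical, and it ignores Hadoop-specific behaviour such as default resources, `final` properties, and deprecated-key mapping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RecoveryConfigCheck {
    // Both spellings appear in this thread: mapred.* (branch-1 steps above)
    // and mapreduce.* (issue description).
    static final String[] RECOVER_KEYS = {
        "mapred.jobtracker.restart.recover",
        "mapreduce.jobtracker.restart.recover"
    };

    // Collects <property><name>..</name><value>..</value></property> pairs
    // from a Hadoop-style *-site.xml document.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0) {
                    props.put(names.item(0).getTextContent().trim(),
                              values.item(0).getTextContent().trim());
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse site XML", e);
        }
    }

    // True if either recovery key is set to "true" (case-insensitive);
    // a missing key counts as false.
    static boolean recoveryEnabled(String xml) {
        Map<String, String> props = parse(xml);
        for (String key : RECOVER_KEYS) {
            if (Boolean.parseBoolean(props.get(key))) {
                return true;
            }
        }
        return false;
    }
}
```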

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Konstantin Shvachko (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207198#comment-13207198 ] 

Konstantin Shvachko commented on MAPREDUCE-3837:
------------------------------------------------

+1 The patch looks good. It enables an important feature of automatic job recovery on JT startup.
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Attachment: PATCH-HADOOP-1-MAPREDUCE-3837-3.patch
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205008#comment-13205008 ] 

Hadoop QA commented on MAPREDUCE-3837:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12514029/PATCH-TRUNK-MAPREDUCE-3837.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.yarn.util.TestLinuxResourceCalculatorPlugin

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1832//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1832//console

This message is automatically generated.
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405315#comment-13405315 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Test Patch Results are as follows:

     [exec] BUILD SUCCESSFUL
     [exec] Total time: 4 minutes 7 seconds
     [exec]
     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 9 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec]
     [exec] ======================================================================
     [exec] ======================================================================
     [exec]     Finished build.
     [exec] ======================================================================
     [exec] ======================================================================

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.1.1, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Konstantin Shvachko (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221344#comment-13221344 ] 

Konstantin Shvachko commented on MAPREDUCE-3837:
------------------------------------------------

I've been reviewing this patch, and have a couple of cosmetic comments below.
I agree with Alejandro. This is not introducing a new feature; it is just enabling an already existing one. The risk is low, since the feature is enabled in a restricted context: restarting failed jobs from scratch rather than trying to continue from the point where they were terminated.
The patch looks larger than it actually is, because it removes the [troubled] logic responsible for resurrecting the job from its history. Besides that, it is simple. Take a look, Arun.
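
The restricted policy described above can be modelled in a few lines: interrupted jobs are re-run from the beginning instead of being resurrected from their history. This is a toy sketch with hypothetical JobSnapshot and RestartFromScratch types, not the JobTracker's recovery code. It only shows that resubmitting from scratch needs nothing but the job's identity and definition, with prior progress discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of a job that was running when the JobTracker died.
final class JobSnapshot {
    final String jobId;
    final int completedTasks;
    final int totalTasks;

    JobSnapshot(String jobId, int completedTasks, int totalTasks) {
        this.jobId = jobId;
        this.completedTasks = completedTasks;
        this.totalTasks = totalTasks;
    }
}

class RestartFromScratch {
    // With recovery disabled nothing is resubmitted. With it enabled, every
    // job that was running is resubmitted with its progress reset to zero,
    // so no task-level history has to be replayed.
    static List<JobSnapshot> jobsToResubmit(List<JobSnapshot> running,
                                            boolean recoverEnabled) {
        List<JobSnapshot> resubmit = new ArrayList<>();
        if (!recoverEnabled) {
            return resubmit;
        }
        for (JobSnapshot job : running) {
            resubmit.add(new JobSnapshot(job.jobId, 0, job.totalTasks));
        }
        return resubmit;
    }
}
```

The trade-off is that completed work is redone. In exchange, the recovery path no longer has to reconstruct job state correctly from history.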

Cosmetic comments
- Several lines are too long
- Several tabs should be spaces
- Indentation is wrong in a couple of places, e.g.:
          recoveryManager.addJobForRecovery(JobID.forName(fileName));
          shouldRecover = true; // enable actual recovery if num-files > 1
- Add spaces after commas in method calls and parameters
Otherwise it looks good. 
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch


        

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated MAPREDUCE-3837:
----------------------------------

    Description: 
If job tracker is crashed while running , and there were some jobs are running , so if job tracker's property mapreduce.jobtracker.restart.recover is true then it should recover the job.

However the current behavior is as follows
jobtracker try to restore the jobs but it can not . And after that jobtracker closes its handle to hdfs and nobody else can submit job. 

Thanks,
Mayank

  was:
If job tracker is crashed while running , and there were some jobs are running , so if job tracker's property mapreduce.jobtracker.restart.recover is true then it should recover the job.

However the current behavior is as follows
jobtracker try to restore the jobs but it can not . And after that jobtracker closes it handle to hdfs and nobody else also can not submit the job. 

Thanks,
Mayank

    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>

        

[jira] [Commented] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411512#comment-13411512 ] 

Arun C Murthy commented on MAPREDUCE-3837:
------------------------------------------

bq. Looks like this needs a minor update to get it to work on Mac OSX...

Could be any single-node cluster too...
                
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 1.2.0, 0.22.1
>
>         Attachments: MAPREDUCE-3837_addendum.patch, PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Issue Type: New Feature  (was: Bug)
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.2.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207229#comment-13207229 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #1723 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1723/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243695)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243695
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400046#comment-13400046 ] 

Todd Lipcon commented on MAPREDUCE-3837:
----------------------------------------

Arun: I noticed this is listed as one of the patches in HDP. Does that imply that you're removing your -1? Or do you have a new patch that you're shipping in your product that you haven't open-sourced yet?
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405094#comment-13405094 ] 

Tom White commented on MAPREDUCE-3837:
--------------------------------------

+1 to the latest patch - thanks for addressing my feedback, Mayank. Can you please run test-patch and the unit test if you haven't already?

Cleaning up the restart count code in a separate JIRA is fine by me.
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.1.1, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401733#comment-13401733 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Hi Arun,

As you suggested:

1) I added the credentials to the resubmit API.
2) I added the isJobdirvalid API as well.
3) My patch already uses jobid instead of jobinfo, so no change is required.
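The three review points above can be illustrated with a minimal, self-contained Java sketch. It is not the patch's code: the class ResubmitApiSketch, the JobCredentials holder, and the set of "required" file names are hypothetical stand-ins chosen for illustration. Only the ideas of validating the job directory, keying the resubmit by job id, and passing credentials along come from the comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the review points: validate the job directory,
// key the resubmit by job id, and carry the submitter's credentials.
final class ResubmitApiSketch {

    // Hypothetical stand-in for a credentials bag (token alias -> token bytes).
    static final class JobCredentials {
        private final Map<String, byte[]> tokens = new HashMap<>();

        void addToken(String alias, byte[] token) {
            tokens.put(alias, token.clone());
        }

        boolean hasToken(String alias) {
            return tokens.containsKey(alias);
        }
    }

    // Files assumed, for illustration only, to be needed before a resubmit.
    static final Set<String> REQUIRED_FILES = Set.of("job.xml", "job.split", "jobToken");

    // Analogue of the isJobdirvalid check: every required file must be present.
    static boolean isJobDirValid(Set<String> filesInJobDir) {
        return filesInJobDir.containsAll(REQUIRED_FILES);
    }

    // Returns true if the job identified by jobId is accepted for resubmission.
    // It refuses up front when the id is empty, the directory is incomplete, or
    // the job token is missing; handing the job to a scheduler is omitted here.
    static boolean resubmit(String jobId, Set<String> filesInJobDir, JobCredentials creds) {
        if (jobId == null || jobId.isEmpty()) {
            return false;
        }
        if (!isJobDirValid(filesInJobDir)) {
            return false;
        }
        return creds != null && creds.hasToken("jobToken");
    }
}
```

Checking the directory and the token before resubmitting means an incomplete job is rejected cleanly instead of failing somewhere deeper in the restart path.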


Hi Tom,

I added a new test case and fixed TestRecoveryManager as well in the latest patch.

I also fixed one more recovery issue, which I found here in production.

Please review the patch.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207670#comment-13207670 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #168 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/168/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243698)

     Result = FAILURE
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243698
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221253#comment-13221253 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Hi Alejandro,

Thanks for your help testing this patch. I am really sorry about the confusion; I missed one function in the patch. I have attached the new patch, tested it, and it is working fine in my local environment. I am not sure how I missed that before.

Please let me know if you find any more issues with that.

Arun,

I believe the issues were with recovering jobs from the point at which they crashed. What I am doing here is a very simplistic approach: after a crash, on recovery I read the job token file and resubmit the jobs. I am not trying to resume from the point where the last run left off.

In this scenario each recovered job is a new run, and that works well. The downside is that the whole job re-runs; the upside is that users don't need to resubmit their jobs.
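The resubmit-from-scratch approach described above might be sketched as follows. This is a hypothetical illustration, not JobTracker code: RestartRecoverySketch, the Resubmitter hook, and the map of persisted token files are assumptions standing in for whatever the patch actually reads on restart. It shows the property the comment relies on, namely that each recovered job starts over as a new run. As an additional design choice of the sketch (not stated in the comment), a job whose token file is missing or whose resubmit throws is recorded and skipped rather than aborting recovery of the other jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: on restart, resubmit every persisted job as a brand-new run.
final class RestartRecoverySketch {

    // Hypothetical hook that resubmits one job from its token file; it may throw.
    interface Resubmitter {
        void resubmit(String jobId, byte[] tokenFile) throws Exception;
    }

    // Returns the ids that were resubmitted; ids that could not be recovered are
    // appended to 'failed' instead of aborting the whole recovery pass.
    static List<String> recover(boolean recoverEnabled,
                                Map<String, byte[]> persistedTokenFiles,
                                Resubmitter resubmitter,
                                List<String> failed) {
        List<String> resubmitted = new ArrayList<>();
        if (!recoverEnabled) {
            // Analogue of mapreduce.jobtracker.restart.recover being false.
            return resubmitted;
        }
        for (Map.Entry<String, byte[]> entry : persistedTokenFiles.entrySet()) {
            String jobId = entry.getKey();
            byte[] tokenFile = entry.getValue();
            try {
                if (tokenFile == null || tokenFile.length == 0) {
                    throw new IllegalStateException("no token file for " + jobId);
                }
                // No task-level progress is restored: the whole job runs again.
                resubmitter.resubmit(jobId, tokenFile);
                resubmitted.add(jobId);
            } catch (Exception e) {
                // Record and move on so one bad job does not block the others.
                failed.add(jobId);
            }
        }
        return resubmitted;
    }
}
```

The trade-off matches the one described in the comment: no partial progress survives the crash, but users do not have to resubmit anything by hand.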

Please let me know your thoughts.

Thanks,
Mayank 
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Attachment: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397881#comment-13397881 ] 

Tom White commented on MAPREDUCE-3837:
--------------------------------------

TestRecoveryManager and TestJobTrackerRestartWithLostTracker failed for me with this patch. Mayank - can you update them for this JIRA please?
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411620#comment-13411620 ] 

Tom White commented on MAPREDUCE-3837:
--------------------------------------

+1 to the fix. FWIW I didn't see this when testing on a single-node cluster (on Mac OS X).
                
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 1.2.0, 0.22.1
>
>         Attachments: MAPREDUCE-3837_addendum.patch, PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207716#comment-13207716 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #990 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/990/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243695)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243695
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228016#comment-13228016 ] 

Arun C Murthy commented on MAPREDUCE-3837:
------------------------------------------

Apologies for the late response; I missed this.

Thanks for the clarification, Mayank, Tucu & Konst. I agree it's much more palatable without all the complexities of trying to recover jobs from point-of-crash.

A couple of questions:
a) How does it work in a secure setting?
b) We should at least add some docs on this feature.

Makes sense?
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400901#comment-13400901 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Agreed. Working on it; will update soon.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>

        

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207227#comment-13207227 ] 

Mahadev konar commented on MAPREDUCE-3837:
------------------------------------------

@Mayank,
You should grant the license to Apache when uploading patches.
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Updated] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3837:
-------------------------------------

    Fix Version/s:     (was: 0.23.2)
                       (was: 0.24.0)
    
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 1.2.0, 0.22.1
>
>         Attachments: MAPREDUCE-3837_addendum.patch, PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Updated] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3837:
-------------------------------------

    Fix Version/s:     (was: 1.2.0)
                   1.1.0

I just merged this to branch-1.1 after Matt's go-ahead.
                
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 1.1.0, 0.22.1
>
>         Attachments: MAPREDUCE-3837_addendum.patch, PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217460#comment-13217460 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Attached the patch for Hadoop-1; please review it.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400808#comment-13400808 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Hi Tom,

I just took the latest 1.1 code base and ran the two test cases you mentioned above without my patch, and they still fail.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207221#comment-13207221 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1797 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1797/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243695)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243695
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Reopened] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reopened MAPREDUCE-3837:
--------------------------------------


Looks like this needs a minor update to get it to work on Mac OS X...
                
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.2.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Attachment: PATCH-HADOOP-1-MAPREDUCE-3837.patch
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400894#comment-13400894 ] 

Tom White commented on MAPREDUCE-3837:
--------------------------------------

Mayank - thanks for pointing that out. I just tried and they fail for me on the latest branch-1 code too. We do need tests for job tracker recovery though, so they should be fixed to ensure that the code in this patch is tested and doesn't regress, don't you think?
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400049#comment-13400049 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Hi Todd,

Arun gave a -1 because he was under the impression that I was trying to restore the state; when I explained that the patch does not restore state but resubmits the jobs, he was OK with it.

Arun told me that the patch is more or less the same as the one in HDP, apart from one bug fix he made.

I will update the patch based on Tom's comment.

Arun, can you also post the bug fix you made?

Thanks,
Mayank
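The distinction drawn above between restoring and resubmitting can be sketched in Java. The types `PersistedJob`, `RecoveredJob` and `RecoveryModes` and the task-state strings are hypothetical, not Hadoop classes, and reusing the logged task ids for the fresh run is a simplification to keep the sketch small. The point is only that restoring keeps whatever progress was logged before the crash, while resubmitting keeps the job's identity and runs every task again from the start.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical persisted record for a job that was running when the
// JobTracker crashed: its id plus the last logged state of each task.
final class PersistedJob {
  final String jobId;
  final Map<String, String> taskStates; // taskId -> last logged state

  PersistedJob(String jobId, Map<String, String> taskStates) {
    this.jobId = jobId;
    this.taskStates = new TreeMap<>(taskStates);
  }
}

// Hypothetical in-memory view of a job after the JobTracker restarts.
final class RecoveredJob {
  final String jobId;
  final Map<String, String> taskStates;

  RecoveredJob(String jobId, Map<String, String> taskStates) {
    this.jobId = jobId;
    this.taskStates = taskStates;
  }

  long completedTasks() {
    return taskStates.values().stream().filter("SUCCEEDED"::equals).count();
  }
}

final class RecoveryModes {
  // "Restore": rebuild state from the logged progress, so finished tasks
  // stay finished. Per the comment above, this is not what the patch does.
  static RecoveredJob restore(PersistedJob job) {
    return new RecoveredJob(job.jobId, new TreeMap<>(job.taskStates));
  }

  // "Resubmit": keep the job's identity and discard all logged progress,
  // so every task starts again from the beginning.
  static RecoveredJob resubmit(PersistedJob job) {
    Map<String, String> fresh = new TreeMap<>();
    for (String taskId : job.taskStates.keySet()) {
      fresh.put(taskId, "PENDING");
    }
    return new RecoveredJob(job.jobId, fresh);
  }
}
```

Because a resubmitted job carries nothing over from its earlier run, a per-job restart count has nothing left to track, which is consistent with the review comment in this thread suggesting it be removed.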

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207222#comment-13207222 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #546 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/546/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243698)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243698
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207259#comment-13207259 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #550 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/550/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243698)

     Result = ABORTED
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243698
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Reopened] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal reopened MAPREDUCE-3837:
--------------------------------------


For the Hadoop-1 patch.
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204944#comment-13204944 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

PATCH-MAPREDUCE-3837.patch

This one is for the 0.22 branch. Please review it. I will shortly post the same for trunk as well.

Thanks,
Mayank

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: PATCH-MAPREDUCE-3837.patch

[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Attachment: PATCH-MAPREDUCE-3837.patch
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: PATCH-MAPREDUCE-3837.patch

[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402503#comment-13402503 ] 

Tom White commented on MAPREDUCE-3837:
--------------------------------------

Mayank - thanks for the changes. Here's my feedback:

* If there is no need for restart count anymore - since jobs are re-run from the beginning each time - then would it be cleaner to remove it entirely?
* In JobTracker you changed "shouldRecover = false;" to "shouldRecover = true;" without updating the comment on the line before. (This might be related to the previous point about not having restart counts.)
* Remove the @Ignore annotation from TestRecoveryManager and the comment about MAPREDUCE-873.
* The new test testJobresubmission (should be testJobResubmission) should test that the job succeeded after the restart. Also, there's no reason to run it as a high-priority job.
* There's a comment saying it is a "faulty job" - which it isn't.
* Have setUp and tearDown methods to start and stop the cluster. At the moment there is code duplication, and clusters won't be shut down cleanly on failure.
* testJobTracker would be better named testJobTrackerRestartsWithMissingJobFile
* testRecoveryManager would be better named testJobTrackerRestartWithBadJobs
* There are multiple typos and formatting errors (including indentation, which should be 2 spaces) in the new code. See Konstantin's comment above.
* TestJobTrackerRestartWithLostTracker still fails, as does TestJobTrackerSafeMode. These should be fixed as a part of this work.
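The point about `setUp` and `tearDown` is that cluster shutdown must happen even when a test fails. JUnit already guarantees this: `tearDown` (JUnit 3) or an `@After` method (JUnit 4) runs after each test whether or not it passed. To stay self-contained, the Java sketch below reproduces that guarantee with `try`/`finally`; `FakeCluster` and `FixtureRunner` are hypothetical stand-ins, not the real mini-cluster test utilities.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for the mini cluster a recovery test would start.
final class FakeCluster {
  private boolean running = false;

  void start() {
    running = true;
  }

  void shutdown() {
    running = false;
  }

  boolean isRunning() {
    return running;
  }
}

final class FixtureRunner {
  // setUp, then the test body, then tearDown in a finally block, so the
  // cluster is shut down even when the body throws (e.g. a failed check).
  static FakeCluster runWithCluster(Consumer<FakeCluster> testBody) {
    FakeCluster cluster = new FakeCluster();
    cluster.start(); // setUp
    try {
      testBody.accept(cluster);
    } finally {
      cluster.shutdown(); // tearDown
    }
    return cluster;
  }
}
```

Moving cluster start-up and shutdown into shared fixture methods also removes the duplicated start/stop code the review mentions.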

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Attachment: PATCH-HADOOP-1-MAPREDUCE-3837-4.patch

Attaching latest patch after incorporating Tom's comments.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Updated] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Summary: Job tracker is not able to recover job in case of crash and after that no user can submit job.  (was: Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.)
    
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.2.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207262#comment-13207262 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1734 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1734/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243695)

     Result = ABORTED
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243695
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207429#comment-13207429 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Mapreduce-22-branch #100 (See [https://builds.apache.org/job/Hadoop-Mapreduce-22-branch/100/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243700)

     Result = SUCCESS
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243700
Files : 
* /hadoop/common/branches/branch-0.22/mapreduce/CHANGES.txt
* /hadoop/common/branches/branch-0.22/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Target Version/s: 0.24.0, 0.22.1
              Status: Patch Available  (was: Open)
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Konstantin Shvachko (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated MAPREDUCE-3837:
-------------------------------------------

          Resolution: Fixed
       Fix Version/s: 0.23.2
                      0.22.1
                      0.24.0
    Target Version/s: 0.24.0, 0.22.1  (was: 0.22.1, 0.24.0)
        Hadoop Flags: Reviewed
              Status: Resolved  (was: Patch Available)

I just committed this. Thank you Mayank.
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405336#comment-13405336 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

I have just completed the commit tests successfully.
I also ran all unit tests before attaching the patch, and those completed successfully as well.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 1.1.1, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221146#comment-13221146 ] 

Arun C Murthy commented on MAPREDUCE-3837:
------------------------------------------

-1 on committing to branch-1. We've had innumerable issues with this before; it's not a good idea for a stable branch.
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207265#comment-13207265 ] 

Hudson commented on MAPREDUCE-3837:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #195 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/195/])
    MAPREDUCE-3837. Job tracker is not able to recover jobs after crash. Contributed by Mayank Bansal. (Revision 1243698)

     Result = FAILURE
shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1243698
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java/org/apache/hadoop/mapred/JobTracker.java

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400918#comment-13400918 ] 

Arun C Murthy commented on MAPREDUCE-3837:
------------------------------------------

Mayank, as we briefly discussed, you'll need to fix the re-submit to read job tokens from HDFS and pass them along (i.e. as a Credentials object) to the submitJob API. Sorry, I've been traveling a lot and missed commenting here; my bad.

Other nits:

# You've removed the call to JobClient.isJobDirValid, which is dangerous. Since the contents have changed in hadoop-1 post security, please add a private isJobDirValid method to the JT and use it. This method should check for the jobInfo file on HDFS (JobTracker.JOB_INFO_FILE) and the jobTokens file (TokenCache.JOB_TOKEN_HDFS_FILE).
# Also, since we only care about jobIds now for JT recovery, it's better to add a Set<JobId> jobIdsToRecover rather than rely on Set<JobInfo> jobsToRecover. This way we can avoid all the unnecessary translations between o.a.h.mapred.JobId and o.a.h.mapreduce.JobId.
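The two nits above can be illustrated with a small sketch. It is a local-filesystem stand-in built on java.nio.file rather than Hadoop's FileSystem API, so it runs without a cluster. The class and method names are hypothetical, and the file names "job-info" and "jobToken" are assumed values for JobTracker.JOB_INFO_FILE and TokenCache.JOB_TOKEN_HDFS_FILE. A job directory counts as recoverable only if both files exist, and the scan collects plain job-id strings (standing in for Set<JobId>) instead of JobInfo objects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical local stand-in for the JobTracker recovery scan; the real code
// would list the system directory through org.apache.hadoop.fs.FileSystem.
final class RecoveryScan {
  // Assumed values of JobTracker.JOB_INFO_FILE and TokenCache.JOB_TOKEN_HDFS_FILE.
  static final String JOB_INFO_FILE = "job-info";
  static final String JOB_TOKEN_FILE = "jobToken";

  private RecoveryScan() {}

  // A job directory is recoverable only if both the job info file and the
  // job token file are present as regular files.
  static boolean isJobDirValid(Path jobDir) {
    return Files.isRegularFile(jobDir.resolve(JOB_INFO_FILE))
        && Files.isRegularFile(jobDir.resolve(JOB_TOKEN_FILE));
  }

  // Collects the names of valid job directories (job ids such as
  // job_201207110542_0001) into a sorted set of plain strings.
  static Set<String> jobIdsToRecover(Path systemDir) throws IOException {
    Set<String> ids = new TreeSet<>();
    try (Stream<Path> entries = Files.list(systemDir)) {
      entries.filter(Files::isDirectory)
          .filter(RecoveryScan::isJobDirValid)
          .forEach(dir -> ids.add(dir.getFileName().toString()));
    }
    return ids;
  }
}
```

Keeping only the ids means the recovery loop never has to convert between the old and new JobId classes; whatever object is needed for resubmission can be built from the id at that point.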
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228571#comment-13228571 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

Thanks Arun for your reply.

a) It reads the user id from the job token stored in the system directory and submits the job as that user, so the actual job runs as that user.
b) Yes, you are right. I will add the documentation and append it to the patch.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Updated] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3837:
-------------------------------------

    Attachment: MAPREDUCE-3837_addendum.patch

I see this on a single-node cluster.

Without this patch, tasks which are re-run fail with:

{noformat}

2012-07-11 05:43:18,299 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201207110542_0001_m_000000_0: java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Creation of /tmp/hadoop-acmurthy/mapred/local/userlogs/job_201207110542_0001/attempt_201207110542_0001_m_000000_0 failed.
	at org.apache.hadoop.mapred.TaskLog.createTaskAttemptLogDir(TaskLog.java:104)
	at org.apache.hadoop.mapred.DefaultTaskController.createLogDir(DefaultTaskController.java:71)
	at org.apache.hadoop.mapred.TaskRunner.prepareLogFiles(TaskRunner.java:316)
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:228)
{noformat}



The problem is that mkdirs (at least on Mac OS X) returns false if the directory already exists and wasn't created during the call.

A straightforward patch that checks for existence fixes it.
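The existence check described above can be sketched as follows. Per the java.io.File documentation, mkdirs() returns true only if it actually created the directory, so a false return has to be paired with an isDirectory() check before it is treated as an error. The LogDirs class and ensureDir method are hypothetical names for illustration; this is not the contents of MAPREDUCE-3837_addendum.patch.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper illustrating an existence-tolerant directory creation.
final class LogDirs {
  private LogDirs() {}

  // Creates dir (and any missing parents). A false return from mkdirs() is
  // only an error if the directory still does not exist afterwards.
  static void ensureDir(File dir) throws IOException {
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Creation of " + dir + " failed.");
    }
  }
}
```

Checking isDirectory() after a failed mkdirs() also covers the race where another task creates the same directory between the two calls, while still reporting an error when the path exists as a regular file.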
                
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 1.2.0, 0.22.1
>
>         Attachments: MAPREDUCE-3837_addendum.patch, PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411624#comment-13411624 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

I did not see this either when testing on my single-node cluster on Mac OS X; however, the fix looks good to me.

+1 Thanks Arun.

Thanks,
Mayank
                
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 1.2.0, 0.22.1
>
>         Attachments: MAPREDUCE-3837_addendum.patch, PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398164#comment-13398164 ] 

Mayank Bansal commented on MAPREDUCE-3837:
------------------------------------------

When I posted this patch, it did not have this issue. Let me update the patch.
Thanks for finding this out.

Thanks,
Mayank
                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and the job tracker's property mapreduce.jobtracker.restart.recover is true, then it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, the job tracker closes its handle to HDFS and nobody else can submit jobs.
> Thanks,
> Mayank


[jira] [Commented] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Alejandro Abdelnur (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221271#comment-13221271 ] 

Alejandro Abdelnur commented on MAPREDUCE-3837:
-----------------------------------------------

I've tested the latest patch and it works as expected. I agree with Mayank that this approach (rerun the full job) seems much less risky than the previous approach (rerun from where it left off). So I'm fine with the patch, as it is much better than what is currently in.

Arun, would you reconsider based on the explanation of what Mayank's patch does?

                
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 0.24.0, 0.22.1, 0.23.2
>
>         Attachments: PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and its mapreduce.jobtracker.restart.recover property is set to true, it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, it closes its handle to HDFS and nobody can submit jobs.
> Thanks,
> Mayank


[jira] [Resolved] (MAPREDUCE-3837) Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-3837.
--------------------------------------

    Resolution: Fixed

Thanks for the reviews, Tom & Mayank. I've just committed the small patch.
                
> Job tracker is not able to recover job in case of crash and after that no user can submit job.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.22.0, 1.1.1
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>             Fix For: 1.2.0, 0.22.1
>
>         Attachments: MAPREDUCE-3837_addendum.patch, PATCH-HADOOP-1-MAPREDUCE-3837-1.patch, PATCH-HADOOP-1-MAPREDUCE-3837-2.patch, PATCH-HADOOP-1-MAPREDUCE-3837-3.patch, PATCH-HADOOP-1-MAPREDUCE-3837-4.patch, PATCH-HADOOP-1-MAPREDUCE-3837.patch, PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and its mapreduce.jobtracker.restart.recover property is set to true, it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, it closes its handle to HDFS and nobody can submit jobs.
> Thanks,
> Mayank


[jira] [Updated] (MAPREDUCE-3837) Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.

Posted by "Mayank Bansal (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-3837:
-------------------------------------

    Attachment: PATCH-TRUNK-MAPREDUCE-3837.patch
    
> Hadoop 22 Job tracker is not able to recover job in case of crash and after that no user can submit job.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3837
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3837
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Mayank Bansal
>            Assignee: Mayank Bansal
>         Attachments: PATCH-MAPREDUCE-3837.patch, PATCH-TRUNK-MAPREDUCE-3837.patch
>
>
> If the job tracker crashes while jobs are running and its mapreduce.jobtracker.restart.recover property is set to true, it should recover those jobs.
> However, the current behavior is as follows:
> the job tracker tries to restore the jobs but cannot. After that, it closes its handle to HDFS and nobody can submit jobs.
> Thanks,
> Mayank
