Posted to mapreduce-issues@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2012/06/08 09:20:23 UTC

[jira] [Created] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Arun C Murthy created MAPREDUCE-4328:
----------------------------------------

             Summary: Add the option to quiesce the JobTracker
                 Key: MAPREDUCE-4328
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv1
    Affects Versions: 1.0.3
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy


In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.

Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Target Version/s:   (was: 1.1.0)
    
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Fix Version/s:     (was: 1.2.0)
                   1.1.0

I merged this to branch-1.1 after talking to Matt.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.1.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439804#comment-13439804 ] 

Arun C Murthy commented on MAPREDUCE-4328:
------------------------------------------

Ah, I thought I responded to ATM, my bad.

As I've described in the description of the jira, the primary use-case is to allow the JobTracker to be resilient to NN failures (hardware or software).

I did think long and hard about doing this in YARN, but with HDFS-HA this use-case is pretty much non-existent. Furthermore, YARN isn't tied to HDFS as MR1 is, and since it's distributed across several AMs there is no single point of control like the JT in MR1. Thus, I think there isn't enough value in porting it as-is, conceptually (not code-wise).

In many ways this is similar to MAPREDUCE-3837, i.e. no straight-backport.

Having said that, I plan to make sure we pay attention to this when we get around to fixing RM Restart. This is something I definitely plan to do later this year, at which point we'll ensure there is no 'feature regression'.

Makes sense?

----

Eli's point about draining queues is a good one; I've opened MAPREDUCE-4575 and YARN-38 to track that. That feature is something we can map straight across MR1 and YARN conceptually.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Kang Xiao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395841#comment-13395841 ] 

Kang Xiao commented on MAPREDUCE-4328:
--------------------------------------

This is useful in some conditions, such as when the NN is down. We actually found a way to achieve the first goal by updating the fair scheduler's configuration to set each pool's max share to zero.
The second goal will protect the job from going to FAILED. But it seems so possible for a job to go to FAILED since no more tasks are scheduled.

It may be simpler to just not invoke assignTasks() in the JobTracker to implement the first goal. That would not burden the scheduler implementation, since 'safemode' is a low-probability event.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4328.patch
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291594#comment-13291594 ] 

Arun C Murthy commented on MAPREDUCE-4328:
------------------------------------------

I'm thinking in the quiesced mode the JT:
# Doesn't schedule any more tasks.
# Doesn't mark *any* task as FAILED (every task is KILLED).
# Doesn't accept new job submissions.
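
As a rough illustration of the three rules above, here is a minimal sketch. It is not taken from the attached patch: `QuiesceGate`, `AttemptOutcome` and all method names are hypothetical, and a real JobTracker would apply these checks inside its heartbeat and submission paths. The usual motivation for rule 2, as I understand MR1, is that KILLED attempts do not count against a task's allowed failures, so a job is not pushed to FAILED while the cluster is quiesced.

```java
import java.io.IOException;

/** Outcome recorded for a task attempt that stops running. */
enum AttemptOutcome { FAILED, KILLED }

/** Hypothetical gate for the three quiesce rules; not the real JobTracker API. */
class QuiesceGate {
    private volatile boolean quiesced;

    void setQuiesced(boolean on) { quiesced = on; }

    boolean isQuiesced() { return quiesced; }

    /** Rule 1: hand out no new tasks while quiesced. */
    int tasksToAssign(int freeSlots) {
        return quiesced ? 0 : freeSlots;
    }

    /** Rule 2: while quiesced, record every ended attempt as KILLED, never FAILED. */
    AttemptOutcome classify(AttemptOutcome reported) {
        return quiesced ? AttemptOutcome.KILLED : reported;
    }

    /** Rule 3: reject new job submissions while quiesced. */
    void checkSubmissionAllowed(String jobName) throws IOException {
        if (quiesced) {
            throw new IOException("Quiesced; rejecting submission of " + jobName);
        }
    }
}
```

With the gate switched on, `tasksToAssign(4)` returns 0, `classify(AttemptOutcome.FAILED)` returns `KILLED`, and `checkSubmissionAllowed(...)` throws, so the submitting client sees an error rather than a silently queued job.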
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439683#comment-13439683 ] 

Hadoop QA commented on MAPREDUCE-4328:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541997/MAPREDUCE-4328.patch
  against trunk revision .

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2757//console

This message is automatically generated.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439808#comment-13439808 ] 

Eli Collins commented on MAPREDUCE-4328:
----------------------------------------

Thanks Arun, and thanks for working on this.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403261#comment-13403261 ] 

Tom White commented on MAPREDUCE-4328:
--------------------------------------

> 3. Doesn't accept new job submissions.

To be clear - the client would get a failure, right? The current patch doesn't do that yet as far as I can see.

A few other pieces of feedback on the patch:

* The -refreshNodes option in MRAdmin was deleted from the usage message.
* Rather than putting markup in the JobTracker (in getSafeModeText()), do the formatting in the JSP or a utility class like JSPUtil (which already exists).
* Change JobTracker's getSafeMode() method to isInSafeMode(), to mirror NameNode.
* MRAdmin introduced a couple of unneeded imports: DistributedFileSystem, org.mortbay.log.Log
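
To make the second and third bullets concrete, here is a small sketch of the split being suggested: the tracker exposes only a boolean named `isInSafeMode()`, and a separate helper owns the markup. Both classes are hypothetical stand-ins written for this comment; they are not the real JobTracker or JSPUtil, and the HTML shown is only a placeholder.

```java
/** Hypothetical stand-in for the tracker: exposes state only, no markup. */
class TrackerState {
    private final boolean inSafeMode;

    TrackerState(boolean inSafeMode) { this.inSafeMode = inSafeMode; }

    /** Named to mirror the NameNode-style accessor suggested above. */
    boolean isInSafeMode() { return inSafeMode; }
}

/** Hypothetical view helper, in the spirit of a JSP utility class: owns all HTML. */
final class SafeModeView {
    private SafeModeView() {}

    /** Turns the boolean state into display markup for the web UI. */
    static String render(TrackerState state) {
        return state.isInSafeMode() ? "<em>ON</em>" : "OFF";
    }
}
```

Keeping the formatting out of the tracker means the same `isInSafeMode()` value can back the web UI, an admin command and tests without any of them having to parse HTML.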

                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4328.patch
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this after fixing the copy-paste error in the exception message. Thanks for the review Vinod!
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442291#comment-13442291 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4328:
----------------------------------------------------

Had a brief review of the patch. +1 for commit, but only with a minor fix:
bq. +      throw new AccessControlException(user + 
bq. +                                       " is not authorized to refresh nodes.");
Should be get/set safemode.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Attachment: MAPREDUCE-4328.patch

Here is a preliminary patch - I figured it's simpler to call it 'safemode' for JT, à la NN.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4328.patch
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Attachment: MAPREDUCE-4328.patch

Uh! Wrong patch, fixed now. :)
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439675#comment-13439675 ] 

Arun C Murthy commented on MAPREDUCE-4328:
------------------------------------------

All tests pass, ready to go.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439751#comment-13439751 ] 

Eli Collins commented on MAPREDUCE-4328:
----------------------------------------

Hey Arun,

Per ATM's comment above, shouldn't we do the analogous feature for trunk first? It seems like this would be YARN safemode, since the AM really isn't equivalent to the JT for this feature.

Also, please motivate this feature by outlining the primary use cases.  I don't think you need to write a design doc but a basic paragraph or two would be good. From my experience admins would like to quiesce the JT so they can prevent new jobs from being launched while draining the queue of current jobs to facilitate a cluster upgrade. 

Thanks,
Eli 
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Attachment: MAPREDUCE-4328.patch

Forgot to grant license, fixed.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Attachment: MAPREDUCE-4328.patch

I finally got around to wrapping this up.

The differences between the original and the final patch are an optional thread that monitors the NN and automatically puts the JT in safemode, plus bug fixes and tests.
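
For readers who want a picture of what such a monitor thread could look like, here is a minimal sketch. It is not the code in the patch: `NameNodeMonitor`, the `BooleanSupplier` health probe and the polling interval are all assumptions made for illustration, and it uses Java 8 types for brevity. It also assumes the monitor clears safemode again once the probe succeeds, which may or may not match the committed behavior.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical monitor: polls a health probe and toggles a safemode flag. */
class NameNodeMonitor implements Runnable {
    private final BooleanSupplier nameNodeHealthy; // assumed probe, e.g. a cheap filesystem call
    private final AtomicBoolean safeMode;          // flag the tracker would consult
    private final long intervalMillis;

    NameNodeMonitor(BooleanSupplier nameNodeHealthy, AtomicBoolean safeMode, long intervalMillis) {
        this.nameNodeHealthy = nameNodeHealthy;
        this.safeMode = safeMode;
        this.intervalMillis = intervalMillis;
    }

    /** One polling step: safemode is on exactly when the probe reports unhealthy. */
    void pollOnce() {
        safeMode.set(!nameNodeHealthy.getAsBoolean());
    }

    /** Polls until the thread is interrupted. */
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            pollOnce();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag so the loop exits
            }
        }
    }
}
```

It would be started as a daemon thread, for example `Thread t = new Thread(monitor); t.setDaemon(true); t.start();`. One design question this sketch glosses over is how an automatic toggle should interact with safemode set manually by an admin; as written, the next poll would override a manual setting.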
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-4328:
---------------------------------

    Attachment: TestJobTrackerQuiescence.java

I wrote a unit test for this (attached), which might be useful.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Bikas Saha (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291995#comment-13291995 ] 

Bikas Saha commented on MAPREDUCE-4328:
---------------------------------------

But how would you programmatically know that the NameNode is not operational?
Wouldn't it help to get that information directly via an API? Do you know if one exists?
Let me open a jira to add one if it does not.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4328.patch
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439865#comment-13439865 ] 

Aaron T. Myers commented on MAPREDUCE-4328:
-------------------------------------------

Thanks a lot for the explanation, Arun. Makes sense.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295192#comment-13295192 ] 

Aaron T. Myers commented on MAPREDUCE-4328:
-------------------------------------------

Seems like we should also implement an analogous feature in trunk/2.0, so as not to have a feature regression from branch-1.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-4328.patch
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Fix Version/s: 1.2.0
           Status: Patch Available  (was: Open)
    
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Updated] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4328:
-------------------------------------

    Attachment:     (was: MAPREDUCE-4328.patch)
    
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.



[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465148#comment-13465148 ] 

Matt Foley commented on MAPREDUCE-4328:
---------------------------------------

Accepted.
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.1.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.


[jira] [Commented] (MAPREDUCE-4328) Add the option to quiesce the JobTracker

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462876#comment-13462876 ] 

Arun C Murthy commented on MAPREDUCE-4328:
------------------------------------------

Matt - if you don't mind, I'd like to merge this into branch-1.1 since it has been well baked in. Thoughts?
                
> Add the option to quiesce the JobTracker
> ----------------------------------------
>
>                 Key: MAPREDUCE-4328
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4328
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.0.3
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4328.patch, MAPREDUCE-4328.patch, TestJobTrackerQuiescence.java
>
>
> In several failure scenarios it would be very handy to have an option to quiesce the JobTracker.
> Recently, we saw a case where the NameNode had to be rebooted at a customer due to a random hardware failure - in such a case it would have been nice to not lose jobs by quiescing the JobTracker.
