Posted to common-issues@hadoop.apache.org by "Eli Collins (Created) (JIRA)" <ji...@apache.org> on 2012/03/26 00:18:28 UTC

[jira] [Created] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Add option to enable DN and TT rolling upgrades in branch-1
-----------------------------------------------------------

                 Key: HADOOP-8209
                 URL: https://issues.apache.org/jira/browse/HADOOP-8209
             Project: Hadoop Common
          Issue Type: Improvement
    Affects Versions: 1.0.0
            Reporter: Eli Collins
            Assignee: Eli Collins


In 1.x DNs currently refuse to connect to NNs if their build *revision* (ie svn revision) does not match. TTs refuse to connect to JTs if their build *version* (version, revision, user, and source checksum) does not match.

This prevents rolling upgrades, which is intentional; see the discussion in HADOOP-5203. The primary motivation in that jira was (1) it's difficult to guarantee that every build on a large cluster got deployed correctly, that builds don't get rolled back to old versions by accident, etc, and (2) mixed versions can lead to execution problems that are hard to debug.

However there are also cases when users know the two builds are compatible, eg when deploying a new build which contains the same contents as the previous one, plus a critical security patch that does not affect compatibility. Currently deploying a 1 line patch requires taking down the entire cluster (or trying to work around the issue by lying about the build revision or checksum, yuck). These users would like to be able to perform a rolling upgrade.

In order to support this, let's add an option that is off by default, but, when enabled, makes the DN and TT version check just check for an exact version match (eg "1.0.2") but ignore the build revision (DN) and the source checksum (TT). Two builds still need to match the major, minor, and point numbers, but nothing else.
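The proposed behavior can be sketched roughly as follows. This is only an illustration of the comparison described above, not the code from the patch: the class VersionCheckSketch, the BuildInfo holder, and both method names are hypothetical, and the real daemons obtain these values through Hadoop's own version plumbing.

```java
/** Hypothetical sketch of the proposed check; not the code from the actual patch. */
final class VersionCheckSketch {

    /** Illustrative stand-in for the build details a daemon reports. */
    static final class BuildInfo {
        final String version;     // release number, e.g. "1.0.2"
        final String revision;    // svn revision of the build
        final String user;        // user who produced the build
        final String srcChecksum; // checksum of the source tree

        BuildInfo(String version, String revision, String user, String srcChecksum) {
            this.version = version;
            this.revision = revision;
            this.user = user;
            this.srcChecksum = srcChecksum;
        }
    }

    /** DN -> NN: the default compares revisions; relaxed compares only the version number. */
    static boolean dataNodeMayConnect(BuildInfo nn, BuildInfo dn, boolean relaxed) {
        return relaxed ? nn.version.equals(dn.version) : nn.revision.equals(dn.revision);
    }

    /** TT -> JT: the default compares the full build version; relaxed compares only the version number. */
    static boolean taskTrackerMayConnect(BuildInfo jt, BuildInfo tt, boolean relaxed) {
        if (relaxed) {
            return jt.version.equals(tt.version);
        }
        return jt.version.equals(tt.version)
            && jt.revision.equals(tt.revision)
            && jt.user.equals(tt.user)
            && jt.srcChecksum.equals(tt.srcChecksum);
    }

    public static void main(String[] args) {
        // Same release number, different revision and checksum (e.g. a 1 line patch).
        BuildInfo master = new BuildInfo("1.0.2", "r1", "build", "aaaa");
        BuildInfo patched = new BuildInfo("1.0.2", "r2", "build", "bbbb");
        System.out.println("DN default: " + dataNodeMayConnect(master, patched, false));
        System.out.println("DN relaxed: " + dataNodeMayConnect(master, patched, true));
        System.out.println("TT default: " + taskTrackerMayConnect(master, patched, false));
        System.out.println("TT relaxed: " + taskTrackerMayConnect(master, patched, true));
    }
}
```

Note that in this sketch a worker with a different release number (say "1.0.3") is still rejected in relaxed mode, which matches the "exact version match" requirement above.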

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8209:
--------------------------------

    Attachment: hadoop-8209.txt

Thanks for the review ATM. Updated patch, and re-tested.

#1 Yea, was following the existing method but better to use VersionInfo directly, done.
#2-4 Done
#5 Because JT#getVersion exists (for MXBean#getVersion), in the updated patch I've addressed this via new NN/JT methods getBuildVersion which return the version. I renamed VersionInfo#getBuildVersion to getFullVersion to clear up the distinction between the build's version and what was called the "build version".
#6 Fixed, took the same assert from testRelaxedVersionCheck and changed the polarity

                

        

[jira] [Commented] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252085#comment-13252085 ] 

Eli Collins commented on HADOOP-8209:
-------------------------------------

Thanks ATM. Will fix the spelling mistake; running the full suite now.
                

        

[jira] [Updated] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8209:
--------------------------------

    Attachment: hadoop-8209.txt

Patch attached. Adds new config option hadoop.relaxed.worker.version.check to relax the version check to just the version number. Aside from the new DN and TT tests that cover the current/default behavior and the new behavior, I tested on a cluster and verified that (1) DNs/TTs with different revisions cannot join by default, and (2) using the new flag they can (and the new log message for this case is appropriate).
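As a rough illustration of how such a flag could gate the check, here is a minimal sketch. Only the key name hadoop.relaxed.worker.version.check and its off-by-default behavior come from this issue. The class RelaxedCheckFlagSketch, the Decision enum, and the use of java.util.Properties in place of Hadoop's configuration class are assumptions made to keep the example self-contained, and the default path is simplified to the DN-style revision comparison.

```java
import java.util.Properties;

/** Hypothetical sketch of flag-gated version checking; not the code from the actual patch. */
final class RelaxedCheckFlagSketch {

    /** Config key named in this issue; the option is off unless explicitly enabled. */
    static final String RELAXED_KEY = "hadoop.relaxed.worker.version.check";

    /** ACCEPT_RELAXED marks the case where a mismatched build is let in and worth logging. */
    enum Decision { ACCEPT, ACCEPT_RELAXED, REJECT }

    /** Reads the flag from a Properties object standing in for the real configuration. */
    static boolean isRelaxed(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(RELAXED_KEY, "false"));
    }

    /** Decides whether a worker (DN or TT) may join the master (NN or JT). */
    static Decision decide(Properties conf,
                           String masterVersion, String masterRevision,
                           String workerVersion, String workerRevision) {
        if (masterRevision.equals(workerRevision)) {
            return Decision.ACCEPT;          // identical builds always connect
        }
        if (isRelaxed(conf) && masterVersion.equals(workerVersion)) {
            return Decision.ACCEPT_RELAXED;  // same release number, different revision
        }
        return Decision.REJECT;              // default behavior for mismatched builds
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default: " + decide(conf, "1.0.2", "r1", "1.0.2", "r2"));
        conf.setProperty(RELAXED_KEY, "true");
        System.out.println("relaxed: " + decide(conf, "1.0.2", "r1", "1.0.2", "r2"));
    }
}
```

Returning a distinct value for the relaxed-accept case is one way to give the caller a hook for the log message mentioned above; the actual wording of that message is not shown in this thread.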
                

        

[jira] [Updated] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8209:
--------------------------------

    Attachment: hadoop-8209.txt

Patch attached. Minor change from the last one.
                

        

[jira] [Commented] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Sanjay Radia (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252125#comment-13252125 ] 

Sanjay Radia commented on HADOOP-8209:
--------------------------------------

* I thought we were focusing on rolling upgrades for Hadoop 2 not Hadoop 1 given that wire compatibility is only in Hadoop 2. 
* The jira title should be "Add option to relax build-version check for branch-1"
                

        

[jira] [Commented] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251311#comment-13251311 ] 

Eli Collins commented on HADOOP-8209:
-------------------------------------

Here are the test-patch results. This doesn't introduce new findbugs warnings; a null patch has 8 as well (HADOOP-7847).

{noformat}
     [exec] 
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     -1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.
     [exec] 
{noformat}
                

        

[jira] [Commented] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252054#comment-13252054 ] 

Aaron T. Myers commented on HADOOP-8209:
----------------------------------------

I reviewed the delta, and it largely looks good. One tiny nit: looks like you variously spelled the word "disallow" either as "dissallow" or "dissalow".

Patch looks good otherwise - +1. Please do also run the branch-1 test suite on the latest patch before committing.
                

        

[jira] [Updated] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Eli Collins (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-8209:
--------------------------------

    Summary: Add option to relax build-version check for branch-1  (was: Add option to enable DN and TT rolling upgrades in branch-1)

Sanjay,
See [this discussion|https://issues.apache.org/jira/browse/HDFS-2983?focusedCommentId=13232984&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13232984] on HDFS-2983. For v1 this only enables rolling upgrade when there's an *exact version match* (eg v1.0.2). This is still very useful though, as it allows people to perform rolling upgrades for a security patch or an EBF that doesn't affect compatibility (most don't).
Updated the jira title.

                

        

[jira] [Commented] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252734#comment-13252734 ] 

Eli Collins commented on HADOOP-8209:
-------------------------------------

Forgot to mention, leaving VersionInfo as is means we don't need to do anything for trunk.
                

        

[jira] [Commented] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252230#comment-13252230 ] 

Eli Collins commented on HADOOP-8209:
-------------------------------------

Ran all the tests. There are 5 MR failures on branch-1, confirmed they all fail on a clean tree and filed MAPREDUCE-4142 for them.
                

        

[jira] [Commented] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Andrew Purtell (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238562#comment-13238562 ] 

Andrew Purtell commented on HADOOP-8209:
----------------------------------------

+1 

We use this practice.

This makes it possible to do a rolling restart of DataNodes without taking down service by bouncing the NameNode. This is most useful when the change scope is restricted to the DN. If HA is backported to branch-1 we could handle most NN changes similarly: Upgrade the NNs one at a time with manual failover for no downtime. One issue remaining is that modification of a NN<->DN interface method requires a kludgy migration over three updates.

It is also possible to do this with TaskTrackers, but this will fail currently running tasks on the TT. Even so we can still stage in a TT bugfix release, just more slowly. Bouncing the JobTracker remains a big deal, but the maintenance window for that becomes very short if everything else has been rolled out ahead of time. With some "HA JT" option for branch-1 (Corona?) this might also have no downtime.
                

        

[jira] [Commented] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Tom White (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252808#comment-13252808 ] 

Tom White commented on HADOOP-8209:
-----------------------------------

Agree that not changing VersionInfo is better. +1 from me if Jenkins comes back OK.
                

[jira] [Commented] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Tom White (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252691#comment-13252691 ] 

Tom White commented on HADOOP-8209:
-----------------------------------

> I renamed VersionInfo#getBuildVersion to getFullVersion to clear up the distinction between the build's version and what was called the "build version".

Rather than renaming, maybe add getFullVersion() and deprecate getBuildVersion()? Also, is this change needed on trunk as well?
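
For illustration only, the add-and-deprecate alternative suggested here might look roughly like the sketch below. The class is a simplified, hypothetical stand-in rather than the real VersionInfo: the hard-coded values are made up, and the combined string is abbreviated to version and revision.

```java
// Hypothetical sketch: a simplified stand-in, not the real VersionInfo class.
final class VersionInfoDeprecationSketch {

  // Made-up values standing in for what the build would record.
  static String getVersion() { return "1.0.2"; }
  static String getRevision() { return "r12345"; }

  /** New, clearer name for the combined version string. */
  static String getFullVersion() {
    return getVersion() + " from " + getRevision();
  }

  /**
   * Old name, kept so existing callers continue to compile.
   *
   * @deprecated use {@link #getFullVersion()} instead.
   */
  @Deprecated
  static String getBuildVersion() {
    return getFullVersion();
  }
}
```

Existing callers of getBuildVersion() keep working and only see a deprecation warning, which is the compatibility benefit over a plain rename.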
                

[jira] [Resolved] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Eli Collins (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins resolved HADOOP-8209.
---------------------------------

          Resolution: Fixed
       Fix Version/s: 1.1.0
    Target Version/s:   (was: 1.1.0)
        Hadoop Flags: Reviewed

Thanks for the reviews, ATM and Tom. I've committed this to branch-1.
                

[jira] [Commented] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252727#comment-13252727 ] 

Eli Collins commented on HADOOP-8209:
-------------------------------------

Thanks for the feedback, Tom. I looked at this again and think it's better to leave VersionInfo as is (not rename getBuildVersion to getFullVersion) and just make InterTrackerProtocol match: InterTrackerProtocol#getBuildVersion should return VersionInfo#getBuildVersion, and a new InterTrackerProtocol#getVIVersion should return VersionInfo#getVersion (the name getVersion is already taken by an MXBean method). Sound good?
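
A minimal sketch of the proposal above, for illustration only. The types here are hypothetical stand-ins, not the real Hadoop classes: the hard-coded values are made up, and the layout of the build-version string is an assumption based on the fields listed in the issue description (version, revision, user, source checksum).

```java
// Hypothetical sketch: simplified stand-ins, not the real Hadoop classes.
final class VersionInfoStandIn {
  // Made-up values standing in for what the build would record.
  static String getVersion() { return "1.0.2"; }
  static String getRevision() { return "r12345"; }
  static String getUser() { return "builder"; }
  static String getSrcChecksum() { return "abc123"; }

  /** Combined string; the exact layout is an assumption for this sketch. */
  static String getBuildVersion() {
    return getVersion() + " from " + getRevision() + " by " + getUser()
        + " source checksum " + getSrcChecksum();
  }
}

/** Stand-in for the two version methods of the tracker protocol. */
interface InterTrackerProtocolSketch {
  /** Full build version, used by the default (strict) check. */
  String getBuildVersion();

  /** Plain version such as "1.0.2", used by the relaxed check. */
  String getVIVersion();
}

/** Stand-in for the JobTracker side, delegating to the version info. */
final class JobTrackerSketch implements InterTrackerProtocolSketch {
  @Override
  public String getBuildVersion() {
    return VersionInfoStandIn.getBuildVersion();
  }

  @Override
  public String getVIVersion() {
    return VersionInfoStandIn.getVersion();
  }
}
```

With this shape a TaskTracker can ask for either string and pick the comparison that its configuration calls for.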
                

[jira] [Commented] (HADOOP-8209) Add option to enable DN and TT rolling upgrades in branch-1

Posted by "Aaron T. Myers (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251843#comment-13251843 ] 

Aaron T. Myers commented on HADOOP-8209:
----------------------------------------

Patch looks pretty good to me, Eli. Just a few small comments:

1. Not obvious to me why we have these static version methods in the Storage class, which themselves just delegate to static methods of the VersionInfo class.
2. Recommend adding additional detail to the AssertionErrors, including the revisions and versions that didn't match.
3. Recommend adding an explanation to the DN log message about why the communication is being allowed, e.g.: "... because versions match exactly ('" + version + "') and hadoop.relaxed.worker.version.check is enabled." Ditto for TT.
4. Similarly the log message explaining why communication isn't being allowed might mention whether the check failed because of strict revision checking, or relaxed version checking.
5. Why call the new method "getInfoVersion" in JobTracker? getVersion, as was done in Storage, seems to make more sense to me.
6. In TestTaskTrackerVersionCheck#testDefaultVersionCheck, I don't think you actually test that different revisions are still disallowed by default, since you change both the revision and version simultaneously in the test.
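
To make the logging suggestions concrete, here is a rough sketch of a check that reports why a connection was allowed or refused. It is not the code from the attached patch: the class, method, and parameter names are hypothetical, and only the config key hadoop.relaxed.worker.version.check comes from the comments above. It models the DN case, where the strict check compares revisions. For the TT the strict check would compare the full build version instead.

```java
// Hypothetical sketch: the class, method, and field names below are
// illustrative and are not taken from the attached patch.
final class WorkerVersionCheckSketch {

  /** Config key as quoted in the review comment above. */
  static final String RELAXED_CHECK_KEY = "hadoop.relaxed.worker.version.check";

  /** Outcome of the check plus a reason suitable for a log message. */
  static final class Result {
    final boolean allowed;
    final String reason;

    Result(boolean allowed, String reason) {
      this.allowed = allowed;
      this.reason = reason;
    }
  }

  static Result check(boolean relaxed,
                      String masterVersion, String masterRevision,
                      String workerVersion, String workerRevision) {
    if (!relaxed) {
      // Default (strict) mode: the build revisions must be identical.
      if (masterRevision.equals(workerRevision)) {
        return new Result(true, "revisions match ('" + workerRevision + "')");
      }
      return new Result(false, "strict revision check failed: master revision '"
          + masterRevision + "' != worker revision '" + workerRevision + "'");
    }
    // Relaxed mode: only the version string (e.g. "1.0.2") must match.
    if (masterVersion.equals(workerVersion)) {
      return new Result(true, "versions match exactly ('" + workerVersion
          + "') and " + RELAXED_CHECK_KEY + " is enabled");
    }
    return new Result(false, "relaxed version check failed: master version '"
        + masterVersion + "' != worker version '" + workerVersion + "'");
  }
}
```

The last comment's case corresponds to calling check(false, "1.0.2", "r100", "1.0.2", "r101"): the version is held fixed and only the revision differs, so a refusal can only come from the strict revision check.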
                

[jira] [Commented] (HADOOP-8209) Add option to relax build-version check for branch-1

Posted by "Eli Collins (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252850#comment-13252850 ] 

Eli Collins commented on HADOOP-8209:
-------------------------------------

Thanks Tom, re-ran test-patch and the tests.
                