Posted to yarn-dev@hadoop.apache.org by Arun Suresh <as...@apache.org> on 2017/11/03 22:50:18 UTC

[VOTE] Release Apache Hadoop 2.9.0 (RC0)

Hi folks,

     Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9 line
and will be the latest stable/production release for Apache Hadoop. It
includes 30 new features with 500+ subtasks, 407 improvements, and 787 bug
fixes, all newly fixed since 2.8.2.

      More information about the 2.9.0 release plan can be found here:
https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9

      New RC is available at:
http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/

      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
6697f0c18b12f1bdb99cbdf81394091f4fef1f0a

      The maven artifacts are available via repository.apache.org at:
https://repository.apache.org/content/repositories/orgapachehadoop-1065/

      Please try the release and vote; the vote will run for the usual 5
days, ending on 11/10/2017 at 4pm PST.

Thanks,

Arun/Subru
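
For anyone trying the RC, one common first step is checking a downloaded
artifact against a published digest. What follows is only a minimal Python
sketch of that check, assuming you already have the expected SHA-256 hex
digest at hand; the helper names and the file name in the usage note are
hypothetical and not part of the release tooling.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the computed digest with the expected one.

    Whitespace and letter case in `expected_hex` are ignored, since
    published digests are sometimes grouped or upper-cased.
    """
    normalized = "".join(expected_hex.split()).lower()
    return sha256_of(path) == normalized
```

Usage would look like matches("hadoop-2.9.0.tar.gz", expected), where both
the file name and the expected value are placeholders for whatever you
downloaded. This covers only the digest; checking the GPG signature is a
separate step.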

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Subru Krishnan <su...@apache.org>.
Thanks Vinod for your feedback, we'll incorporate it when we spin RC1.

-Subru/Arun

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli <vi...@apache.org>
wrote:

> A related point - I thought I mentioned this in one of the release
> preparation threads, but in any case.
>
> Starting with 2.7.0, for every .0 release, we've been adding a disclaimer
> (to the voting thread as well as the final release) that the first release
> can potentially go through additional fixes to incompatible changes
> (besides stabilization fixes). We should do this with 2.9.0 too.
>
> This has some history - long before this, we tried two different things:
> (a) downstream projects consume an RC (b) downstream projects consume a
> release. Option (a) was tried many times, but it was getting increasingly
> hard to manage this across all the projects that depend on Hadoop. When we
> tried option (b), we used to make .0 a GA release, but downstream
> projects like Tez, Hive, Spark would come back and find an incompatible
> change - and now we were forced into a conundrum - is fixing this
> incompatible change itself an incompatibility? So to avoid this problem,
> we've started marking the first few releases as alpha, eventually making a
> stable point release. Clearly, specific users can still use this in
> production as long as we, the Hadoop community, reserve the right to fix
> incompatibilities.
>
> Long story short, I'd just add to your voting thread and release notes
> that 2.9.0 still needs to be tested downstream and so users may want to
> wait for subsequent point releases.
>
> Thanks
> +Vinod
>
> > On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> >
> > We are canceling the RC due to the issue that Rohith/Sunil identified.
> > The issue was difficult to track down as it only happens when you use
> > an IP for ZK (works fine with host names) and, moreover, if ZK and RM
> > are co-located on the same machine. We hope to get the fix in tomorrow
> > and roll out RC1.
> >
> > Thanks to everyone for the extensive testing/validation. Hopefully the
> > cost to replicate with RC1 is much lower.
> >
> > -Subru/Arun.
> >
> > On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
> >
> >> +1 from me too.
> >>
> >> Did the following:
> >> 1) set up a 9-node cluster;
> >> 2) ran some Gridmix jobs;
> >> 3) ran (2) after enabling opportunistic containers (used a mix of
> >> guaranteed and opportunistic containers for each job);
> >> 4) ran (3) but this time enabling distributed scheduling of
> >> opportunistic containers.
> >>
> >> All the above worked with no issues.
> >>
> >> Thanks for all the effort guys!
> >>
> >> Konstantinos
> >>
> >>
> >>
> >> Konstantinos
> >>
> >> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
> >> wrote:
> >>
> >>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >>>
> >>> - Verified all hashes and checksums
> >>> - Built from source on macOS 10.12.6, Java 1.8.0u65
> >>> - Deployed a pseudo cluster
> >>> - Ran some example jobs
> >>>
> >>> Thanks,
> >>>
> >>> Eric
> >>>
> >>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
> >>>
> >>>> Sunil / Rohith,
> >>>>
> >>>> Could you check whether your configs are the same as the ones
> >>>> Jonathan posted?
> >>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> >>>>
> >>>> And could you check whether the issue still reproduces with
> >>>> Jonathan's configs?
> >>>>
> >>>> Thanks,
> >>>> Wangda
> >>>>
> >>>>
> >>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
> >>>>
> >>>>> Thanks for testing, Rohith and Sunil.
> >>>>>
> >>>>> Can you please confirm that it is not a config issue at your end?
> >>>>> We (both Jonathan and myself) just tried testing this on a fresh
> >>>>> cluster (both automatic and manual) and we are not able to reproduce
> >>>>> this. I've updated the YARN-7453
> >>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA with details
> >>>>> of testing.
> >>>>>
> >>>>> Cheers
> >>>>> -Arun/Subru
> >>>>>
> >>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> >>>>>
> >>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> >>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> >>>>>> issue.
> >>>>>>
> >>>>>> - Rohith Sharma K S
> >>>>>>
> >>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi Subru and Arun.
> >>>>>>>
> >>>>>>> Thanks for driving the 2.9 release. Great work!
> >>>>>>>
> >>>>>>> I installed a cluster built from source.
> >>>>>>> - Ran a few MR jobs with application priority enabled. Runs fine.
> >>>>>>> - Accessed the new UI and it also seems fine.
> >>>>>>>
> >>>>>>> However, I am also getting the same issue that Rohith reported.
> >>>>>>> - Started an HA cluster
> >>>>>>> - Pushed RM to standby
> >>>>>>> - Pushed RM back to active, then saw an exception.
> >>>>>>>
> >>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>>>>>>
> >>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>>>>>>
> >>>>>>> Will check and post more details,
> >>>>>>>
> >>>>>>> - Sunil
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Thanks Subru/Arun for the great work!
> >>>>>>>>
> >>>>>>>> Downloaded source and built from it. Deployed a non-secured RM HA
> >>>>>>>> cluster along with the new YARN UI and ATSv2.
> >>>>>>>>
> >>>>>>>> I am facing a basic RM HA switch issue after the first successful
> >>>>>>>> start. *Is anyone else facing this issue?*
> >>>>>>>>
> >>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
> >>>>>>>> switches to active successfully. The exception trace I see in the
> >>>>>>>> log is:
> >>>>>>>>
> >>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> >>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >>>>>>>>    ... 4 more
> >>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>>    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
> >>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
> >>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >>>>>>>>    ... 5 more
> >>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> >>>>>>>>    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> >>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> >>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> >>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> >>>>>>>>    ... 13 more
> >>>>>>>>
> >>>>>>>> Thanks & Regards
> >>>>>>>> Rohith Sharma K S
> >>>>>>>>
> >>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>>> Hi folks,
> >>>>>>>>>
> >>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of the Hadoop
> >>>>>>>>> 2.9 line and will be the latest stable/production release for
> >>>>>>>>> Apache Hadoop. It includes 30 new features with 500+ subtasks,
> >>>>>>>>> 407 improvements, and 787 bug fixes, all newly fixed since 2.8.2.
> >>>>>>>>>
> >>>>>>>>>      More information about the 2.9.0 release plan can be found here:
> >>>>>>>>> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >>>>>>>>>
> >>>>>>>>>      New RC is available at:
> >>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >>>>>>>>>
> >>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> >>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >>>>>>>>>
> >>>>>>>>>      The maven artifacts are available via repository.apache.org at:
> >>>>>>>>> https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >>>>>>>>>
> >>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
> >>>>>>>>> days, ending on 11/10/2017 at 4pm PST.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Arun/Subru
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>
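
The trace quoted above nests several "Caused by:" sections, and the root
cause is the last one. As a reading aid only, here is a minimal Python
sketch that lists the exception classes in such a chain. It assumes the
trace has already had its quote prefixes stripped and its wrapped lines
rejoined to one frame per line; the function name is hypothetical.

```python
import re

# An exception class name at the start of a line, optionally preceded by
# "Caused by:". Inner classes such as KeeperException$NoAuthException are
# kept whole because "$" is allowed in the name.
_CAUSE = re.compile(r"^(?:Caused by:\s*)?([\w.$]+(?:Exception|Error))\b")


def cause_chain(trace_text):
    """Return exception class names from a Java stack trace, outermost first."""
    chain = []
    for raw in trace_text.splitlines():
        line = raw.strip()
        # Skip stack frames ("at ...") and elision markers ("... 4 more").
        if line.startswith("at ") or line.startswith("..."):
            continue
        match = _CAUSE.match(line)
        if match:
            chain.append(match.group(1))
    return chain
```

The last element of the returned list is the root cause; for the trace in
this thread that would be
org.apache.zookeeper.KeeperException$NoAuthException.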

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Ajay Kumar <aj...@hortonworks.com>.
+1 (non binding)
Thanks for working on this, Arun!!

Tested the cases below after building from source on a Mac with Java 1.8:
1) set up a small cluster
2) ran hdfs commands
3) ran wordcount, pi and TestDFSIO tests.

Thanks,
Ajay

On 12/10/17, 7:47 PM, "Vinod Kumar Vavilapalli" <vi...@apache.org> wrote:

    Missed this response on the old thread, but closing the loop here..
    
    The incompatibility conundrum with Dot-zeroes did indeed happen, in early 2.x releases - multiple times at that. And the downstream projects did raise concerns at these unfixable situations.
    
    I wasn't advocating a new formalism, this was more of a lesson taken from real life experience that I wanted to share with fellow RMs - as IMO the effort was worth the value for the releases where I used it.
    
    If RMs of these more recent releases choose to not do this if it is perceived that a release won't run into those past issues at all, it's clearly their call. It's just that we are bound to potentially make the same mistakes and learn the same lesson all over again..
    
    +Vinod
    
    > On Nov 9, 2017, at 9:51 AM, Chris Douglas <cd...@apache.org> wrote:
    > 
    > The labor required for these release formalisms is exceeding their
    > value. Our minor releases have more bugs than our patch releases (we
    > hope), but every consumer should understand how software versioning
    > works. Every device I own has bugs on major OS updates. That doesn't
    > imply that every minor release is strictly less stable than a patch
    > release, and users need to be warned off it.
    > 
    > In contrast, we should warn users about features that compromise
    > invariants like security or durability, either by design or due to
    > their early stage of development. We can't reasonably expect them to
    > understand those tradeoffs, since they depend on internal details of
    > Hadoop.
    > 
    > On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli
    > <vinodkv@apache.org> wrote:
    >> When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility?
    > 
    > Every project takes these case-by-case. Most of the time we'll
    > accommodate the old semantics- and we try to be explicit where we
    > promise compatibility- but this isn't a logic problem, it's a
    > practical one. If it's an easy fix to an obscure API, we probably
    > won't even hear about it.
    > 
    >> Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.
    > 
    > It's uncomfortable to have four active release branches, with 3.1
    > coming in early 2018. We all benefit from the shared deployment
    > experiences that harden these releases, and fragmentation creates
    > incentives to compete for that attention. Rather than tacitly
    > scuffling over waning interest in the 2.x series, I'd endorse your
    > other thread encouraging consolidation around 3.x.
    > 
    > To that end, there is no policy or precedent that requires that new
    > minor releases be labeled as "alpha". If there is cause to believe
    > that 2.9.0 is not ready to release in the stable line, then we
    > shouldn't release it. -C
    > 
    >>> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
    >>> 
    >>> We are canceling the RC due to the issue that Rohith/Sunil identified. The
    >>> issue was difficult to track down as it only happens when you use IP for ZK
    >>> (works fine with host names) and moreover if ZK and RM are co-located on
    >>> same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
    >>> 
    >>> Thanks to everyone for the extensive testing/validation. Hopefully cost to
    >>> replicate with RC1 is much lower.
    >>> 
    >>> -Subru/Arun.
    >>> 
    >>> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
    >>> 
    >>>> +1 from me too.
    >>>> 
    >>>> Did the following:
    >>>> 1) set up a 9-node cluster;
    >>>> 2) ran some Gridmix jobs;
    >>>> 3) ran (2) after enabling opportunistic containers (used a mix of
    >>>> guaranteed and opportunistic containers for each job);
    >>>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
    >>>> containers.
    >>>> 
    >>>> All the above worked with no issues.
    >>>> 
    >>>> Thanks for all the effort guys!
    >>>> 
    >>>> Konstantinos
    >>>> 
    >>>> 
    >>>> 
    >>>> Konstantinos
    >>>> 
    >>>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid> wrote:
    >>>> 
    >>>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
    >>>>> 
    >>>>> - Verified all hashes and checksums
    >>>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
    >>>>> - Deployed a pseudo cluster
    >>>>> - Ran some example jobs
    >>>>> 
    >>>>> Thanks,
    >>>>> 
    >>>>> Eric
    >>>>> 
    >>>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
    >>>>> 
    >>>>>> Sunil / Rohith,
    >>>>>> 
    >>>>>> Could you check whether your configs are the same as the configs Jonathan posted?
    >>>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
    >>>>>> 
    >>>>>> And could you check whether the issue still reproduces with Jonathan's
    >>>>>> configs?
    >>>>>> 
    >>>>>> Thanks,
    >>>>>> Wangda
    >>>>>> 
    >>>>>> 
    >>>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
    >>>>>> 
    >>>>>>> Thanks for testing, Rohith and Sunil.
    >>>>>>> 
    >>>>>>> Can you please confirm that it is not a config issue at your end?
    >>>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
    >>>>>>> (both automatic and manual) and we are not able to reproduce this. I've
    >>>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
    >>>>>>> with details of testing.
    >>>>>>> 
    >>>>>>> Cheers
    >>>>>>> -Arun/Subru
    >>>>>>> 
    >>>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
    >>>>>>> 
    >>>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
    >>>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
    >>>>>>>> issue.
    >>>>>>>> 
    >>>>>>>> - Rohith Sharma K S
    >>>>>>>> 
    >>>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
    >>>>>>>> 
    >>>>>>>>> Hi Subru and Arun.
    >>>>>>>>> 
    >>>>>>>>> Thanks for driving 2.9 release. Great work!
    >>>>>>>>> 
    >>>>>>>>> I installed a cluster built from source.
    >>>>>>>>> - Ran a few MR jobs with application priority enabled. Runs fine.
    >>>>>>>>> - Accessed the new UI and it also seems fine.
    >>>>>>>>> 
    >>>>>>>>> However, I am also getting the same issue that Rohith reported.
    >>>>>>>>> - Started an HA cluster
    >>>>>>>>> - Pushed RM to standby
    >>>>>>>>> - Pushed RM back to active, then saw an exception.
    >>>>>>>>> 
    >>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    >>>>>>>>>       at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
    >>>>>>>>>       at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
    >>>>>>>>> 
    >>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    >>>>>>>>>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
    >>>>>>>>>       at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    >>>>>>>>> 
    >>>>>>>>> Will check and post more details,
    >>>>>>>>> 
    >>>>>>>>> - Sunil
    >>>>>>>>> 
    >>>>>>>>> 
    >>>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
    >>>>>>>>> 
    >>>>>>>>>> Thanks Subru/Arun for the great work!
    >>>>>>>>>> 
    >>>>>>>>>> Downloaded source and built from it. Deployed an RM HA non-secured cluster
    >>>>>>>>>> along with the new YARN UI and ATSv2.
    >>>>>>>>>> 
    >>>>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
    >>>>>>>>>> *Is anyone else facing this issue?*
    >>>>>>>>>> 
    >>>>>>>>>> When the RM is switched from ACTIVE to STANDBY to ACTIVE, the RM never
    >>>>>>>>>> switches to active successfully. The exception trace I see in the log is:
    >>>>>>>>>> 
    >>>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
    >>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
    >>>>>>>>>>   at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
    >>>>>>>>>>   at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
    >>>>>>>>>>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    >>>>>>>>>>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
    >>>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    >>>>>>>>>>   ... 4 more
    >>>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    >>>>>>>>>>   at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    >>>>>>>>>>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
    >>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
    >>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:422)
    >>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    >>>>>>>>>>   ... 5 more
    >>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    >>>>>>>>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
    >>>>>>>>>>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    >>>>>>>>>>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    >>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
    >>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
    >>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
    >>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
    >>>>>>>>>>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    >>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
    >>>>>>>>>>   at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
    >>>>>>>>>>   at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
    >>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
    >>>>>>>>>>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    >>>>>>>>>>   ... 13 more
    >>>>>>>>>> 
    >>>>>>>>>> Thanks & Regards
    >>>>>>>>>> Rohith Sharma K S
    >>>>>>>>>> 
    >>>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
    >>>>>>>>>> 
    >>>>>>>>>>> Hi folks,
    >>>>>>>>>>> 
    >>>>>>>>>>>    Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
    >>>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
    >>>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
    >>>>>>>>>>> fixes new fixed issues since 2.8.2 .
    >>>>>>>>>>> 
    >>>>>>>>>>>     More information about the 2.9.0 release plan can be found here:
    >>>>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
    >>>>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
    >>>>>>>>>>> 
    >>>>>>>>>>>     New RC is available at:
    >>>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
    >>>>>>>>>>> 
    >>>>>>>>>>>     The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
    >>>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
    >>>>>>>>>>> 
    >>>>>>>>>>>     The maven artifacts are available via repository.apache.org at:
    >>>>>>>>>>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
    >>>>>>>>>>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
    >>>>>>>>>>> 
    >>>>>>>>>>>     Please try the release and vote; the vote will run for the usual 5
    >>>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
    >>>>>>>>>>> 
    >>>>>>>>>>> Thanks,
    >>>>>>>>>>> 
    >>>>>>>>>>> Arun/Subru
    >>>>>>>>>>> 
    >>>>>>>>>> 
    >>>>>>>>> 
    >>>>>>>> 
    >>>>>>>> 
    >>>>>>> 
    >>>>>> 
    >>>>> 
    >>>> 
    >> 
    >> 
    >> ---------------------------------------------------------------------
    >> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
    >> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
    >> 
    > 
    > ---------------------------------------------------------------------
    > To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org <ma...@hadoop.apache.org>
    > For additional commands, e-mail: yarn-dev-help@hadoop.apache.org <ma...@hadoop.apache.org>
    




Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Ajay Kumar <aj...@hortonworks.com>.
+1 (non-binding)
Thanks for working on this, Arun!!

Tested the cases below after building from source on Mac, Java 1.8:
1) set up a small cluster
2) ran HDFS commands
3) ran wordcount, pi and TestDFSIO tests.

Thanks,
Ajay

On 12/10/17, 7:47 PM, "Vinod Kumar Vavilapalli" <vi...@apache.org> wrote:

    Missed this response on the old thread, but closing the loop here..
    
    The incompatibility conundrum with Dot-zeroes did indeed happen, in early 2.x releases - multiple times at that. And the downstream projects did raise concerns at these unfixable situations.
    
    I wasn't advocating a new formalism; this was more of a lesson taken from real-life experience that I wanted to share with fellow RMs - as IMO the effort was worth the value for the releases where I used it.
    
    If RMs of these more recent releases choose to not do this if it is perceived that a release won't run into those past issues at all, it's clearly their call. It's just that we are bound to potentially make the same mistakes and learn the same lesson all over again..
    
    +Vinod
    



Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
Missed this response on the old thread, but closing the loop here.

The incompatibility conundrum with dot-zeroes did indeed happen in early 2.x releases, multiple times at that. And the downstream projects did raise concerns about these unfixable situations.

I wasn't advocating a new formalism; this was more of a lesson taken from real-life experience that I wanted to share with fellow RMs, as IMO the effort was worth the value for the releases where I used it.

If the RMs of these more recent releases choose not to do this because they perceive that a release won't run into those past issues at all, it's clearly their call. It's just that we are then bound to potentially make the same mistakes and learn the same lesson all over again.

+Vinod

> On Nov 9, 2017, at 9:51 AM, Chris Douglas <cd...@apache.org> wrote:
> 
> The labor required for these release formalisms is exceeding their
> value. Our minor releases have more bugs than our patch releases (we
> hope), but every consumer should understand how software versioning
> works. Every device I own has bugs on major OS updates. That doesn't
> imply that every minor release is strictly less stable than a patch
> release, or that users need to be warned off it.
> 
> In contrast, we should warn users about features that compromise
> invariants like security or durability, either by design or due to
> their early stage of development. We can't reasonably expect them to
> understand those tradeoffs, since they depend on internal details of
> Hadoop.
> 
> On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli
> <vinodkv@apache.org <ma...@apache.org>> wrote:
>> When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility?
> 
> Every project takes these case-by-case. Most of the time we'll
> accommodate the old semantics- and we try to be explicit where we
> promise compatibility- but this isn't a logic problem, it's a
> practical one. If it's an easy fix to an obscure API, we probably
> won't even hear about it.
> 
>> Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.
> 
> It's uncomfortable to have four active release branches, with 3.1
> coming in early 2018. We all benefit from the shared deployment
> experiences that harden these releases, and fragmentation creates
> incentives to compete for that attention. Rather than tacitly
> scuffling over waning interest in the 2.x series, I'd endorse your
> other thread encouraging consolidation around 3.x.
> 
> To that end, there is no policy or precedent that requires that new
> minor releases be labeled as "alpha". If there is cause to believe
> that 2.9.0 is not ready to release in the stable line, then we
> shouldn't release it. -C
> 
>>> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
>>> 
>>> We are canceling the RC due to the issue that Rohith/Sunil identified. The
>>> issue was difficult to track down as it only happens when you use an IP address
>>> for ZK (works fine with host names) and, moreover, if ZK and RM are co-located
>>> on the same machine. We hope to get the fix in tomorrow and roll out RC1.
>>> 
>>> Thanks to everyone for the extensive testing/validation. Hopefully the cost to
>>> replicate with RC1 is much lower.
>>> 
>>> -Subru/Arun.
>>> 
>>> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
>>> 
>>>> +1 from me too.
>>>> 
>>>> Did the following:
>>>> 1) set up a 9-node cluster;
>>>> 2) ran some Gridmix jobs;
>>>> 3) ran (2) after enabling opportunistic containers (used a mix of
>>>> guaranteed and opportunistic containers for each job);
>>>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>>>> containers.
>>>> 
>>>> All the above worked with no issues.
>>>> 
>>>> Thanks for all the effort guys!
>>>> 
>>>> Konstantinos
>>>> 
>>>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid> wrote:
>>>> 
>>>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>>>> 
>>>>> - Verified all hashes and checksums
>>>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>>>> - Deployed a pseudo cluster
>>>>> - Ran some example jobs
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Eric
>>>>> 
>>>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>>>>> 
>>>>>> Sunil / Rohith,
>>>>>> 
>>>>>> Could you check whether your configs are the same as the configs Jonathan posted?
>>>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>>>>> 
>>>>>> And could you check whether the issue still reproduces with Jonathan's
>>>>>> configs?
>>>>>> 
>>>>>> Thanks,
>>>>>> Wangda
>>>>>> 
>>>>>> 
>>>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>>>>> 
>>>>>>> Thanks for testing Rohith and Sunil
>>>>>>> 
>>>>>>> Can you please confirm if it is not a config issue at your end?
>>>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
>>>>>>> (both automatic and manual) and we are not able to reproduce this. I've
>>>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
>>>>>>> with details of testing.
>>>>>>> 
>>>>>>> Cheers
>>>>>>> -Arun/Subru
>>>>>>> 
>>>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>> 
>>>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>>>>>>>> issue.
>>>>>>>> 
>>>>>>>> - Rohith Sharma K S
>>>>>>>> 
>>>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Hi Subru and Arun.
>>>>>>>>> 
>>>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>>>> 
>>>>>>>>> I installed cluster built from source.
>>>>>>>>> - Ran few MR jobs with application priority enabled. Runs fine.
>>>>>>>>> - Accessed new UI and it also seems fine.
>>>>>>>>> 
>>>>>>>>> However I am also getting same issue as Rohith reported.
>>>>>>>>> - Started an HA cluster
>>>>>>>>> - Pushed RM to standby
>>>>>>>>> - Pushed back RM to active then seeing an exception.
>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>>       at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>>       at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>> 
>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>>       at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>> 
>>>>>>>>> Will check and post more details,
>>>>>>>>> 
>>>>>>>>> - Sunil
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>>>> 
>>>>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured cluster
>>>>>>>>>> along with new YARN UI and ATSv2.
>>>>>>>>>> 
>>>>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
>>>>>>>>>> *Is anyone else facing this issue?*
>>>>>>>>>> 
>>>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>>>>>>>>>> active successfully. The exception trace I see in the log is
>>>>>>>>>> 
>>>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>>>   at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>>>   at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>>>>>>>>>>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>>>>>>>>>>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>>>>>>>>>>   ... 4 more
>>>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>>   at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>>>>>>>>>>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>>>   ... 5 more
>>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>>>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>>>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>>>>>>>>>>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>>>>>>>>>>   at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>>>>>>>>>>   at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>>>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>>>>>>>>>>   ... 13 more
>>>>>>>>>> 
>>>>>>>>>> Thanks & Regards
>>>>>>>>>> Rohith Sharma K S
>>>>>>>>>> 
>>>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi folks,
>>>>>>>>>>> 
>>>>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>>>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
>>>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>>>>>>>>>>> fixes new fixed issues since 2.8.2 .
>>>>>>>>>>> 
>>>>>>>>>>>     More information about the 2.9.0 release plan can be found here:
>>>>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>>>>>>>>>>> 
>>>>>>>>>>>     New RC is available at:
>>>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>>>> 
>>>>>>>>>>>     The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>>>> 
>>>>>>>>>>>     The maven artifacts are available via repository.apache.org at:
>>>>>>>>>>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>>>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>>>>>>>>>>> 
>>>>>>>>>>>     Please try the release and vote; the vote will run for the usual 5
>>>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Arun/Subru
>>>>>>>>>>> 

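The RC0 cancellation note in this thread says the NoAuth failure only showed up when ZK was addressed by IP rather than by host name. As a rough aid for anyone re-testing, here is a minimal Python sketch that flags IP-literal entries in a ZooKeeper connection string such as the value of yarn.resourcemanager.zk-address. The function names are hypothetical, and the sketch only inspects the string; it does not talk to ZooKeeper or Hadoop, and it says nothing about the root cause tracked in YARN-7453.

```python
import ipaddress


def split_host_port(entry):
    """Split one 'host:port' entry of a connection string; the port is optional."""
    entry = entry.strip()
    if entry.startswith("["):
        # Bracketed IPv6 literal, e.g. [::1]:2181
        host, _, rest = entry[1:].partition("]")
        return host, rest.lstrip(":")
    host, _, port = entry.partition(":")
    return host, port


def ip_literal_entries(zk_address):
    """Return the hosts in a ZooKeeper connection string that are IP literals.

    Hypothetical helper: host names are skipped, IP literals are reported.
    """
    # Drop an optional chroot suffix such as '/yarn' before splitting entries.
    servers = zk_address.split("/", 1)[0]
    flagged = []
    for entry in servers.split(","):
        if not entry.strip():
            continue
        host, _port = split_host_port(entry)
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, so treat it as a host name
        flagged.append(host)
    return flagged


if __name__ == "__main__":
    # Example value only; substitute the quorum string from your own yarn-site.xml.
    print(ip_literal_entries("10.0.0.5:2181,zk2.example.com:2181"))
```

If the returned list is non-empty, the cluster matches the IP-based setup described above, which is the case worth re-running the ACTIVE to STANDBY to ACTIVE switch against.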
Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
Missed this response on the old thread, but closing the loop here.

The incompatibility conundrum with dot-zeroes did indeed happen in early 2.x releases, multiple times at that. And the downstream projects did raise concerns about these unfixable situations.

I wasn't advocating a new formalism; this was more of a lesson taken from real-life experience that I wanted to share with fellow RMs, as IMO the effort was worth the value for the releases where I used it.

If the RMs of these more recent releases choose not to do this because they perceive that a release won't run into those past issues at all, it's clearly their call. It's just that we are then bound to potentially make the same mistakes and learn the same lesson all over again.

+Vinod

> On Nov 9, 2017, at 9:51 AM, Chris Douglas <cd...@apache.org> wrote:
> 
> The labor required for these release formalisms is exceeding their
> value. Our minor releases have more bugs than our patch releases (we
> hope), but every consumer should understand how software versioning
> works. Every device I own has bugs on major OS updates. That doesn't
> imply that every minor release is strictly less stable than a patch
> release, and users need to be warned off it.
> 
> In contrast, we should warn users about features that compromise
> invariants like security or durability, either by design or due to
> their early stage of development. We can't reasonably expect them to
> understand those tradeoffs, since they depend on internal details of
> Hadoop.
> 
> On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli
> <vinodkv@apache.org <ma...@apache.org>> wrote:
>> When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility?
> 
> Every project takes these case-by-case. Most of the time we'll
> accommodate the old semantics- and we try to be explicit where we
> promise compatibility- but this isn't a logic problem, it's a
> practical one. If it's an easy fix to an obscure API, we probably
> won't even hear about it.
> 
>> Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.
> 
> It's uncomfortable to have four active release branches, with 3.1
> coming in early 2018. We all benefit from the shared deployment
> experiences that harden these releases, and fragmentation creates
> incentives to compete for that attention. Rather than tacitly
> scuffling over waning interest in the 2.x series, I'd endorse your
> other thread encouraging consolidation around 3.x.
> 
> To that end, there is no policy or precedent that requires that new
> minor releases be labeled as "alpha". If there is cause to believe
> that 2.9.0 is not ready to release in the stable line, then we
> shouldn't release it. -C
> 
>>> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
>>> 
>>> We are canceling the RC due to the issue that Rohith/Sunil identified. The
>>> issue was difficult to track down as it only happens when you use IP for ZK
>>> (works fine with host names) and moreover if ZK and RM are co-located on
>>> same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
>>> 
>>> Thanks to everyone for the extensive testing/validation. Hopefully cost to
>>> replicate with RC1 is much lower.
>>> 
>>> -Subru/Arun.
>>> 
>>> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
>>> 
>>>> +1 from me too.
>>>> 
>>>> Did the following:
>>>> 1) set up a 9-node cluster;
>>>> 2) ran some Gridmix jobs;
>>>> 3) ran (2) after enabling opportunistic containers (used a mix of
>>>> guaranteed and opportunistic containers for each job);
>>>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>>>> containers.
>>>> 
>>>> All the above worked with no issues.
>>>> 
>>>> Thanks for all the effort guys!
>>>> 
>>>> Konstantinos
>>>> 
>>>> 
>>>> 
>>>> Konstantinos
>>>> 
>>>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
>>>> wrote:
>>>> 
>>>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>>>> 
>>>>> - Verified all hashes and checksums
>>>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>>>> - Deployed a pseudo cluster
>>>>> - Ran some example jobs
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Eric
>>>>> 
>>>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>>>>> 
>>>>>> Sunil / Rohith,
>>>>>> 
>>>>>> Could you check if your configs are same as Jonathan posted configs?
>>>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>>>>> 
>>>>>> And could you try if using Jonathan's configs can still reproduce the
>>>>>> issue?
>>>>>> 
>>>>>> Thanks,
>>>>>> Wangda
>>>>>> 
>>>>>> 
>>>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>>>>> 
>>>>>>> Thanks for testing Rohith and Sunil
>>>>>>> 
>>>>>>> Can you please confirm if it is not a config issue at your end ?
>>>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
>>>>>>> (both automatic and manual) and we are not able to reproduce this. I've
>>>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
>>>>>>> with details of testing.
>>>>>>> 
>>>>>>> Cheers
>>>>>>> -Arun/Subru
>>>>>>> 
>>>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>> 
>>>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>>>>>>>> issue.
>>>>>>>> 
>>>>>>>> - Rohith Sharma K S
>>>>>>>> 
>>>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Hi Subru and Arun.
>>>>>>>>> 
>>>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>>>> 
>>>>>>>>> I installed cluster built from source.
>>>>>>>>> - Ran few MR jobs with application priority enabled. Runs fine.
>>>>>>>>> - Accessed new UI and it also seems fine.
>>>>>>>>> 
>>>>>>>>> However I am also getting same issue as Rohith reported.
>>>>>>>>> - Started an HA cluster
>>>>>>>>> - Pushed RM to standby
>>>>>>>>> - Pushed back RM to active then seeing an exception.
>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>>       at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>>       at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>> 
>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>>       at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>> 
>>>>>>>>> Will check and post more details,
>>>>>>>>> 
>>>>>>>>> - Sunil
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>>>> 
>>>>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured cluster
>>>>>>>>>> along with new YARN UI and ATSv2.
>>>>>>>>>> 
>>>>>>>>>> I am facing basic RM HA switch issue after first time successful start.
>>>>>>>>>> *Can anyone else is facing this issue?*
>>>>>>>>>> 
>>>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switch to
>>>>>>>>>> active successfully. Exception trace I see from the log is
>>>>>>>>>> 
>>>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>>>   at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>>>   at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>>>>>>>>>>   at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>>>>>>>>>>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>>>>>>>>>>   ... 4 more
>>>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>>   at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>>>>>>>>>>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>>>   ... 5 more
>>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>>>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>>>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>>>>>>>>>>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>>>>>>>>>>   at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>>>>>>>>>>   at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>>>>>>>>>>   at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>>>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>>>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>>>>>>>>>>   ... 13 more
>>>>>>>>>> 
>>>>>>>>>> Thanks & Regards
>>>>>>>>>> Rohith Sharma K S
>>>>>>>>>> 
>>>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi folks,
>>>>>>>>>>> 
>>>>>>>>>>>    Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>>>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
>>>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>>>>>>>>>>> fixes new fixed issues since 2.8.2 .
>>>>>>>>>>> 
>>>>>>>>>>>     More information about the 2.9.0 release plan can be found here:
>>>>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>>>>>>>>>>> 
>>>>>>>>>>>     New RC is available at:
>>>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>>>> 
>>>>>>>>>>>     The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>>>> 
>>>>>>>>>>>     The maven artifacts are available via repository.apache.org at:
>>>>>>>>>>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>>>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>>>>>>>>>>> 
>>>>>>>>>>>     Please try the release and vote; the vote will run for the usual 5
>>>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Arun/Subru
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
>> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org <ma...@hadoop.apache.org>
> For additional commands, e-mail: yarn-dev-help@hadoop.apache.org <ma...@hadoop.apache.org>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Chris Douglas <cd...@apache.org>.
The labor required for these release formalisms is exceeding their
value. Our minor releases have more bugs than our patch releases (we
hope), but every consumer should understand how software versioning
works. Every device I own has bugs on major OS updates. That doesn't
imply that every minor release is strictly less stable than a patch
release, or that users need to be warned off it.

In contrast, we should warn users about features that compromise
invariants like security or durability, either by design or due to
their early stage of development. We can't reasonably expect them to
understand those tradeoffs, since they depend on internal details of
Hadoop.

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli
<vi...@apache.org> wrote:
> When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility?

Every project takes these case-by-case. Most of the time we'll
accommodate the old semantics- and we try to be explicit where we
promise compatibility- but this isn't a logic problem, it's a
practical one. If it's an easy fix to an obscure API, we probably
won't even hear about it.

> Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.

It's uncomfortable to have four active release branches, with 3.1
coming in early 2018. We all benefit from the shared deployment
experiences that harden these releases, and fragmentation creates
incentives to compete for that attention. Rather than tacitly
scuffling over waning interest in the 2.x series, I'd endorse your
other thread encouraging consolidation around 3.x.

To that end, there is no policy or precedent that requires that new
minor releases be labeled as "alpha". If there is cause to believe
that 2.9.0 is not ready to release in the stable line, then we
shouldn't release it. -C

>> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
>>
>> We are canceling the RC due to the issue that Rohith/Sunil identified. The
>> issue was difficult to track down as it only happens when you use IP for ZK
>> (works fine with host names) and moreover if ZK and RM are co-located on
>> same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
>>
>> Thanks to everyone for the extensive testing/validation. Hopefully cost to
>> replicate with RC1 is much lower.
>>
>> -Subru/Arun.
>>
>> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
>>
>>> +1 from me too.
>>>
>>> Did the following:
>>> 1) set up a 9-node cluster;
>>> 2) ran some Gridmix jobs;
>>> 3) ran (2) after enabling opportunistic containers (used a mix of
>>> guaranteed and opportunistic containers for each job);
>>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>>> containers.
>>>
>>> All the above worked with no issues.
>>>
>>> Thanks for all the effort guys!
>>>
>>> Konstantinos
>>>
>>>
>>>
>>> Konstantinos
>>>
>>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
>>> wrote:
>>>
>>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>>>
>>>> - Verified all hashes and checksums
>>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>>> - Deployed a pseudo cluster
>>>> - Ran some example jobs
>>>>
>>>> Thanks,
>>>>
>>>> Eric
>>>>
>>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>>>>
>>>>> Sunil / Rohith,
>>>>>
>>>>> Could you check if your configs are same as Jonathan posted configs?
>>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>>>>
>>>>> And could you try if using Jonathan's configs can still reproduce the
>>>>> issue?
>>>>>
>>>>> Thanks,
>>>>> Wangda
>>>>>
>>>>>
>>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>>>>
>>>>>> Thanks for testing Rohith and Sunil
>>>>>>
>>>>>> Can you please confirm if it is not a config issue at your end ?
>>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
>>>>>> (both automatic and manual) and we are not able to reproduce this. I've
>>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
>>>>>> with details of testing.
>>>>>>
>>>>>> Cheers
>>>>>> -Arun/Subru
>>>>>>
>>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>
>>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>>>>>>> issue.
>>>>>>>
>>>>>>> - Rohith Sharma K S
>>>>>>>
>>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Subru and Arun.
>>>>>>>>
>>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>>>
>>>>>>>> I installed cluster built from source.
>>>>>>>> - Ran few MR jobs with application priority enabled. Runs fine.
>>>>>>>> - Accessed new UI and it also seems fine.
>>>>>>>>
>>>>>>>> However I am also getting same issue as Rohith reported.
>>>>>>>> - Started an HA cluster
>>>>>>>> - Pushed RM to standby
>>>>>>>> - Pushed back RM to active then seeing an exception.
>>>>>>>>
>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>
>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>
>>>>>>>> Will check and post more details,
>>>>>>>>
>>>>>>>> - Sunil
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>>>>>>>> rohithsharmaks@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>>>
>>>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured cluster
>>>>>>>>> along with new YARN UI and ATSv2.
>>>>>>>>>
>>>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
>>>>>>>>> *Is anyone else facing this issue?*
>>>>>>>>>
>>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>>>>>>>>> active successfully. Exception trace I see from the log is
>>>>>>>>>
>>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>>>>>>>>>    ... 4 more
>>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>>    ... 5 more
>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>>>>>>>>>    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>>>>>>>>>    ... 13 more
>>>>>>>>>
>>>>>>>>> Thanks & Regards
>>>>>>>>> Rohith Sharma K S
>>>>>>>>>
>>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
>>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>>>>>>>>>> fixes new fixed issues since 2.8.2 .
>>>>>>>>>>
>>>>>>>>>>      More information about the 2.9.0 release plan can be found here:
>>>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>>>>>>>>>>
>>>>>>>>>>      New RC is available at:
>>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>>>
>>>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>>>
>>>>>>>>>>      The maven artifacts are available via repository.apache.org at:
>>>>>>>>>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>>>>>>>>>>
>>>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
>>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Arun/Subru
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Subru Krishnan <su...@apache.org>.
Thanks Vinod for your feedback, we'll incorporate it when we spin RC1.

-Subru/Arun

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli <vi...@apache.org>
wrote:

> A related point - I thought I mentioned this in one of the release
> preparation threads, but in any case.
>
> Starting 2.7.0, for every .0 release, we've been adding a disclaimer (to
> the voting thread as well as the final release) that the first release can
> potentially go through additional fixes to incompatible changes (besides
> stabilization fixes). We should do this with 2.9.0 too.
>
> This has some history - long before this, we tried two different things:
> (a) downstream projects consume an RC (b) downstream projects consume a
> release. Option (a) was tried many times but it was increasingly getting
> hard to manage this across all the projects that depend on Hadoop. When we
> tried option (b), we used to make .0 as a GA release, but downstream
> projects like Tez, Hive, Spark would come back and find an incompatible
> change - and now we were forced into a conundrum - is fixing this
> incompatible change itself an incompatibility? So to avoid this problem,
> we've started marking the first few releases as alpha eventually making a
> stable point release. Clearly, specific users can still use this in
> production as long as we the Hadoop community reserve the right to fix
> incompatibilities.
>
> Long story short, I'd just add to your voting thread and release notes
> that 2.9.0 still needs to be tested downstream and so users may want to
> wait for subsequent point releases.
>
> Thanks
> +Vinod
>
> > On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> >
> > We are canceling the RC due to the issue that Rohith/Sunil identified. The
> > issue was difficult to track down as it only happens when you use IP for ZK
> > (works fine with host names) and moreover if ZK and RM are co-located on
> > same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
> >
> > Thanks to everyone for the extensive testing/validation. Hopefully cost to
> > replicate with RC1 is much lower.
> >
> > -Subru/Arun.
> >
> > On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
> >
> >> +1 from me too.
> >>
> >> Did the following:
> >> 1) set up a 9-node cluster;
> >> 2) ran some Gridmix jobs;
> >> 3) ran (2) after enabling opportunistic containers (used a mix of
> >> guaranteed and opportunistic containers for each job);
> >> 4) ran (3) but this time enabling distributed scheduling of opportunistic
> >> containers.
> >>
> >> All the above worked with no issues.
> >>
> >> Thanks for all the effort guys!
> >>
> >> Konstantinos
> >>
> >>
> >>
> >> Konstantinos
> >>
> >> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
> >> wrote:
> >>
> >>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >>>
> >>> - Verified all hashes and checksums
> >>> - Built from source on macOS 10.12.6, Java 1.8.0u65
> >>> - Deployed a pseudo cluster
> >>> - Ran some example jobs
> >>>
> >>> Thanks,
> >>>
> >>> Eric
> >>>
> >>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
> >>>
> >>>> Sunil / Rohith,
> >>>>
> >>>> Could you check if your configs are same as Jonathan posted configs?
> >>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> >>>>
> >>>> And could you try if using Jonathan's configs can still reproduce the
> >>>> issue?
> >>>>
> >>>> Thanks,
> >>>> Wangda
> >>>>
> >>>>
> >>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
> >>>>
> >>>>> Thanks for testing Rohith and Sunil
> >>>>>
> >>>>> Can you please confirm if it is not a config issue at your end ?
> >>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
> >>>>> (both automatic and manual) and we are not able to reproduce this. I've
> >>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
> >>>>> with details of testing.
> >>>>>
> >>>>> Cheers
> >>>>> -Arun/Subru
> >>>>>
> >>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> >>>>>
> >>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> >>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> >>>>>> issue.
> >>>>>>
> >>>>>> - Rohith Sharma K S
> >>>>>>
> >>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi Subru and Arun.
> >>>>>>>
> >>>>>>> Thanks for driving 2.9 release. Great work!
> >>>>>>>
> >>>>>>> I installed cluster built from source.
> >>>>>>> - Ran few MR jobs with application priority enabled. Runs fine.
> >>>>>>> - Accessed new UI and it also seems fine.
> >>>>>>>
> >>>>>>> However I am also getting same issue as Rohith reported.
> >>>>>>> - Started an HA cluster
> >>>>>>> - Pushed RM to standby
> >>>>>>> - Pushed back RM to active then seeing an exception.
> >>>>>>>
> >>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>>>>>>
> >>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>>>>>>
> >>>>>>> Will check and post more details,
> >>>>>>>
> >>>>>>> - Sunil
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> >>>>>>> rohithsharmaks@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Subru/Arun for the great work!
> >>>>>>>>
> >>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured
> >>>>> cluster
> >>>>>>>> along with new YARN UI and ATSv2.
> >>>>>>>>
> >>>>>>>> I am facing basic RM HA switch issue after first time successful
> >>>>> start.
> >>>>>>>> *Can
> >>>>>>>> anyone else is facing this issue?*
> >>>>>>>>
> >>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
> >>>> switch
> >>>>> to
> >>>>>>>> active successfully. Exception trace I see from the log is
> >>>>>>>>
> >>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> >>>>> ActiveStandbyElector:
> >>>>>>>> Exception handling the winning of election
> >>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not
> >>>> transition
> >>>>> to
> >>>>>>>> Active
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> >>>>>>> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> >>>>>>> torBasedElectorService.java:146)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> >>>>>>> eStandbyElector.java:894)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
> >>>>>>> veStandbyElector.java:473)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
> >>>>>>> ClientCnxn.java:599)
> >>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(
> >>> ClientCnxn.
> >>>>>>> java:498)
> >>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error
> >>> when
> >>>>>>>> transitioning to Active mode
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> >>>>>>> ransitionToActive(AdminService.java:325)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> >>>>>>> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> >>>>>>> torBasedElectorService.java:144)
> >>>>>>>>    ... 4 more
> >>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException:
> >>>>>>>> org.apache.zookeeper.KeeperException$NoAuthException:
> >>>>> KeeperErrorCode =
> >>>>>>>> NoAuth
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.service.ServiceStateException.convert(Serv
> >>>>>>> iceStateException.java:105)
> >>>>>>>>    at
> >>>>>>>> org.apache.hadoop.service.AbstractService.start(AbstractServ
> >>>>>>> ice.java:205)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >>>>>>> r.startActiveServices(ResourceManager.java:1131)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >>>>>>> r$1.run(ResourceManager.java:1171)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >>>>>>> r$1.run(ResourceManager.java:1167)
> >>>>>>>>    at java.security.AccessController.doPrivileged(Native
> >> Method)
> >>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> >>>>>>> upInformation.java:1886)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >>>>>>> r.transitionToActive(ResourceManager.java:1167)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> >>>>>>> ransitionToActive(AdminService.java:320)
> >>>>>>>>    ... 5 more
> >>>>>>>> Caused by: org.apache.zookeeper.KeeperException$
> >> NoAuthException:
> >>>>>>>> KeeperErrorCode = NoAuth
> >>>>>>>>    at
> >>>>>>>> org.apache.zookeeper.KeeperException.create(
> >>>> KeeperException.java:113)
> >>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(
> >>>>> ZooKeeper.java:949)
> >>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.curator.framework.imps.CuratorTransactionImpl.doO
> >>>>>>> peration(CuratorTransactionImpl.java:159)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.curator.framework.imps.CuratorTransactionImpl.acc
> >>>>>>> ess$200(CuratorTransactionImpl.java:44)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> >>>>>>> all(CuratorTransactionImpl.java:129)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> >>>>>>> all(CuratorTransactionImpl.java:125)
> >>>>>>>>    at org.apache.curator.RetryLoop.
> >> callWithRetry(RetryLoop.java:
> >>>> 107)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.curator.framework.imps.CuratorTransactionImpl.com
> >>>>>>> mit(CuratorTransactionImpl.java:122)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
> >>>>>>> ion.commit(ZKCuratorManager.java:403)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
> >>>>>>> ZKCuratorManager.java:372)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
> >>>>>>> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >>>>>>>>    at
> >>>>>>>>
> >>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >>>>>>> r$RMActiveServices.serviceStart(ResourceManager.java:754)
> >>>>>>>>    at
> >>>>>>>> org.apache.hadoop.service.AbstractService.start(AbstractServ
> >>>>>>> ice.java:194)
> >>>>>>>>    ... 13 more
> >>>>>>>>
> >>>>>>>> Thanks & Regards
> >>>>>>>> Rohith Sharma K S
> >>>>>>>>
> >>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi folks,
> >>>>>>>>>
> >>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of Hadoop
> >>> 2.9
> >>>>>>> line
> >>>>>>>> and
> >>>>>>>>> will be the latest stable/production release for Apache
> >> Hadoop -
> >>>> it
> >>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements,
> >>> 787
> >>>>> Bug
> >>>>>>>>> fixes new fixed issues since 2.8.2 .
> >>>>>>>>>
> >>>>>>>>>      More information about the 2.9.0 release plan can be
> >> found
> >>>>> here:
> >>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/
> >>>>>>>>> Roadmap#Roadmap-Version2.9
> >>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/
> >>>>>>>>> Roadmap#Roadmap-Version2.9>*
> >>>>>>>>>
> >>>>>>>>>      New RC is available at:
> >>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >>>>>>>>>
> >>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest
> >>>> commit
> >>>>>>> id
> >>>>>>>> is:
> >>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >>>>>>>>>
> >>>>>>>>>      The maven artifacts are available via
> >>> repository.apache.org
> >>>>> at:
> >>>>>>>>> *
> >>>>>>>> https://repository.apache.org/content/repositories/orgapache
> >>>>>>> hadoop-1065/
> >>>>>>>>> <
> >>>>>>>> https://repository.apache.org/content/repositories/orgapache
> >>>>>>> hadoop-1065/
> >>>>>>>>>> *
> >>>>>>>>>
> >>>>>>>>>      Please try the release and vote; the vote will run for
> >> the
> >>>>>>> usual 5
> >>>>>>>>> days, ending on 11/10/2017 4pm PST time.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Arun/Subru
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Chris Douglas <cd...@apache.org>.
The labor required for these release formalisms is exceeding their
value. Our minor releases have more bugs than our patch releases (we
hope), but every consumer should understand how software versioning
works. Every device I own has bugs on major OS updates. That doesn't
imply that every minor release is strictly less stable than a patch
release, or that users need to be warned off it.

In contrast, we should warn users about features that compromise
invariants like security or durability, either by design or due to
their early stage of development. We can't reasonably expect them to
understand those tradeoffs, since they depend on internal details of
Hadoop.

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli
<vi...@apache.org> wrote:
> When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility?

Every project takes these case by case. Most of the time we'll
accommodate the old semantics - and we try to be explicit where we
promise compatibility - but this isn't a logic problem, it's a
practical one. If it's an easy fix to an obscure API, we probably
won't even hear about it.

> Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.

It's uncomfortable to have four active release branches, with 3.1
coming in early 2018. We all benefit from the shared deployment
experiences that harden these releases, and fragmentation creates
incentives to compete for that attention. Rather than tacitly
scuffling over waning interest in the 2.x series, I'd endorse your
other thread encouraging consolidation around 3.x.

To that end, there is no policy or precedent that requires that new
minor releases be labeled as "alpha". If there is cause to believe
that 2.9.0 is not ready to release in the stable line, then we
shouldn't release it. -C

>> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
>>
>> We are canceling the RC due to the issue that Rohith/Sunil identified. The
>> issue was difficult to track down as it only happens when you use IP for ZK
>> (works fine with host names) and moreover if ZK and RM are co-located on
>> same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
>>
>> Thanks to everyone for the extensive testing/validation. Hopefully cost to
>> replicate with RC1 is much lower.
>>
>> -Subru/Arun.
>>
>> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com
>>> wrote:
>>
>>> +1 from me too.
>>>
>>> Did the following:
>>> 1) set up a 9-node cluster;
>>> 2) ran some Gridmix jobs;
>>> 3) ran (2) after enabling opportunistic containers (used a mix of
>>> guaranteed and opportunistic containers for each job);
>>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>>> containers.
>>>
>>> All the above worked with no issues.
>>>
>>> Thanks for all the effort guys!
>>>
>>> Konstantinos
>>>
>>>
>>>
>>> Konstantinos
>>>
>>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
>>> wrote:
>>>
>>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>>>
>>>> - Verified all hashes and checksums
>>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>>> - Deployed a pseudo cluster
>>>> - Ran some example jobs
>>>>
>>>> Thanks,
>>>>
>>>> Eric
>>>>
>>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>>>>
>>>>> Sunil / Rohith,
>>>>>
>>>>> Could you check if your configs are same as Jonathan posted configs?
>>>>> https://issues.apache.org/jira/browse/YARN-7453?
>>>> focusedCommentId=16242693&
>>>>> page=com.atlassian.jira.plugin.system.issuetabpanels:
>>>>> comment-tabpanel#comment-16242693
>>>>>
>>>>> And could you try if using Jonathan's configs can still reproduce the
>>>>> issue?
>>>>>
>>>>> Thanks,
>>>>> Wangda
>>>>>
>>>>>
>>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org>
>>> wrote:
>>>>>
>>>>>> Thanks for testing Rohith and Sunil
>>>>>>
>>>>>> Can you please confirm if it is not a config issue at your end ?
>>>>>> We (both Jonathan and myself) just tried testing this on a fresh
>>>> cluster
>>>>>> (both automatic and manual) and we are not able to reproduce this.
>>> I've
>>>>>> updated the YARN-7453 <https://issues.apache.org/
>>> jira/browse/YARN-7453
>>>>>
>>>>>> JIRA
>>>>>> with details of testing.
>>>>>>
>>>>>> Cheers
>>>>>> -Arun/Subru
>>>>>>
>>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
>>>>>> rohithsharmaks@apache.org
>>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track
>>> this
>>>>>>> issue.
>>>>>>>
>>>>>>> - Rohith Sharma K S
>>>>>>>
>>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Subru and Arun.
>>>>>>>>
>>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>>>
>>>>>>>> I installed cluster built from source.
>>>>>>>> - Ran few MR jobs with application priority enabled. Runs fine.
>>>>>>>> - Accessed new UI and it also seems fine.
>>>>>>>>
>>>>>>>> However I am also getting same issue as Rohith reported.
>>>>>>>> - Started an HA cluster
>>>>>>>> - Pushed RM to standby
>>>>>>>> - Pushed back RM to active then seeing an exception.
>>>>>>>>
>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not
>>>> transition
>>>>> to
>>>>>>>> Active
>>>>>>>>        at
>>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>>>>>>>> lectorBasedElectorServic
>>>>>>>>    e.becomeActive(ActiveStandbyElectorBasedElect
>>>> orService.java:146)
>>>>>>>>        at
>>>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>>>>>>>> eStandbyElector.java:894
>>>>>>>>    )
>>>>>>>>
>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
>>>>>>>> KeeperErrorCode = NoAuth
>>>>>>>>        at
>>>>>>>> org.apache.zookeeper.KeeperException.create(
>>>> KeeperException.java:113)
>>>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(
>>>>> ZooKeeper.java:
>>>>>>>> 949)
>>>>>>>>
>>>>>>>> Will check and post more details,
>>>>>>>>
>>>>>>>> - Sunil
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>>>>>>>> rohithsharmaks@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>>>
>>>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured
>>>>>> cluster
>>>>>>>>> along with new YARN UI and ATSv2.
>>>>>>>>>
>>>>>>>>> I am facing basic RM HA switch issue after first time successful
>>>>>> start.
>>>>>>>>> *Can
>>>>>>>>> anyone else is facing this issue?*
>>>>>>>>>
>>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
>>>>> switch
>>>>>> to
>>>>>>>>> active successfully. Exception trace I see from the log is
>>>>>>>>>
>>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
>>>>>> ActiveStandbyElector:
>>>>>>>>> Exception handling the winning of election
>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not
>>>>> transition
>>>>>> to
>>>>>>>>> Active
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>>>>>>>> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>>>>>>>> torBasedElectorService.java:146)
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>>>>>>>> eStandbyElector.java:894)
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
>>>>>>>> veStandbyElector.java:473)
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
>>>>>>>> ClientCnxn.java:599)
>>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(
>>>> ClientCnxn.
>>>>>>>> java:498)
>>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error
>>>> when
>>>>>>>>> transitioning to Active mode
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
>>>>>>>> ransitionToActive(AdminService.java:325)
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>>>>>>>> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>>>>>>>> torBasedElectorService.java:144)
>>>>>>>>>    ... 4 more
>>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException:
>>>>>>>>> org.apache.zookeeper.KeeperException$NoAuthException:
>>>>>> KeeperErrorCode =
>>>>>>>>> NoAuth
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.service.ServiceStateException.convert(Serv
>>>>>>>> iceStateException.java:105)
>>>>>>>>>    at
>>>>>>>>> org.apache.hadoop.service.AbstractService.start(AbstractServ
>>>>>>>> ice.java:205)
>>>>>>>>>    at
>>>>>>>>>
>>>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>>    ... 5 more
>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>>>>>>>>>    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>>>>>>>>>    ... 13 more
>>>>>>>>>
>>>>>>>>> Thanks & Regards
>>>>>>>>> Rohith Sharma K S
>>>>>>>>>
>>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
>>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>>>>>>>>>> fixes new fixed issues since 2.8.2 .
>>>>>>>>>>
>>>>>>>>>>      More information about the 2.9.0 release plan can be found here:
>>>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>>>>>>>>>>
>>>>>>>>>>      New RC is available at:
>>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>>>
>>>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>>>
>>>>>>>>>>      The maven artifacts are available via repository.apache.org at:
>>>>>>>>>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>>>>>>>>>>
>>>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
>>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Arun/Subru


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Chris Douglas <cd...@apache.org>.
The labor required for these release formalisms is exceeding their
value. Our minor releases have more bugs than our patch releases (we
hope), but every consumer should understand how software versioning
works. Every device I own has bugs on major OS updates. That doesn't
imply that every minor release is strictly less stable than a patch
release, or that users need to be warned off it.

In contrast, we should warn users about features that compromise
invariants like security or durability, either by design or due to
their early stage of development. We can't reasonably expect them to
understand those tradeoffs, since they depend on internal details of
Hadoop.

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli
<vi...@apache.org> wrote:
> When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility?

Every project takes these case by case. Most of the time we'll
accommodate the old semantics (and we try to be explicit where we
promise compatibility), but this isn't a logic problem, it's a
practical one. If it's an easy fix to an obscure API, we probably
won't even hear about it.

> Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.

It's uncomfortable to have four active release branches, with 3.1
coming in early 2018. We all benefit from the shared deployment
experiences that harden these releases, and fragmentation creates
incentives to compete for that attention. Rather than tacitly
scuffling over waning interest in the 2.x series, I'd endorse your
other thread encouraging consolidation around 3.x.

To that end, there is no policy or precedent that requires that new
minor releases be labeled as "alpha". If there is cause to believe
that 2.9.0 is not ready to release in the stable line, then we
shouldn't release it. -C



Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
A related point - I thought I mentioned this in one of the release preparation threads, but in any case.

Starting 2.7.0, for every .0 release, we've been adding a disclaimer (to the voting thread as well as the final release) that the first release can potentially go through additional fixes to incompatible changes (besides stabilization fixes). We should do this with 2.9.0 too.

This has some history - long before this, we tried two different things: (a) downstream projects consume an RC, and (b) downstream projects consume a release. Option (a) was tried many times, but it became increasingly hard to manage across all the projects that depend on Hadoop. When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility? So to avoid this problem, we started marking the first few releases as alpha, eventually making a stable point release. Clearly, specific users can still use this in production as long as we, the Hadoop community, reserve the right to fix incompatibilities.

Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.

Thanks
+Vinod

> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> 
> We are canceling the RC due to the issue that Rohith/Sunil identified. The
> issue was difficult to track down as it only happens when you use an IP
> address for ZK (it works fine with host names) and, moreover, only if ZK and
> RM are co-located on the same machine. We hope to get the fix in tomorrow
> and roll out RC1.
>
> Thanks to everyone for the extensive testing/validation. Hopefully the cost
> to replicate with RC1 is much lower.
> 
> -Subru/Arun.
> 
> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
> 
>> +1 from me too.
>> 
>> Did the following:
>> 1) set up a 9-node cluster;
>> 2) ran some Gridmix jobs;
>> 3) ran (2) after enabling opportunistic containers (used a mix of
>> guaranteed and opportunistic containers for each job);
>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>> containers.
>> 
>> All the above worked with no issues.
>> 
>> Thanks for all the effort guys!
>> 
>> Konstantinos
>> 
>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid> wrote:
>> 
>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>> 
>>> - Verified all hashes and checksums
>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>> - Deployed a pseudo cluster
>>> - Ran some example jobs
>>> 
>>> Thanks,
>>> 
>>> Eric
>>> 
>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>>> 
>>>> Sunil / Rohith,
>>>> 
>>>> Could you check if your configs are same as Jonathan posted configs?
>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>>> 
>>>> And could you try if using Jonathan's configs can still reproduce the
>>>> issue?
>>>> 
>>>> Thanks,
>>>> Wangda
>>>> 
>>>> 
>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>>> 
>>>>> Thanks for testing Rohith and Sunil
>>>>> 
>>>>> Can you please confirm that it is not a config issue at your end?
>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
>>>>> (both automatic and manual) and we are not able to reproduce this. I've
>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
>>>>> with details of testing.
>>>>> 
>>>>> Cheers
>>>>> -Arun/Subru
>>>>> 
>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>
>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>>>>>> issue.
>>>>>> 
>>>>>> - Rohith Sharma K S
>>>>>> 
>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>>>>>> 
>>>>>>> Hi Subru and Arun.
>>>>>>> 
>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>> 
>>>>>>> I installed cluster built from source.
>>>>>>> - Ran few MR jobs with application priority enabled. Runs fine.
>>>>>>> - Accessed new UI and it also seems fine.
>>>>>>> 
>>>>>>> However I am also getting same issue as Rohith reported.
>>>>>>> - Started an HA cluster
>>>>>>> - Pushed RM to standby
>>>>>>> - Pushed back RM to active then seeing an exception.
>>>>>>> 
>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>
>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>> 
>>>>>>> Will check and post more details,
>>>>>>> 
>>>>>>> - Sunil
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>> 
>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>> 
>>>>>>>> Downloaded source and built from it. Deployed an RM HA non-secured cluster
>>>>>>>> along with the new YARN UI and ATSv2.
>>>>>>>>
>>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
>>>>>>>> *Is anyone else facing this issue?*
>>>>>>>>
>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>>>>>>>> active successfully. The exception trace I see in the log is:
>>>>>>>> 
>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>>>>>>>>    ... 4 more
>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>    ... 5 more
>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>>>>>>>>    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>>>>>>>>    ... 13 more
>>>>>>>> 
>>>>>>>> Thanks & Regards
>>>>>>>> Rohith Sharma K S
>>>>>>>> 
>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>>>>>>>>> fixes new fixed issues since 2.8.2 .
>>>>>>>>>
>>>>>>>>>      More information about the 2.9.0 release plan can be found here:
>>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>>>>>>>>>
>>>>>>>>>      New RC is available at:
>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>>
>>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>>
>>>>>>>>>      The maven artifacts are available via repository.apache.org at:
>>>>>>>>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>>>>>>>>>
>>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Arun/Subru


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
A related point - I thought I mentioned this in one of the release preparation threads, but in any case.

Starting 2.7.0, for every .0 release, we've been adding a disclaimer (to the voting thread as well as the final release) that the first release can potentially go through additional fixes to incompatible changes (besides stabilization fixes). We should do this with 2.9.0 too.

This has some history - long before this, we tried two different things: (a) downstream projects consume an RC, and (b) downstream projects consume a release. Option (a) was tried many times, but it became increasingly hard to manage across all the projects that depend on Hadoop. When we tried option (b), we used to make .0 as a GA release, but downstream projects like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced into a conundrum - is fixing this incompatible change itself an incompatibility? So to avoid this problem, we started marking the first few releases as alpha, eventually making a stable point release. Clearly, specific users can still use this in production as long as we, the Hadoop community, reserve the right to fix incompatibilities.

Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.

Thanks
+Vinod

> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> 
> We are canceling the RC due to the issue that Rohith/Sunil identified. The
> issue was difficult to track down as it only happens when you use an IP
> address for ZK (it works fine with host names) and, moreover, only if ZK and
> RM are co-located on the same machine. We hope to get the fix in tomorrow
> and roll out RC1.
>
> Thanks to everyone for the extensive testing/validation. Hopefully the cost
> to replicate with RC1 is much lower.
> 
> -Subru/Arun.
> 
> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
> 
>> +1 from me too.
>> 
>> Did the following:
>> 1) set up a 9-node cluster;
>> 2) ran some Gridmix jobs;
>> 3) ran (2) after enabling opportunistic containers (used a mix of
>> guaranteed and opportunistic containers for each job);
>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>> containers.
>> 
>> All the above worked with no issues.
>> 
>> Thanks for all the effort guys!
>> 
>> Konstantinos
>> 
>> 
>> 
>> Konstantinos
>> 
>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
>> wrote:
>> 
>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>> 
>>> - Verified all hashes and checksums
>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>> - Deployed a pseudo cluster
>>> - Ran some example jobs
>>> 
>>> Thanks,
>>> 
>>> Eric
>>> 
>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>>> 
>>>> Sunil / Rohith,
>>>> 
>>>> Could you check whether your configs are the same as the configs Jonathan posted?
>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>>> 
>>>> And could you check whether the issue still reproduces with Jonathan's
>>>> configs?
>>>> 
>>>> Thanks,
>>>> Wangda
>>>> 
>>>> 
>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>>> 
>>>>> Thanks for testing, Rohith and Sunil.
>>>>> 
>>>>> Can you please confirm that it is not a config issue at your end?
>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
>>>>> (both automatic and manual) and we are not able to reproduce this. I've
>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
>>>>> with details of testing.
>>>>> 
>>>>> Cheers
>>>>> -Arun/Subru
>>>>> 
>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>> 
>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>>>>>> issue.
>>>>>> 
>>>>>> - Rohith Sharma K S
>>>>>> 
>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>>>>>> 
>>>>>>> Hi Subru and Arun.
>>>>>>> 
>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>> 
>>>>>>> I installed a cluster built from source.
>>>>>>> - Ran a few MR jobs with application priority enabled. Runs fine.
>>>>>>> - Accessed the new UI and it also seems fine.
>>>>>>> 
>>>>>>> However, I am also getting the same issue that Rohith reported.
>>>>>>> - Started an HA cluster
>>>>>>> - Pushed RM to standby
>>>>>>> - Pushed RM back to active, then saw an exception.
>>>>>>> 
>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>> 
>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>> 
>>>>>>> Will check and post more details,
>>>>>>> 
>>>>>>> - Sunil
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>> 
>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>> 
>>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured cluster
>>>>>>>> along with new YARN UI and ATSv2.
>>>>>>>> 
>>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
>>>>>>>> *Is anyone else facing this issue?*
>>>>>>>> 
>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>>>>>>>> active successfully. The exception trace I see in the log is:
>>>>>>>> 
>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>>>>>>>>    ... 4 more
>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>    ... 5 more
>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>>>>>>>>    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>>>>>>>>    ... 13 more
>>>>>>>> 
>>>>>>>> Thanks & Regards
>>>>>>>> Rohith Sharma K S
>>>>>>>> 
>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>>>>>>>> 
>>>>>>>>> Hi folks,
>>>>>>>>> 
>>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>>>>>>>>> fixes new fixed issues since 2.8.2 .
>>>>>>>>> 
>>>>>>>>>      More information about the 2.9.0 release plan can be found here:
>>>>>>>>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>>>>>>>>> 
>>>>>>>>>      New RC is available at:
>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>> 
>>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>> 
>>>>>>>>>      The maven artifacts are available via repository.apache.org at:
>>>>>>>>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>>>>>>>>> 
>>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Arun/Subru
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 




Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
A related point - I thought I mentioned this in one of the release preparation threads, but in any case.

Starting with 2.7.0, for every .0 release, we've been adding a disclaimer (to the voting thread as well as the final release) that the first release can potentially go through additional fixes to incompatible changes (besides stabilization fixes). We should do this with 2.9.0 too.

This has some history - long before this, we tried two different things: (a) downstream projects consume an RC, and (b) downstream projects consume a release. Option (a) was tried many times, but it became increasingly hard to manage across all the projects that depend on Hadoop. When we tried option (b), we used to make .0 a GA release, but downstream projects like Tez, Hive, and Spark would come back having found an incompatible change - and we were then stuck with a conundrum: is fixing this incompatible change itself an incompatibility? To avoid this problem, we started marking the first few releases as alpha, eventually making a stable point release. Clearly, specific users can still use this in production, as long as we, the Hadoop community, reserve the right to fix incompatibilities.

Long story short, I'd just add to your voting thread and release notes that 2.9.0 still needs to be tested downstream and so users may want to wait for subsequent point releases.

Thanks
+Vinod

> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <su...@apache.org> wrote:
> 
> We are canceling the RC due to the issue that Rohith/Sunil identified. The
> issue was difficult to track down, as it only happens when you use an IP address
> for ZK (it works fine with host names) and, moreover, if ZK and RM are co-located
> on the same machine. We hope to get the fix in tomorrow and roll out RC1.
> 
> Thanks to everyone for the extensive testing/validation. Hopefully the cost of
> repeating it with RC1 is much lower.
> 
> -Subru/Arun.
> 
> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:
> 
>> +1 from me too.
>> 
>> Did the following:
>> 1) set up a 9-node cluster;
>> 2) ran some Gridmix jobs;
>> 3) ran (2) after enabling opportunistic containers (used a mix of
>> guaranteed and opportunistic containers for each job);
>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>> containers.
>> 
>> All the above worked with no issues.
>> 
>> Thanks for all the effort guys!
>> 
>> Konstantinos
>> 
>> 
>> 
>> Konstantinos
>> 
>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
>> wrote:
>> 
>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>> 
>>> - Verified all hashes and checksums
>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>> - Deployed a pseudo cluster
>>> - Ran some example jobs
>>> 
>>> Thanks,
>>> 
>>> Eric
>>> 
>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>>> 
>>>> Sunil / Rohith,
>>>> 
>>>> Could you check whether your configs are the same as the configs Jonathan posted?
>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>>> 
>>>> And could you check whether the issue still reproduces with Jonathan's
>>>> configs?
>>>> 
>>>> Thanks,
>>>> Wangda
>>>> 
>>>> 
>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>>> 
>>>>> Thanks for testing, Rohith and Sunil.
>>>>> 
>>>>> Can you please confirm that it is not a config issue at your end?
>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
>>>>> (both automatic and manual) and we are not able to reproduce this. I've
>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
>>>>> with details of testing.
>>>>> 
>>>>> Cheers
>>>>> -Arun/Subru
>>>>> 
>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>> 
>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>>>>>> issue.
>>>>>> 
>>>>>> - Rohith Sharma K S
>>>>>> 
>>>>>> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>>>>>> 
>>>>>>> Hi Subru and Arun.
>>>>>>> 
>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>> 
>>>>>>> I installed a cluster built from source.
>>>>>>> - Ran a few MR jobs with application priority enabled. Runs fine.
>>>>>>> - Accessed the new UI and it also seems fine.
>>>>>>> 
>>>>>>> However, I am also getting the same issue that Rohith reported.
>>>>>>> - Started an HA cluster
>>>>>>> - Pushed RM to standby
>>>>>>> - Pushed RM back to active, then saw an exception.
>>>>>>> 
>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>> 
>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>> 
>>>>>>> Will check and post more details,
>>>>>>> 
>>>>>>> - Sunil
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>>>>>> 
>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>> 
>>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured cluster
>>>>>>>> along with new YARN UI and ATSv2.
>>>>>>>> 
>>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
>>>>>>>> *Is anyone else facing this issue?*
>>>>>>>> 
>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>>>>>>>> active successfully. The exception trace I see in the log is:
>>>>>>>> 
>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>>>>>>>>    ... 4 more
>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>    ... 5 more
>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>    at
>>>>>>>> 
>>>>>>>> org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
>>>>>>> all(CuratorTransactionImpl.java:125)
>>>>>>>>    at org.apache.curator.RetryLoop.
>> callWithRetry(RetryLoop.java:
>>>> 107)
>>>>>>>>    at
>>>>>>>> 
>>>>>>>> org.apache.curator.framework.imps.CuratorTransactionImpl.com
>>>>>>> mit(CuratorTransactionImpl.java:122)
>>>>>>>>    at
>>>>>>>> 
>>>>>>>> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
>>>>>>> ion.commit(ZKCuratorManager.java:403)
>>>>>>>>    at
>>>>>>>> 
>>>>>>>> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
>>>>>>> ZKCuratorManager.java:372)
>>>>>>>>    at
>>>>>>>> 
>>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
>>>>>>> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>    at
>>>>>>>> 
>>>>>>>> org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>>>>>>> r$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>    at
>>>>>>>> org.apache.hadoop.service.AbstractService.start(AbstractServ
>>>>>>> ice.java:194)
>>>>>>>>    ... 13 more
>>>>>>>> 
>>>>>>>> Thanks & Regards
>>>>>>>> Rohith Sharma K S
>>>>>>>> 
>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9 line and
>>>>>>>>> will be the latest stable/production release for Apache Hadoop. It
>>>>>>>>> includes 30 new features with 500+ subtasks, 407 improvements, and 787 bug
>>>>>>>>> fixes since 2.8.2.
>>>>>>>>>
>>>>>>>>>      More information about the 2.9.0 release plan can be found here:
>>>>>>>>> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>>
>>>>>>>>>      New RC is available at:
>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>>
>>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>>
>>>>>>>>>      The maven artifacts are available via repository.apache.org at:
>>>>>>>>> https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>>
>>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
>>>>>>>>> days, ending on 11/10/2017 4 PM PST.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Arun/Subru
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 



Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Subru Krishnan <su...@apache.org>.
We are canceling the RC due to the issue that Rohith/Sunil identified
(YARN-7453). The issue was difficult to track down because it only occurs when
ZK is addressed by IP (it works fine with host names) and, moreover, when ZK
and the RM are co-located on the same machine. We hope to get the fix in
tomorrow and roll out RC1.

Thanks to everyone for the extensive testing/validation. Hopefully the cost of
repeating it for RC1 is much lower.

-Subru/Arun.
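
Since the failure depended on ZK being addressed by IP rather than by host name, a quick check of the quorum string can show whether a test cluster matches that condition. The following is only a minimal sketch and not part of Hadoop: `ip_literal_hosts` is a hypothetical helper, the sample quorum strings are made up, and it assumes the usual comma-separated `host:port` ZooKeeper connect string with host names or IPv4 addresses (IPv6 literals are not handled).

```python
import ipaddress


def ip_literal_hosts(quorum):
    """Return the entries of a ZooKeeper connect string whose host is an IP literal.

    Hypothetical helper for illustration; assumes "host:port,host:port[/chroot]".
    """
    flagged = []
    for entry in quorum.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Ignore an optional chroot suffix such as "/yarn".
        hostport = entry.split("/", 1)[0]
        # Split off the port if one is present.
        host = hostport.rsplit(":", 1)[0] if ":" in hostport else hostport
        try:
            ipaddress.ip_address(host)
        except ValueError:
            # Not an IP literal, so it is treated as a host name.
            continue
        flagged.append(entry)
    return flagged


if __name__ == "__main__":
    # Made-up quorum string mixing an IP literal and a host name.
    print(ip_literal_hosts("10.0.0.1:2181,zk2.example.com:2181"))
```

An empty result means every entry uses a host name, which is the setup reported to work; any flagged entry on a machine that also runs the RM would be worth testing against RC1.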

On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com> wrote:

> +1 from me too.
>
> Did the following:
> 1) set up a 9-node cluster;
> 2) ran some Gridmix jobs;
> 3) ran (2) after enabling opportunistic containers (used a mix of
> guaranteed and opportunistic containers for each job);
> 4) ran (3) but this time enabling distributed scheduling of opportunistic
> containers.
>
> All the above worked with no issues.
>
> Thanks for all the effort guys!
>
> Konstantinos
>
> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid> wrote:
>
> > +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >
> > - Verified all hashes and checksums
> > - Built from source on macOS 10.12.6, Java 1.8.0u65
> > - Deployed a pseudo cluster
> > - Ran some example jobs
> >
> > Thanks,
> >
> > Eric
> >
> > On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
> >
> > > Sunil / Rohith,
> > >
> > > Could you check whether your configs are the same as the ones Jonathan posted?
> > > https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
> > >
> > > And could you check whether the issue still reproduces with Jonathan's
> > > configs?
> > >
> > > Thanks,
> > > Wangda
> > >
> > >
> > > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
> > >
> > > > Thanks for testing, Rohith and Sunil.
> > > >
> > > > Can you please confirm that it is not a config issue at your end?
> > > > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > > > (both automatic and manual) and we are not able to reproduce this. I've
> > > > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
> > > > with details of testing.
> > > >
> > > > Cheers
> > > > -Arun/Subru
> > > >
> > > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> > > >
> > > > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > > > > issue.
> > > > >
> > > > > - Rohith Sharma K S
> > > > >
> > > > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> > > > >
> > > > >> Hi Subru and Arun.
> > > > >>
> > > > > >> Thanks for driving the 2.9 release. Great work!
> > > > > >>
> > > > > >> I installed a cluster built from source.
> > > > > >> - Ran a few MR jobs with application priority enabled. They ran fine.
> > > > > >> - Accessed the new UI, and it also seems fine.
> > > > > >>
> > > > > >> However, I am also hitting the same issue that Rohith reported.
> > > > > >> - Started an HA cluster
> > > > > >> - Pushed the RM to standby
> > > > > >> - Pushed the RM back to active, and then saw an exception.
> > > > >>
> > > > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > > > >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > > > >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > > > >>
> > > > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > > > > >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > > > > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > > > >>
> > > > >> Will check and post more details,
> > > > >>
> > > > >> - Sunil
> > > > >>
> > > > >>
> > > > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> > > > >>
> > > > >> > Thanks Subru/Arun for the great work!
> > > > >> >
> > > > > >> > Downloaded the source and built from it. Deployed a non-secure RM HA
> > > > > >> > cluster along with the new YARN UI and ATSv2.
> > > > > >> >
> > > > > >> > I am facing a basic RM HA switch issue after the first successful start.
> > > > > >> > *Is anyone else facing this issue?*
> > > > > >> >
> > > > > >> > When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
> > > > > >> > becomes active successfully. The exception trace I see in the log is:
> > > > >> >
> > > > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > > > > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > > > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > > > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > > > > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > > > > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > > > > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> > > > > >> >     ... 4 more
> > > > > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > > > > >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> > > > > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> > > > > >> >     at java.security.AccessController.doPrivileged(Native Method)
> > > > > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > > > > >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> > > > > >> >     ... 5 more
> > > > > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > > > > >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > > > > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > > > > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > > > > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> > > > > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> > > > > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> > > > > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> > > > > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> > > > > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> > > > > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> > > > > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > > > > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> > > > > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> > > > > >> >     ... 13 more
> > > > >> >
> > > > >> > Thanks & Regards
> > > > >> > Rohith Sharma K S
> > > > >> >
> > > > > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> > > > > >> >
> > > > > >> > > Hi folks,
> > > > > >> > >
> > > > > >> > >      Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9 line and
> > > > > >> > > will be the latest stable/production release for Apache Hadoop. It
> > > > > >> > > includes 30 new features with 500+ subtasks, 407 improvements, and 787 bug
> > > > > >> > > fixes since 2.8.2.
> > > > > >> > >
> > > > > >> > >       More information about the 2.9.0 release plan can be found here:
> > > > > >> > > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > > > > >> > >
> > > > > >> > >       New RC is available at:
> > > > > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > > > > >> > >
> > > > > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > > > > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > > > > >> > >
> > > > > >> > >       The maven artifacts are available via repository.apache.org at:
> > > > > >> > > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > > > > >> > >
> > > > > >> > >       Please try the release and vote; the vote will run for the usual 5
> > > > > >> > > days, ending on 11/10/2017 4 PM PST.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > >
> > > > >> > > Arun/Subru
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Subru Krishnan <su...@apache.org>.
We are canceling the RC due to the issue that Rohith/Sunil identified. The
issue was difficult to track down as it only happens when you use IP for ZK
(works fine with host names) and moreover if ZK and RM are co-located on
same machine. We are hopeful to get the fix in tomorrow and roll out RC1.

Thanks to everyone for the extensive testing/validation. Hopefully cost to
replicate with RC1 is much lower.

-Subru/Arun.

On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com
> wrote:

> +1 from me too.
>
> Did the following:
> 1) set up a 9-node cluster;
> 2) ran some Gridmix jobs;
> 3) ran (2) after enabling opportunistic containers (used a mix of
> guaranteed and opportunistic containers for each job);
> 4) ran (3) but this time enabling distributed scheduling of opportunistic
> containers.
>
> All the above worked with no issues.
>
> Thanks for all the effort guys!
>
> Konstantinos
>
>
>
> Konstantinos
>
> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
> wrote:
>
> > +1 (non-binding) pending the issue that Sunil/Rohith pointed out
> >
> > - Verified all hashes and checksums
> > - Built from source on macOS 10.12.6, Java 1.8.0u65
> > - Deployed a pseudo cluster
> > - Ran some example jobs
> >
> > Thanks,
> >
> > Eric
> >
> > On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
> >
> > > Sunil / Rohith,
> > >
> > > Could you check if your configs are same as Jonathan posted configs?
> > > https://issues.apache.org/jira/browse/YARN-7453?
> > focusedCommentId=16242693&
> > > page=com.atlassian.jira.plugin.system.issuetabpanels:
> > > comment-tabpanel#comment-16242693
> > >
> > > And could you try if using Jonathan's configs can still reproduce the
> > > issue?
> > >
> > > Thanks,
> > > Wangda
> > >
> > >
> > > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org>
> wrote:
> > >
> > > > Thanks for testing Rohith and Sunil
> > > >
> > > > Can you please confirm if it is not a config issue at your end ?
> > > > We (both Jonathan and myself) just tried testing this on a fresh
> > cluster
> > > > (both automatic and manual) and we are not able to reproduce this.
> I've
> > > > updated the YARN-7453 <https://issues.apache.org/
> jira/browse/YARN-7453
> > >
> > > > JIRA
> > > > with details of testing.
> > > >
> > > > Cheers
> > > > -Arun/Subru
> > > >
> > > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > > > rohithsharmaks@apache.org
> > > > > wrote:
> > > >
> > > > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track
> this
> > > > > issue.
> > > > >
> > > > > - Rohith Sharma K S
> > > > >
> > > > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> > > > >
> > > > >> Hi Subru and Arun.
> > > > >>
> > > > >> Thanks for driving 2.9 release. Great work!
> > > > >>
> > > > > >> I installed a cluster built from source.
> > > > > >> - Ran a few MR jobs with application priority enabled. Runs fine.
> > > > > >> - Accessed the new UI and it also seems fine.
> > > > > >>
> > > > > >> However, I am also getting the same issue that Rohith reported.
> > > > > >> - Started an HA cluster
> > > > > >> - Pushed RM to standby
> > > > > >> - Pushed RM back to active, then saw an exception.
> > > > >>
> > > > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > > > > >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > > > > >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > > > > >>
> > > > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > > > > >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > > > > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > > > >>
> > > > >> Will check and post more details,
> > > > >>
> > > > >> - Sunil
> > > > >>
> > > > >>
> > > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > > > >> rohithsharmaks@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > Thanks Subru/Arun for the great work!
> > > > >> >
> > > > > >> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> > > > > >> > along with new YARN UI and ATSv2.
> > > > > >> >
> > > > > >> > I am facing a basic RM HA switch issue after the first successful start.
> > > > > >> > *Is anyone else facing this issue?*
> > > > > >> >
> > > > > >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> > > > > >> > switches to active successfully. The exception trace I see in the log is:
> > > > >> >
> > > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> > > > ActiveStandbyElector:
> > > > >> > Exception handling the winning of election
> > > > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not
> > > transition
> > > > to
> > > > >> > Active
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > > > >> torBasedElectorService.java:146)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> > > > >> eStandbyElector.java:894)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
> > > > >> veStandbyElector.java:473)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
> > > > >> ClientCnxn.java:599)
> > > > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(
> > ClientCnxn.
> > > > >> java:498)
> > > > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error
> > when
> > > > >> > transitioning to Active mode
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > > > >> ransitionToActive(AdminService.java:325)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > > > >> torBasedElectorService.java:144)
> > > > >> >     ... 4 more
> > > > >> > Caused by: org.apache.hadoop.service.ServiceStateException:
> > > > >> > org.apache.zookeeper.KeeperException$NoAuthException:
> > > > KeeperErrorCode =
> > > > >> > NoAuth
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
> > > > >> iceStateException.java:105)
> > > > >> >     at
> > > > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > > > >> ice.java:205)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > > >> r.startActiveServices(ResourceManager.java:1131)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > > >> r$1.run(ResourceManager.java:1171)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > > >> r$1.run(ResourceManager.java:1167)
> > > > >> >     at java.security.AccessController.doPrivileged(Native
> Method)
> > > > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> > > > >> upInformation.java:1886)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > > >> r.transitionToActive(ResourceManager.java:1167)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > > > >> ransitionToActive(AdminService.java:320)
> > > > >> >     ... 5 more
> > > > >> > Caused by: org.apache.zookeeper.KeeperException$
> NoAuthException:
> > > > >> > KeeperErrorCode = NoAuth
> > > > >> >     at
> > > > >> > org.apache.zookeeper.KeeperException.create(
> > > KeeperException.java:113)
> > > > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
> > > > ZooKeeper.java:949)
> > > > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
> > > > >> peration(CuratorTransactionImpl.java:159)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
> > > > >> ess$200(CuratorTransactionImpl.java:44)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > > > >> all(CuratorTransactionImpl.java:129)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > > > >> all(CuratorTransactionImpl.java:125)
> > > > >> >     at org.apache.curator.RetryLoop.
> callWithRetry(RetryLoop.java:
> > > 107)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
> > > > >> mit(CuratorTransactionImpl.java:122)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
> > > > >> ion.commit(ZKCuratorManager.java:403)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
> > > > >> ZKCuratorManager.java:372)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
> > > > >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > > > >> >     at
> > > > >> >
> > > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > > >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
> > > > >> >     at
> > > > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > > > >> ice.java:194)
> > > > >> >     ... 13 more
> > > > >> >
> > > > >> > Thanks & Regards
> > > > >> > Rohith Sharma K S
> > > > >> >
> > > > > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> > > > > >> >
> > > > > >> > > Hi folks,
> > > > > >> > >
> > > > > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> > > > > >> > > will be the latest stable/production release for Apache Hadoop - it
> > > > > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > > > > >> > > fixes new fixed issues since 2.8.2 .
> > > > > >> > >
> > > > > >> > >       More information about the 2.9.0 release plan can be found here:
> > > > > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > > > > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> > > > > >> > >
> > > > > >> > >       New RC is available at:
> > > > > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > > > > >> > >
> > > > > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > > > > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > > > > >> > >
> > > > > >> > >       The maven artifacts are available via repository.apache.org at:
> > > > > >> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > > > > >> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> > > > > >> > >
> > > > > >> > >       Please try the release and vote; the vote will run for the usual 5
> > > > > >> > > days, ending on 11/10/2017 4pm PST time.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Arun/Subru
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Subru Krishnan <su...@apache.org>.
We are canceling the RC due to the issue that Rohith/Sunil identified
(YARN-7453). The issue was difficult to track down because it only happens
when you use an IP address for ZK (it works fine with host names) and,
moreover, only if ZK and the RM are co-located on the same machine. We hope
to get the fix in tomorrow and roll out RC1.
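
For anyone who wants to check whether their test cluster falls into the
first half of that condition, here is a minimal sketch (not part of Hadoop)
that lists which entries of a ZooKeeper connect string are IP literals
rather than host names. The function name and the sample addresses are
hypothetical, and the sketch only looks at the address format; it does not
check co-location and says nothing about the root cause.

```python
import ipaddress


def zk_hosts_given_as_ip(zk_address):
    """Return the hosts in a ZooKeeper connect string that are IP literals.

    Hypothetical helper for triage only. Expects the usual
    "host1:port,host2:port[/chroot]" form; bracketed IPv6 is not handled.
    """
    # Drop an optional chroot suffix such as "/yarn".
    servers = zk_address.split("/", 1)[0]
    ip_hosts = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split off the port at the last colon, if there is one.
        host = entry.rsplit(":", 1)[0] if ":" in entry else entry
        try:
            ipaddress.ip_address(host)
        except ValueError:
            continue  # not an IP literal, so treat it as a host name
        ip_hosts.append(host)
    return ip_hosts


if __name__ == "__main__":
    # Sample value only; substitute the ZK quorum string from your RM config.
    print(zk_hosts_given_as_ip("10.0.0.5:2181,zk2.example.com:2181/yarn"))
```

Feed it the ZK quorum string from your RM configuration (for example the
value of yarn.resourcemanager.zk-address). A non-empty result means ZK is
addressed by IP, which is the setup where we saw the failure.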

Thanks to everyone for the extensive testing/validation. Hopefully the cost
of repeating it with RC1 will be much lower.

-Subru/Arun.


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Konstantinos Karanasos <kk...@gmail.com>.
+1 from me too.

Did the following:
1) set up a 9-node cluster;
2) ran some Gridmix jobs;
3) ran (2) after enabling opportunistic containers (used a mix of
guaranteed and opportunistic containers for each job);
4) ran (3) but this time enabling distributed scheduling of opportunistic
containers.

All the above worked with no issues.
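
For readers who want to repeat steps 3 and 4, the snippet below is a minimal sketch of the yarn-site.xml settings those steps would typically involve. The property names are taken, as best recalled, from the Hadoop "Opportunistic Containers" documentation. The values (for example a queue length of 10) are illustrative assumptions and not the settings used in the test above; render_yarn_site is a hypothetical helper written only for this example.

```python
import xml.etree.ElementTree as ET

# Step 3 (assumed settings): let the RM allocate opportunistic containers and
# allow NodeManagers to queue them. The queue length is an illustrative value.
OPPORTUNISTIC = {
    "yarn.resourcemanager.opportunistic-container-allocation.enabled": "true",
    "yarn.nodemanager.opportunistic-containers-max-queue-length": "10",
}

# Step 4 (assumed settings): distributed scheduling of opportunistic
# containers, which relies on the AMRMProxy running on each NodeManager.
DISTRIBUTED = {
    "yarn.nodemanager.distributed-scheduling.enabled": "true",
    "yarn.nodemanager.amrmproxy.enabled": "true",
}


def render_yarn_site(props):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the combined fragment for steps 3 and 4.
    print(render_yarn_site({**OPPORTUNISTIC, **DISTRIBUTED}))
```

The printed fragment would be merged into yarn-site.xml on the relevant nodes. How the Gridmix jobs themselves were told to request a mix of guaranteed and opportunistic containers is not stated in this message, so it is left out here.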

Thanks for all the effort, guys!

Konstantinos

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>
> > Sunil / Rohith,
> >
> > Could you check if your configs are same as Jonathan posted configs?
> > https://issues.apache.org/jira/browse/YARN-7453?
> focusedCommentId=16242693&
> > page=com.atlassian.jira.plugin.system.issuetabpanels:
> > comment-tabpanel#comment-16242693
> >
> > And could you try if using Jonathan's configs can still reproduce the
> > issue?
> >
> > Thanks,
> > Wangda
> >
> >
> > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
> >
> > > Thanks for testing Rohith and Sunil
> > >
> > > Can you please confirm if it is not a config issue at your end ?
> > > We (both Jonathan and myself) just tried testing this on a fresh
> cluster
> > > (both automatic and manual) and we are not able to reproduce this. I've
> > > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453
> >
> > > JIRA
> > > with details of testing.
> > >
> > > Cheers
> > > -Arun/Subru
> > >
> > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > > rohithsharmaks@apache.org
> > > > wrote:
> > >
> > > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > > > issue.
> > > >
> > > > - Rohith Sharma K S
> > > >
> > > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> > > >
> > > >> Hi Subru and Arun.
> > > >>
> > > >> Thanks for driving 2.9 release. Great work!
> > > >>
> > > >> I installed cluster built from source.
> > > >> - Ran few MR jobs with application priority enabled. Runs fine.
> > > >> - Accessed new UI and it also seems fine.
> > > >>
> > > >> However I am also getting same issue as Rohith reported.
> > > >> - Started an HA cluster
> > > >> - Pushed RM to standby
> > > >> - Pushed back RM to active then seeing an exception.
> > > >>
> > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not
> transition
> > to
> > > >> Active
> > > >>         at
> > > >> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > >> lectorBasedElectorServic
> > > >>     e.becomeActive(ActiveStandbyElectorBasedElect
> orService.java:146)
> > > >>         at
> > > >> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> > > >> eStandbyElector.java:894
> > > >>     )
> > > >>
> > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > > >> KeeperErrorCode = NoAuth
> > > >>         at
> > > >> org.apache.zookeeper.KeeperException.create(
> KeeperException.java:113)
> > > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(
> > ZooKeeper.java:
> > > >> 949)
> > > >>
> > > >> Will check and post more details,
> > > >>
> > > >> - Sunil
> > > >>
> > > >>
> > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > > >> rohithsharmaks@apache.org>
> > > >> wrote:
> > > >>
> > > >> > Thanks Subru/Arun for the great work!
> > > >> >
> > > >> > Downloaded source and built from it. Deployed RM HA non-secured
> > > cluster
> > > >> > along with new YARN UI and ATSv2.
> > > >> >
> > > >> > I am facing basic RM HA switch issue after first time successful
> > > start.
> > > >> > *Can
> > > >> > anyone else is facing this issue?*
> > > >> >
> > > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
> > switch
> > > to
> > > >> > active successfully. Exception trace I see from the log is
> > > >> >
> > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> > > ActiveStandbyElector:
> > > >> > Exception handling the winning of election
> > > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not
> > transition
> > > to
> > > >> > Active
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > > >> torBasedElectorService.java:146)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> > > >> eStandbyElector.java:894)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
> > > >> veStandbyElector.java:473)
> > > >> >     at
> > > >> >
> > > >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
> > > >> ClientCnxn.java:599)
> > > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(
> ClientCnxn.
> > > >> java:498)
> > > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error
> when
> > > >> > transitioning to Active mode
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > > >> ransitionToActive(AdminService.java:325)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > > >> torBasedElectorService.java:144)
> > > >> >     ... 4 more
> > > >> > Caused by: org.apache.hadoop.service.ServiceStateException:
> > > >> > org.apache.zookeeper.KeeperException$NoAuthException:
> > > KeeperErrorCode =
> > > >> > NoAuth
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
> > > >> iceStateException.java:105)
> > > >> >     at
> > > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > > >> ice.java:205)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r.startActiveServices(ResourceManager.java:1131)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r$1.run(ResourceManager.java:1171)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r$1.run(ResourceManager.java:1167)
> > > >> >     at java.security.AccessController.doPrivileged(Native Method)
> > > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> > > >> upInformation.java:1886)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r.transitionToActive(ResourceManager.java:1167)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > > >> ransitionToActive(AdminService.java:320)
> > > >> >     ... 5 more
> > > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > > >> > KeeperErrorCode = NoAuth
> > > >> >     at
> > > >> > org.apache.zookeeper.KeeperException.create(
> > KeeperException.java:113)
> > > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
> > > ZooKeeper.java:949)
> > > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
> > > >> peration(CuratorTransactionImpl.java:159)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
> > > >> ess$200(CuratorTransactionImpl.java:44)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > > >> all(CuratorTransactionImpl.java:129)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > > >> all(CuratorTransactionImpl.java:125)
> > > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:
> > 107)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
> > > >> mit(CuratorTransactionImpl.java:122)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
> > > >> ion.commit(ZKCuratorManager.java:403)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
> > > >> ZKCuratorManager.java:372)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
> > > >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
> > > >> >     at
> > > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > > >> ice.java:194)
> > > >> >     ... 13 more
> > > >> >
> > > >> > Thanks & Regards
> > > >> > Rohith Sharma K S
> > > >> >
> > > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org>
> > wrote:
> > > >> >
> > > >> > > Hi folks,
> > > >> > >
> > > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop
> 2.9
> > > >> line
> > > >> > and
> > > >> > > will be the latest stable/production release for Apache Hadoop -
> > it
> > > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements,
> 787
> > > Bug
> > > >> > > fixes new fixed issues since 2.8.2 .
> > > >> > >
> > > >> > >       More information about the 2.9.0 release plan can be found
> > > here:
> > > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/
> > > >> > > Roadmap#Roadmap-Version2.9
> > > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/
> > > >> > > Roadmap#Roadmap-Version2.9>*
> > > >> > >
> > > >> > >       New RC is available at:
> > > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > > >> > >
> > > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest
> > commit
> > > >> id
> > > >> > is:
> > > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > > >> > >
> > > >> > >       The maven artifacts are available via
> repository.apache.org
> > > at:
> > > >> > > *
> > > >> > https://repository.apache.org/content/repositories/orgapache
> > > >> hadoop-1065/
> > > >> > > <
> > > >> > https://repository.apache.org/content/repositories/orgapache
> > > >> hadoop-1065/
> > > >> > > >*
> > > >> > >
> > > >> > >       Please try the release and vote; the vote will run for the
> > > >> usual 5
> > > >> > > days, ending on 11/10/2017 4pm PST time.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Arun/Subru
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Jonathan Hung <jy...@gmail.com>.
Thanks Arun and Subru for working on this!

+1 (non-binding) pending YARN-7453.

1) Set up RM HA
2) Verified leveldb/zookeeper scheduler configuration API works via REST/CLI
3) Verified configuration changes persist across restart
4) Verified yarn rmadmin -refreshQueues works when the scheduler configuration
API is disabled (and vice versa)


Jonathan Hung

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com> wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>
>> Sunil / Rohith,
>>
>> Could you check if your configs are same as Jonathan posted configs?
>> https://issues.apache.org/jira/browse/YARN-7453?focusedComme
>> ntId=16242693&page=com.atlassian.jira.plugin.system.
>> issuetabpanels:comment-tabpanel#comment-16242693
>>
>> And could you try if using Jonathan's configs can still reproduce the
>> issue?
>>
>> Thanks,
>> Wangda
>>
>>
>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>
>> > Thanks for testing Rohith and Sunil
>> >
>> > Can you please confirm if it is not a config issue at your end ?
>> > We (both Jonathan and myself) just tried testing this on a fresh cluster
>> > (both automatic and manual) and we are not able to reproduce this. I've
>> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
>> > JIRA
>> > with details of testing.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
>> > rohithsharmaks@apache.org
>> > > wrote:
>> >
>> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>> > > issue.
>> > >
>> > > - Rohith Sharma K S
>> > >
>> > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>> > >
>> > >> Hi Subru and Arun.
>> > >>
>> > >> Thanks for driving 2.9 release. Great work!
>> > >>
>> > >> I installed cluster built from source.
>> > >> - Ran few MR jobs with application priority enabled. Runs fine.
>> > >> - Accessed new UI and it also seems fine.
>> > >>
>> > >> However I am also getting same issue as Rohith reported.
>> > >> - Started an HA cluster
>> > >> - Pushed RM to standby
>> > >> - Pushed back RM to active then seeing an exception.
>> > >>
>> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not
>> transition to
>> > >> Active
>> > >>         at
>> > >> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorServic
>> > >>     e.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >>         at
>> > >> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>> > >> eStandbyElector.java:894
>> > >>     )
>> > >>
>> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
>> > >> KeeperErrorCode = NoAuth
>> > >>         at
>> > >> org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:113)
>> > >>         at org.apache.zookeeper.ZooKeeper
>> .multiInternal(ZooKeeper.java:
>> > >> 949)
>> > >>
>> > >> Will check and post more details,
>> > >>
>> > >> - Sunil
>> > >>
>> > >>
>> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>> > >> rohithsharmaks@apache.org>
>> > >> wrote:
>> > >>
>> > >> > Thanks Subru/Arun for the great work!
>> > >> >
>> > >> > Downloaded source and built from it. Deployed RM HA non-secured
>> > cluster
>> > >> > along with new YARN UI and ATSv2.
>> > >> >
>> > >> > I am facing basic RM HA switch issue after first time successful
>> > start.
>> > >> > *Can
>> > >> > anyone else is facing this issue?*
>> > >> >
>> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
>> switch
>> > to
>> > >> > active successfully. Exception trace I see from the log is
>> > >> >
>> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
>> > ActiveStandbyElector:
>> > >> > Exception handling the winning of election
>> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not
>> transition
>> > to
>> > >> > Active
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>> > >> torBasedElectorService.java:146)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>> > >> eStandbyElector.java:894)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
>> > >> veStandbyElector.java:473)
>> > >> >     at
>> > >> >
>> > >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
>> > >> ClientCnxn.java:599)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.
>> > >> java:498)
>> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
>> > >> > transitioning to Active mode
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
>> > >> ransitionToActive(AdminService.java:325)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>> > >> torBasedElectorService.java:144)
>> > >> >     ... 4 more
>> > >> > Caused by: org.apache.hadoop.service.ServiceStateException:
>> > >> > org.apache.zookeeper.KeeperException$NoAuthException:
>> > KeeperErrorCode =
>> > >> > NoAuth
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
>> > >> iceStateException.java:105)
>> > >> >     at
>> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
>> > >> ice.java:205)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r.startActiveServices(ResourceManager.java:1131)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$1.run(ResourceManager.java:1171)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$1.run(ResourceManager.java:1167)
>> > >> >     at java.security.AccessController.doPrivileged(Native Method)
>> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> > >> upInformation.java:1886)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r.transitionToActive(ResourceManager.java:1167)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
>> > >> ransitionToActive(AdminService.java:320)
>> > >> >     ... 5 more
>> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
>> > >> > KeeperErrorCode = NoAuth
>> > >> >     at
>> > >> > org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:113)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
>> > ZooKeeper.java:949)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
>> > >> peration(CuratorTransactionImpl.java:159)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
>> > >> ess$200(CuratorTransactionImpl.java:44)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
>> > >> all(CuratorTransactionImpl.java:129)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
>> > >> all(CuratorTransactionImpl.java:125)
>> > >> >     at org.apache.curator.RetryLoop.c
>> allWithRetry(RetryLoop.java:107)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
>> > >> mit(CuratorTransactionImpl.java:122)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
>> > >> ion.commit(ZKCuratorManager.java:403)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
>> > >> ZKCuratorManager.java:372)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
>> > >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
>> > >> >     at
>> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
>> > >> ice.java:194)
>> > >> >     ... 13 more
>> > >> >
>> > >> > Thanks & Regards
>> > >> > Rohith Sharma K S
>> > >> >
>> > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org>
>> wrote:
>> > >> >
>> > >> > > Hi folks,
>> > >> > >
>> > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop
>> 2.9
>> > >> line
>> > >> > and
>> > >> > > will be the latest stable/production release for Apache Hadoop -
>> it
>> > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements,
>> 787
>> > Bug
>> > >> > > fixes new fixed issues since 2.8.2 .
>> > >> > >
>> > >> > >       More information about the 2.9.0 release plan can be found
>> > here:
>> > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/
>> > >> > > Roadmap#Roadmap-Version2.9
>> > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/
>> > >> > > Roadmap#Roadmap-Version2.9>*
>> > >> > >
>> > >> > >       New RC is available at:
>> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >> > >
>> > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest
>> commit
>> > >> id
>> > >> > is:
>> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >> > >
>> > >> > >       The maven artifacts are available via
>> repository.apache.org
>> > at:
>> > >> > > *
>> > >> > https://repository.apache.org/content/repositories/orgapache
>> > >> hadoop-1065/
>> > >> > > <
>> > >> > https://repository.apache.org/content/repositories/orgapache
>> > >> hadoop-1065/
>> > >> > > >*
>> > >> > >
>> > >> > >       Please try the release and vote; the vote will run for the
>> > >> usual 5
>> > >> > > days, ending on 11/10/2017 4pm PST time.
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > Arun/Subru
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Jonathan Hung <jy...@gmail.com>.
Thanks Arun and Subru for working on this!

+1 (non-binding) pending YARN-7453.

1) Set up RM HA
2) Verified that the leveldb/zookeeper scheduler configuration API works via
REST/CLI
3) Verified that configuration changes persist across restart
4) Verified that yarn rmadmin -refreshQueues works when the scheduler
configuration API is disabled (and vice versa)
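
For anyone who wants to retry the ACTIVE to STANDBY to ACTIVE cycle tracked in
YARN-7453, a minimal sketch of the round trip is below. It is not the procedure
Rohith or Sunil used; the RM id "rm1" and the helper names are hypothetical, and
it assumes the yarn CLI is on the PATH. Only the rmadmin options
-transitionToStandby, -transitionToActive and -getServiceState are relied on.

```python
import subprocess


def failover_commands(rm_id):
    """Build the yarn rmadmin invocations for one standby/active round trip.

    rm_id is a hypothetical RM id (for example "rm1" from yarn.resourcemanager.ha.rm-ids).
    """
    return [
        ["yarn", "rmadmin", "-transitionToStandby", rm_id],
        ["yarn", "rmadmin", "-getServiceState", rm_id],
        ["yarn", "rmadmin", "-transitionToActive", rm_id],
        ["yarn", "rmadmin", "-getServiceState", rm_id],
    ]


def run_failover(rm_id, runner=subprocess.run):
    """Run each command in order; with the default runner a failing command raises."""
    for cmd in failover_commands(rm_id):
        runner(cmd, check=True)
```

Calling run_failover("rm1") on a test cluster runs the four commands in order.
On a cluster with automatic failover enabled, the manual transitions are
normally refused unless --forcemanual is added, so the sketch is mainly useful
for the manual HA setup.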


Jonathan Hung

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com> wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>
>> Sunil / Rohith,
>>
>> Could you check if your configs are same as Jonathan posted configs?
>> https://issues.apache.org/jira/browse/YARN-7453?focusedComme
>> ntId=16242693&page=com.atlassian.jira.plugin.system.
>> issuetabpanels:comment-tabpanel#comment-16242693
>>
>> And could you try if using Jonathan's configs can still reproduce the
>> issue?
>>
>> Thanks,
>> Wangda
>>
>>
>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>
>> > Thanks for testing Rohith and Sunil
>> >
>> > Can you please confirm if it is not a config issue at your end ?
>> > We (both Jonathan and myself) just tried testing this on a fresh cluster
>> > (both automatic and manual) and we are not able to reproduce this. I've
>> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
>> > JIRA
>> > with details of testing.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
>> > rohithsharmaks@apache.org
>> > > wrote:
>> >
>> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>> > > issue.
>> > >
>> > > - Rohith Sharma K S
>> > >
>> > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>> > >
>> > >> Hi Subru and Arun.
>> > >>
>> > >> Thanks for driving 2.9 release. Great work!
>> > >>
>> > >> I installed cluster built from source.
>> > >> - Ran few MR jobs with application priority enabled. Runs fine.
>> > >> - Accessed new UI and it also seems fine.
>> > >>
>> > >> However I am also getting same issue as Rohith reported.
>> > >> - Started an HA cluster
>> > >> - Pushed RM to standby
>> > >> - Pushed back RM to active then seeing an exception.
>> > >>
>> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not
>> transition to
>> > >> Active
>> > >>         at
>> > >> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorServic
>> > >>     e.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >>         at
>> > >> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>> > >> eStandbyElector.java:894
>> > >>     )
>> > >>
>> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
>> > >> KeeperErrorCode = NoAuth
>> > >>         at
>> > >> org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:113)
>> > >>         at org.apache.zookeeper.ZooKeeper
>> .multiInternal(ZooKeeper.java:
>> > >> 949)
>> > >>
>> > >> Will check and post more details,
>> > >>
>> > >> - Sunil
>> > >>
>> > >>
>> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>> > >> rohithsharmaks@apache.org>
>> > >> wrote:
>> > >>
>> > >> > Thanks Subru/Arun for the great work!
>> > >> >
>> > >> > Downloaded source and built from it. Deployed RM HA non-secured
>> > cluster
>> > >> > along with new YARN UI and ATSv2.
>> > >> >
>> > >> > I am facing basic RM HA switch issue after first time successful
>> > start.
>> > >> > *Can
>> > >> > anyone else is facing this issue?*
>> > >> >
>> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
>> switch
>> > to
>> > >> > active successfully. Exception trace I see from the log is
>> > >> >
>> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
>> > ActiveStandbyElector:
>> > >> > Exception handling the winning of election
>> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not
>> transition
>> > to
>> > >> > Active
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>> > >> torBasedElectorService.java:146)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>> > >> eStandbyElector.java:894)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
>> > >> veStandbyElector.java:473)
>> > >> >     at
>> > >> >
>> > >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
>> > >> ClientCnxn.java:599)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.
>> > >> java:498)
>> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
>> > >> > transitioning to Active mode
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
>> > >> ransitionToActive(AdminService.java:325)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>> > >> torBasedElectorService.java:144)
>> > >> >     ... 4 more
>> > >> > Caused by: org.apache.hadoop.service.ServiceStateException:
>> > >> > org.apache.zookeeper.KeeperException$NoAuthException:
>> > KeeperErrorCode =
>> > >> > NoAuth
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
>> > >> iceStateException.java:105)
>> > >> >     at
>> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
>> > >> ice.java:205)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r.startActiveServices(ResourceManager.java:1131)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$1.run(ResourceManager.java:1171)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$1.run(ResourceManager.java:1167)
>> > >> >     at java.security.AccessController.doPrivileged(Native Method)
>> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> > >> upInformation.java:1886)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r.transitionToActive(ResourceManager.java:1167)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
>> > >> ransitionToActive(AdminService.java:320)
>> > >> >     ... 5 more
>> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
>> > >> > KeeperErrorCode = NoAuth
>> > >> >     at
>> > >> > org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:113)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
>> > ZooKeeper.java:949)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
>> > >> peration(CuratorTransactionImpl.java:159)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
>> > >> ess$200(CuratorTransactionImpl.java:44)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
>> > >> all(CuratorTransactionImpl.java:129)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
>> > >> all(CuratorTransactionImpl.java:125)
>> > >> >     at org.apache.curator.RetryLoop.c
>> allWithRetry(RetryLoop.java:107)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
>> > >> mit(CuratorTransactionImpl.java:122)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
>> > >> ion.commit(ZKCuratorManager.java:403)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
>> > >> ZKCuratorManager.java:372)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
>> > >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
>> > >> >     at
>> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
>> > >> ice.java:194)
>> > >> >     ... 13 more
>> > >> >
>> > >> > Thanks & Regards
>> > >> > Rohith Sharma K S
>> > >> >
>> > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org>
>> wrote:
>> > >> >
>> > >> > > Hi folks,
>> > >> > >
>> > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop
>> 2.9
>> > >> line
>> > >> > and
>> > >> > > will be the latest stable/production release for Apache Hadoop -
>> it
>> > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements,
>> 787
>> > Bug
>> > >> > > fixes new fixed issues since 2.8.2 .
>> > >> > >
>> > >> > >       More information about the 2.9.0 release plan can be found
>> > here:
>> > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/
>> > >> > > Roadmap#Roadmap-Version2.9
>> > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/
>> > >> > > Roadmap#Roadmap-Version2.9>*
>> > >> > >
>> > >> > >       New RC is available at:
>> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >> > >
>> > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest
>> commit
>> > >> id
>> > >> > is:
>> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >> > >
>> > >> > >       The maven artifacts are available via
>> repository.apache.org
>> > at:
>> > >> > > *
>> > >> > https://repository.apache.org/content/repositories/orgapache
>> > >> hadoop-1065/
>> > >> > > <
>> > >> > https://repository.apache.org/content/repositories/orgapache
>> > >> hadoop-1065/
>> > >> > > >*
>> > >> > >
>> > >> > >       Please try the release and vote; the vote will run for the
>> > >> usual 5
>> > >> > > days, ending on 11/10/2017 4pm PST time.
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > Arun/Subru
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Konstantinos Karanasos <kk...@gmail.com>.
+1 from me too.

Did the following:
1) set up a 9-node cluster;
2) ran some Gridmix jobs;
3) ran (2) after enabling opportunistic containers (used a mix of
guaranteed and opportunistic containers for each job);
4) ran (3) but this time enabling distributed scheduling of opportunistic
containers.

All the above worked with no issues.

Thanks for all the effort, guys!

Konstantinos

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com.invalid>
wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>
> > Sunil / Rohith,
> >
> > Could you check if your configs are same as Jonathan posted configs?
> > https://issues.apache.org/jira/browse/YARN-7453?
> focusedCommentId=16242693&
> > page=com.atlassian.jira.plugin.system.issuetabpanels:
> > comment-tabpanel#comment-16242693
> >
> > And could you try if using Jonathan's configs can still reproduce the
> > issue?
> >
> > Thanks,
> > Wangda
> >
> >
> > On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
> >
> > > Thanks for testing Rohith and Sunil
> > >
> > > Can you please confirm if it is not a config issue at your end ?
> > > We (both Jonathan and myself) just tried testing this on a fresh
> cluster
> > > (both automatic and manual) and we are not able to reproduce this. I've
> > > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453
> >
> > > JIRA
> > > with details of testing.
> > >
> > > Cheers
> > > -Arun/Subru
> > >
> > > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > > rohithsharmaks@apache.org
> > > > wrote:
> > >
> > > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > > > issue.
> > > >
> > > > - Rohith Sharma K S
> > > >
> > > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> > > >
> > > >> Hi Subru and Arun.
> > > >>
> > > >> Thanks for driving 2.9 release. Great work!
> > > >>
> > > >> I installed cluster built from source.
> > > >> - Ran few MR jobs with application priority enabled. Runs fine.
> > > >> - Accessed new UI and it also seems fine.
> > > >>
> > > >> However I am also getting same issue as Rohith reported.
> > > >> - Started an HA cluster
> > > >> - Pushed RM to standby
> > > >> - Pushed back RM to active then seeing an exception.
> > > >>
> > > >> org.apache.hadoop.ha.ServiceFailedException: RM could not
> transition
> > to
> > > >> Active
> > > >>         at
> > > >> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > >> lectorBasedElectorServic
> > > >>     e.becomeActive(ActiveStandbyElectorBasedElect
> orService.java:146)
> > > >>         at
> > > >> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> > > >> eStandbyElector.java:894
> > > >>     )
> > > >>
> > > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > > >> KeeperErrorCode = NoAuth
> > > >>         at
> > > >> org.apache.zookeeper.KeeperException.create(
> KeeperException.java:113)
> > > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(
> > ZooKeeper.java:
> > > >> 949)
> > > >>
> > > >> Will check and post more details,
> > > >>
> > > >> - Sunil
> > > >>
> > > >>
> > > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > > >> rohithsharmaks@apache.org>
> > > >> wrote:
> > > >>
> > > >> > Thanks Subru/Arun for the great work!
> > > >> >
> > > >> > Downloaded source and built from it. Deployed RM HA non-secured
> > > cluster
> > > >> > along with new YARN UI and ATSv2.
> > > >> >
> > > >> > I am facing basic RM HA switch issue after first time successful
> > > start.
> > > >> > *Can
> > > >> > anyone else is facing this issue?*
> > > >> >
> > > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
> > switch
> > > to
> > > >> > active successfully. Exception trace I see from the log is
> > > >> >
> > > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> > > ActiveStandbyElector:
> > > >> > Exception handling the winning of election
> > > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not
> > transition
> > > to
> > > >> > Active
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > > >> torBasedElectorService.java:146)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> > > >> eStandbyElector.java:894)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
> > > >> veStandbyElector.java:473)
> > > >> >     at
> > > >> >
> > > >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
> > > >> ClientCnxn.java:599)
> > > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(
> ClientCnxn.
> > > >> java:498)
> > > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error
> when
> > > >> > transitioning to Active mode
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > > >> ransitionToActive(AdminService.java:325)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > > >> torBasedElectorService.java:144)
> > > >> >     ... 4 more
> > > >> > Caused by: org.apache.hadoop.service.ServiceStateException:
> > > >> > org.apache.zookeeper.KeeperException$NoAuthException:
> > > KeeperErrorCode =
> > > >> > NoAuth
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
> > > >> iceStateException.java:105)
> > > >> >     at
> > > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > > >> ice.java:205)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r.startActiveServices(ResourceManager.java:1131)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r$1.run(ResourceManager.java:1171)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r$1.run(ResourceManager.java:1167)
> > > >> >     at java.security.AccessController.doPrivileged(Native Method)
> > > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> > > >> upInformation.java:1886)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r.transitionToActive(ResourceManager.java:1167)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > > >> ransitionToActive(AdminService.java:320)
> > > >> >     ... 5 more
> > > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > > >> > KeeperErrorCode = NoAuth
> > > >> >     at
> > > >> > org.apache.zookeeper.KeeperException.create(
> > KeeperException.java:113)
> > > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
> > > ZooKeeper.java:949)
> > > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
> > > >> peration(CuratorTransactionImpl.java:159)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
> > > >> ess$200(CuratorTransactionImpl.java:44)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > > >> all(CuratorTransactionImpl.java:129)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > > >> all(CuratorTransactionImpl.java:125)
> > > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:
> > 107)
> > > >> >     at
> > > >> >
> > > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
> > > >> mit(CuratorTransactionImpl.java:122)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
> > > >> ion.commit(ZKCuratorManager.java:403)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
> > > >> ZKCuratorManager.java:372)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
> > > >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > > >> >     at
> > > >> >
> > > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > > >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
> > > >> >     at
> > > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > > >> ice.java:194)
> > > >> >     ... 13 more
> > > >> >
> > > >> > Thanks & Regards
> > > >> > Rohith Sharma K S
> > > >> >
> > > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org>
> > wrote:
> > > >> >
> > > >> > > Hi folks,
> > > >> > >
> > > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop
> 2.9
> > > >> line
> > > >> > and
> > > >> > > will be the latest stable/production release for Apache Hadoop -
> > it
> > > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements,
> 787
> > > Bug
> > > >> > > fixes new fixed issues since 2.8.2 .
> > > >> > >
> > > >> > >       More information about the 2.9.0 release plan can be found
> > > here:
> > > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/
> > > >> > > Roadmap#Roadmap-Version2.9
> > > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/
> > > >> > > Roadmap#Roadmap-Version2.9>*
> > > >> > >
> > > >> > >       New RC is available at:
> > > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > > >> > >
> > > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest
> > commit
> > > >> id
> > > >> > is:
> > > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > > >> > >
> > > >> > >       The maven artifacts are available via
> repository.apache.org
> > > at:
> > > >> > > *
> > > >> > https://repository.apache.org/content/repositories/orgapache
> > > >> hadoop-1065/
> > > >> > > <
> > > >> > https://repository.apache.org/content/repositories/orgapache
> > > >> hadoop-1065/
> > > >> > > >*
> > > >> > >
> > > >> > >       Please try the release and vote; the vote will run for the
> > > >> usual 5
> > > >> > > days, ending on 11/10/2017 4pm PST time.
> > > >> > >
> > > >> > > Thanks,
> > > >> > >
> > > >> > > Arun/Subru
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Jonathan Hung <jy...@gmail.com>.
Thanks Arun and Subru for working on this!

+1 (non-binding) pending YARN-7453.

1) Set up RM HA
2) Verified that the leveldb/zookeeper scheduler configuration API works
via REST/CLI
3) Verified that configuration changes persist across restart
4) Verified that yarn rmadmin -refreshQueues works when the scheduler
configuration API is disabled (and vice versa)
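
The RM HA round trip behind step 1, and behind the failure tracked in YARN-7453, can be sketched as a small helper that builds the yarn rmadmin invocations. This is only an illustrative sketch: the RM id "rm1" is a hypothetical value from yarn.resourcemanager.ha.rm-ids, and the commands are printed instead of executed, so nothing here touches a cluster.

```python
import shlex


def ha_switch_commands(rm_id):
    """Build argv lists for an ACTIVE -> STANDBY -> ACTIVE round trip.

    rm_id is a hypothetical ResourceManager id (for example "rm1");
    substitute whatever the cluster's yarn-site.xml actually defines.
    """
    return [
        ["yarn", "rmadmin", "-getServiceState", rm_id],
        ["yarn", "rmadmin", "-transitionToStandby", rm_id],
        ["yarn", "rmadmin", "-transitionToActive", rm_id],
        ["yarn", "rmadmin", "-getServiceState", rm_id],
    ]


if __name__ == "__main__":
    # Print the commands for review; running them is left to the operator.
    for argv in ha_switch_commands("rm1"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

On a cluster with automatic failover enabled, manual transitions may be refused, so the sketch is most useful for the manual-failover case.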


Jonathan Hung

On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <eb...@oath.com> wrote:

> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:
>
>> Sunil / Rohith,
>>
>> Could you check if your configs are same as Jonathan posted configs?
>> https://issues.apache.org/jira/browse/YARN-7453?focusedComme
>> ntId=16242693&page=com.atlassian.jira.plugin.system.
>> issuetabpanels:comment-tabpanel#comment-16242693
>>
>> And could you try if using Jonathan's configs can still reproduce the
>> issue?
>>
>> Thanks,
>> Wangda
>>
>>
>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>>
>> > Thanks for testing Rohith and Sunil
>> >
>> > Can you please confirm if it is not a config issue at your end ?
>> > We (both Jonathan and myself) just tried testing this on a fresh cluster
>> > (both automatic and manual) and we are not able to reproduce this. I've
>> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
>> > JIRA
>> > with details of testing.
>> >
>> > Cheers
>> > -Arun/Subru
>> >
>> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
>> > rohithsharmaks@apache.org
>> > > wrote:
>> >
>> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>> > > issue.
>> > >
>> > > - Rohith Sharma K S
>> > >
>> > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>> > >
>> > >> Hi Subru and Arun.
>> > >>
>> > >> Thanks for driving 2.9 release. Great work!
>> > >>
>> > >> I installed cluster built from source.
>> > >> - Ran few MR jobs with application priority enabled. Runs fine.
>> > >> - Accessed new UI and it also seems fine.
>> > >>
>> > >> However I am also getting same issue as Rohith reported.
>> > >> - Started an HA cluster
>> > >> - Pushed RM to standby
>> > >> - Pushed back RM to active then seeing an exception.
>> > >>
>> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not
>> transition to
>> > >> Active
>> > >>         at
>> > >> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorServic
>> > >>     e.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> > >>         at
>> > >> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>> > >> eStandbyElector.java:894
>> > >>     )
>> > >>
>> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
>> > >> KeeperErrorCode = NoAuth
>> > >>         at
>> > >> org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:113)
>> > >>         at org.apache.zookeeper.ZooKeeper
>> .multiInternal(ZooKeeper.java:
>> > >> 949)
>> > >>
>> > >> Will check and post more details,
>> > >>
>> > >> - Sunil
>> > >>
>> > >>
>> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
>> > >> rohithsharmaks@apache.org>
>> > >> wrote:
>> > >>
>> > >> > Thanks Subru/Arun for the great work!
>> > >> >
>> > >> > Downloaded source and built from it. Deployed RM HA non-secured
>> > cluster
>> > >> > along with new YARN UI and ATSv2.
>> > >> >
>> > >> > I am facing basic RM HA switch issue after first time successful
>> > start.
>> > >> > *Can
>> > >> > anyone else is facing this issue?*
>> > >> >
>> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
>> switch
>> > to
>> > >> > active successfully. Exception trace I see from the log is
>> > >> >
>> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
>> > ActiveStandbyElector:
>> > >> > Exception handling the winning of election
>> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not
>> transition
>> > to
>> > >> > Active
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>> > >> torBasedElectorService.java:146)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
>> > >> eStandbyElector.java:894)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
>> > >> veStandbyElector.java:473)
>> > >> >     at
>> > >> >
>> > >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
>> > >> ClientCnxn.java:599)
>> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.
>> > >> java:498)
>> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
>> > >> > transitioning to Active mode
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
>> > >> ransitionToActive(AdminService.java:325)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
>> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
>> > >> torBasedElectorService.java:144)
>> > >> >     ... 4 more
>> > >> > Caused by: org.apache.hadoop.service.ServiceStateException:
>> > >> > org.apache.zookeeper.KeeperException$NoAuthException:
>> > KeeperErrorCode =
>> > >> > NoAuth
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
>> > >> iceStateException.java:105)
>> > >> >     at
>> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
>> > >> ice.java:205)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r.startActiveServices(ResourceManager.java:1131)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$1.run(ResourceManager.java:1171)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$1.run(ResourceManager.java:1167)
>> > >> >     at java.security.AccessController.doPrivileged(Native Method)
>> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> > >> upInformation.java:1886)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r.transitionToActive(ResourceManager.java:1167)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
>> > >> ransitionToActive(AdminService.java:320)
>> > >> >     ... 5 more
>> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
>> > >> > KeeperErrorCode = NoAuth
>> > >> >     at
>> > >> > org.apache.zookeeper.KeeperException.create(KeeperException.
>> java:113)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
>> > ZooKeeper.java:949)
>> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
>> > >> peration(CuratorTransactionImpl.java:159)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
>> > >> ess$200(CuratorTransactionImpl.java:44)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
>> > >> all(CuratorTransactionImpl.java:129)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
>> > >> all(CuratorTransactionImpl.java:125)
>> > >> >     at org.apache.curator.RetryLoop.c
>> allWithRetry(RetryLoop.java:107)
>> > >> >     at
>> > >> >
>> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
>> > >> mit(CuratorTransactionImpl.java:122)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
>> > >> ion.commit(ZKCuratorManager.java:403)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
>> > >> ZKCuratorManager.java:372)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
>> > >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> > >> >     at
>> > >> >
>> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
>> > >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
>> > >> >     at
>> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
>> > >> ice.java:194)
>> > >> >     ... 13 more
>> > >> >
>> > >> > Thanks & Regards
>> > >> > Rohith Sharma K S
>> > >> >
>> > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org>
>> wrote:
>> > >> >
>> > >> > > Hi folks,
>> > >> > >
>> > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop
>> 2.9
>> > >> line
>> > >> > and
>> > >> > > will be the latest stable/production release for Apache Hadoop -
>> it
>> > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements,
>> 787
>> > Bug
>> > >> > > fixes new fixed issues since 2.8.2 .
>> > >> > >
>> > >> > >       More information about the 2.9.0 release plan can be found
>> > here:
>> > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/
>> > >> > > Roadmap#Roadmap-Version2.9
>> > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/
>> > >> > > Roadmap#Roadmap-Version2.9>*
>> > >> > >
>> > >> > >       New RC is available at:
>> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >> > >
>> > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest
>> commit
>> > >> id
>> > >> > is:
>> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >> > >
>> > >> > >       The maven artifacts are available via
>> repository.apache.org
>> > at:
>> > >> > > *
>> > >> > https://repository.apache.org/content/repositories/orgapache
>> > >> hadoop-1065/
>> > >> > > <
>> > >> > https://repository.apache.org/content/repositories/orgapache
>> > >> hadoop-1065/
>> > >> > > >*
>> > >> > >
>> > >> > >       Please try the release and vote; the vote will run for the
>> > >> usual 5
>> > >> > > days, ending on 11/10/2017 4pm PST time.
>> > >> > >
>> > >> > > Thanks,
>> > >> > >
>> > >> > > Arun/Subru
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Eric Badger <eb...@oath.com.INVALID>.
+1 (non-binding) pending the issue that Sunil/Rohith pointed out

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs
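
A check like "verified all hashes and checksums" can be sketched as follows. This is a minimal sketch under assumptions: the thread does not say which digest algorithm the RC publishes, so SHA-256 is assumed here, and the artifact path and expected digest are placeholders supplied by the caller.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published value.

    Whitespace and letter case in expected_hex are ignored, since
    published checksum files often group or upper-case the hex digits.
    """
    normalized = "".join(expected_hex.split()).lower()
    return sha256_of(path) == normalized
```

Reading in chunks keeps memory use flat even for a release tarball of several hundred megabytes.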

Thanks,

Eric

On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:

> Sunil / Rohith,
>
> Could you check if your configs are same as Jonathan posted configs?
> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&
> page=com.atlassian.jira.plugin.system.issuetabpanels:
> comment-tabpanel#comment-16242693
>
> And could you try if using Jonathan's configs can still reproduce the
> issue?
>
> Thanks,
> Wangda
>
>
> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>
> > Thanks for testing Rohith and Sunil
> >
> > Can you please confirm if it is not a config issue at your end ?
> > We (both Jonathan and myself) just tried testing this on a fresh cluster
> > (both automatic and manual) and we are not able to reproduce this. I've
> > updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453>
> > JIRA
> > with details of testing.
> >
> > Cheers
> > -Arun/Subru
> >
> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <
> > rohithsharmaks@apache.org
> > > wrote:
> >
> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > > issue.
> > >
> > > - Rohith Sharma K S
> > >
> > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> > >
> > >> Hi Subru and Arun.
> > >>
> > >> Thanks for driving 2.9 release. Great work!
> > >>
> > >> I installed cluster built from source.
> > >> - Ran few MR jobs with application priority enabled. Runs fine.
> > >> - Accessed new UI and it also seems fine.
> > >>
> > >> However I am also getting same issue as Rohith reported.
> > >> - Started an HA cluster
> > >> - Pushed RM to standby
> > >> - Pushed back RM to active then seeing an exception.
> > >>
> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition
> to
> > >> Active
> > >>         at
> > >> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > >> lectorBasedElectorServic
> > >>     e.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >>         at
> > >> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> > >> eStandbyElector.java:894
> > >>     )
> > >>
> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > >> KeeperErrorCode = NoAuth
> > >>         at
> > >> org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(
> ZooKeeper.java:
> > >> 949)
> > >>
> > >> Will check and post more details,
> > >>
> > >> - Sunil
> > >>
> > >>
> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> > >> rohithsharmaks@apache.org>
> > >> wrote:
> > >>
> > >> > Thanks Subru/Arun for the great work!
> > >> >
> > >> > Downloaded source and built from it. Deployed RM HA non-secured
> > cluster
> > >> > along with new YARN UI and ATSv2.
> > >> >
> > >> > I am facing basic RM HA switch issue after first time successful
> > start.
> > >> > *Can
> > >> > anyone else is facing this issue?*
> > >> >
> > >> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never
> switch
> > to
> > >> > active successfully. Exception trace I see from the log is
> > >> >
> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> > ActiveStandbyElector:
> > >> > Exception handling the winning of election
> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not
> transition
> > to
> > >> > Active
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > >> torBasedElectorService.java:146)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> > >> eStandbyElector.java:894)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
> > >> veStandbyElector.java:473)
> > >> >     at
> > >> >
> > >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
> > >> ClientCnxn.java:599)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.
> > >> java:498)
> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
> > >> > transitioning to Active mode
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > >> ransitionToActive(AdminService.java:325)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> > >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> > >> torBasedElectorService.java:144)
> > >> >     ... 4 more
> > >> > Caused by: org.apache.hadoop.service.ServiceStateException:
> > >> > org.apache.zookeeper.KeeperException$NoAuthException:
> > KeeperErrorCode =
> > >> > NoAuth
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
> > >> iceStateException.java:105)
> > >> >     at
> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > >> ice.java:205)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > >> r.startActiveServices(ResourceManager.java:1131)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > >> r$1.run(ResourceManager.java:1171)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > >> r$1.run(ResourceManager.java:1167)
> > >> >     at java.security.AccessController.doPrivileged(Native Method)
> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> > >> upInformation.java:1886)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > >> r.transitionToActive(ResourceManager.java:1167)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> > >> ransitionToActive(AdminService.java:320)
> > >> >     ... 5 more
> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> > >> > KeeperErrorCode = NoAuth
> > >> >     at
> > >> > org.apache.zookeeper.KeeperException.create(
> KeeperException.java:113)
> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
> > ZooKeeper.java:949)
> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > >> >     at
> > >> >
> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
> > >> peration(CuratorTransactionImpl.java:159)
> > >> >     at
> > >> >
> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
> > >> ess$200(CuratorTransactionImpl.java:44)
> > >> >     at
> > >> >
> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > >> all(CuratorTransactionImpl.java:129)
> > >> >     at
> > >> >
> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> > >> all(CuratorTransactionImpl.java:125)
> > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:
> 107)
> > >> >     at
> > >> >
> > >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
> > >> mit(CuratorTransactionImpl.java:122)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
> > >> ion.commit(ZKCuratorManager.java:403)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
> > >> ZKCuratorManager.java:372)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
> > >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > >> >     at
> > >> >
> > >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> > >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
> > >> >     at
> > >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> > >> ice.java:194)
> > >> >     ... 13 more
> > >> >
> > >> > Thanks & Regards
> > >> > Rohith Sharma K S
> > >> >
> > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org>
> wrote:
> > >> >
> > >> > > Hi folks,
> > >> > >
> > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9
> > >> line
> > >> > and
> > >> > > will be the latest stable/production release for Apache Hadoop -
> it
> > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787
> > Bug
> > >> > > fixes new fixed issues since 2.8.2 .
> > >> > >
> > >> > >       More information about the 2.9.0 release plan can be found
> > here:
> > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/
> > >> > > Roadmap#Roadmap-Version2.9
> > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/
> > >> > > Roadmap#Roadmap-Version2.9>*
> > >> > >
> > >> > >       New RC is available at:
> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > >> > >
> > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest
> commit
> > >> id
> > >> > is:
> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > >> > >
> > >> > >       The maven artifacts are available via repository.apache.org
> > at:
> > >> > > *
> > >> > https://repository.apache.org/content/repositories/orgapache
> > >> hadoop-1065/
> > >> > > <
> > >> > https://repository.apache.org/content/repositories/orgapache
> > >> hadoop-1065/
> > >> > > >*
> > >> > >
> > >> > >       Please try the release and vote; the vote will run for the
> > >> usual 5
> > >> > > days, ending on 11/10/2017 4pm PST time.
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Arun/Subru
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Eric Badger <eb...@oath.com.INVALID>.
+1 (non-binding) pending the issue that Sunil/Rohith pointed out

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs

Thanks,

Eric

On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wh...@gmail.com> wrote:

> Sunil / Rohith,
>
> Could you check if your configs are same as Jonathan posted configs?
> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&
> page=com.atlassian.jira.plugin.system.issuetabpanels:
> comment-tabpanel#comment-16242693
>
> And could you try if using Jonathan's configs can still reproduce the
> issue?
>
> Thanks,
> Wangda
>
>
> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:
>
> > Thanks for testing Rohith and Sunil
> >
> > Can you please confirm that this is not a config issue on your end?
> > Jonathan and I just tested this on a fresh cluster (both automatic and
> > manual) and were not able to reproduce it. I've updated the YARN-7453
> > JIRA <https://issues.apache.org/jira/browse/YARN-7453> with details of
> > the testing.
> >
> > Cheers
> > -Arun/Subru
> >
> > On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> >
> > > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > > issue.
> > >
> > > - Rohith Sharma K S
> > >
> > > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> > >
> > >> Hi Subru and Arun.
> > >>
> > >> Thanks for driving 2.9 release. Great work!
> > >>
> > >> I installed a cluster built from source.
> > >> - Ran a few MR jobs with application priority enabled. They ran fine.
> > >> - Accessed the new UI, and it also seems fine.
> > >>
> > >> However, I am also hitting the same issue that Rohith reported.
> > >> - Started an HA cluster
> > >> - Pushed the RM to standby
> > >> - Pushed the RM back to active, and then saw an exception.
> > >>
> > >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >>
> > >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >>
> > >> Will check and post more details,
> > >>
> > >> - Sunil
> > >>
> > >>
> > >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> > >>
> > >> > Thanks Subru/Arun for the great work!
> > >> >
> > >> > Downloaded the source and built from it. Deployed a non-secured RM HA
> > >> > cluster along with the new YARN UI and ATSv2.
> > >> >
> > >> > I am facing a basic RM HA switch issue after the first successful
> > >> > start. *Is anyone else facing this issue?*
> > >> >
> > >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> > >> > switches to active successfully. The exception trace I see in the log is:
> > >> >
> > >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> > >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> > >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> > >> >     ... 4 more
> > >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> > >> >     at java.security.AccessController.doPrivileged(Native Method)
> > >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> > >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> > >> >     ... 5 more
> > >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> > >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> > >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> > >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> > >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> > >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> > >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> > >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> > >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> > >> >     ... 13 more
> > >> >
> > >> > Thanks & Regards
> > >> > Rohith Sharma K S
> > >> >
> > >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> > >> >
> > >> > > Hi folks,
> > >> > >
> > >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> > >> > > will be the latest stable/production release for Apache Hadoop - it
> > >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > >> > > fixes new fixed issues since 2.8.2 .
> > >> > >
> > >> > >       More information about the 2.9.0 release plan can be found here:
> > >> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > >> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> > >> > >
> > >> > >       New RC is available at:
> > >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > >> > >
> > >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > >> > >
> > >> > >       The maven artifacts are available via repository.apache.org at:
> > >> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > >> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> > >> > >
> > >> > >       Please try the release and vote; the vote will run for the usual 5
> > >> > > days, ending on 11/10/2017 4pm PST time.
> > >> > > Thanks,
> > >> > >
> > >> > > Arun/Subru
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Wangda Tan <wh...@gmail.com>.
Sunil / Rohith,

Could you check whether your configs are the same as the ones Jonathan posted?
https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693

And could you check whether the issue still reproduces with Jonathan's
configs?
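
One way to do that comparison, as a minimal sketch in Python: parse two Hadoop-style XML config files into name/value maps and list the properties that differ. The helper names are made up for illustration, and the property values in the example are hypothetical; they are not the configs posted on the JIRA. The sketch only compares what is literally written in the files, so built-in defaults, XInclude and ${...} substitution are not resolved.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(mine, theirs):
    """Return {name: (my_value, their_value)} for properties that differ.

    A property missing on one side is reported with None on that side.
    """
    diff = {}
    for name in sorted(set(mine) | set(theirs)):
        a, b = mine.get(name), theirs.get(name)
        if a != b:
            diff[name] = (a, b)
    return diff


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    mine = load_properties(
        "<configuration>"
        "<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>"
        "<property><name>yarn.resourcemanager.zk-address</name><value>zk1:2181</value></property>"
        "</configuration>"
    )
    theirs = load_properties(
        "<configuration>"
        "<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>"
        "<property><name>yarn.resourcemanager.zk-address</name><value>zk2:2181</value></property>"
        "</configuration>"
    )
    for name, (a, b) in diff_properties(mine, theirs).items():
        print(name, "mine:", a, "theirs:", b)
```

For real files, read each yarn-site.xml (and core-site.xml) into a string first and pass the text to load_properties.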

Thanks,
Wangda


On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:

> Thanks for testing Rohith and Sunil
>
> Can you please confirm that this is not a config issue on your end?
> Jonathan and I just tested this on a fresh cluster (both automatic and
> manual) and were not able to reproduce it. I've updated the YARN-7453
> JIRA <https://issues.apache.org/jira/browse/YARN-7453> with details of
> the testing.
>
> Cheers
> -Arun/Subru
>
> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>
> > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > issue.
> >
> > - Rohith Sharma K S
> >
> > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> >
> >> Hi Subru and Arun.
> >>
> >> Thanks for driving 2.9 release. Great work!
> >>
> >> I installed a cluster built from source.
> >> - Ran a few MR jobs with application priority enabled. They ran fine.
> >> - Accessed the new UI, and it also seems fine.
> >>
> >> However, I am also hitting the same issue that Rohith reported.
> >> - Started an HA cluster
> >> - Pushed the RM to standby
> >> - Pushed the RM back to active, and then saw an exception.
> >>
> >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>
> >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>
> >> Will check and post more details,
> >>
> >> - Sunil
> >>
> >>
> >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> >>
> >> > Thanks Subru/Arun for the great work!
> >> >
> >> > Downloaded the source and built from it. Deployed a non-secured RM HA
> >> > cluster along with the new YARN UI and ATSv2.
> >> >
> >> > I am facing a basic RM HA switch issue after the first successful
> >> > start. *Is anyone else facing this issue?*
> >> >
> >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> >> > switches to active successfully. The exception trace I see in the log is:
> >> >
> >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >> >     ... 4 more
> >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >> >     at java.security.AccessController.doPrivileged(Native Method)
> >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >> >     ... 5 more
> >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> >> >     ... 13 more
> >> >
> >> > Thanks & Regards
> >> > Rohith Sharma K S
> >> >
> >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> >> >
> >> > > Hi folks,
> >> > >
> >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> >> > > will be the latest stable/production release for Apache Hadoop - it
> >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> >> > > fixes new fixed issues since 2.8.2 .
> >> > >
> >> > >       More information about the 2.9.0 release plan can be found here:
> >> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> >> > >
> >> > >       New RC is available at:
> >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >> > >
> >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >> > >
> >> > >       The maven artifacts are available via repository.apache.org at:
> >> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> >> > >
> >> > >       Please try the release and vote; the vote will run for the usual 5
> >> > > days, ending on 11/10/2017 4pm PST time.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Arun/Subru
> >> > >
> >> >
> >>
> >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Wangda Tan <wh...@gmail.com>.
Sunil / Rohith,

Could you check whether your configs are the same as the ones Jonathan posted?
https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693

And could you check whether the issue still reproduces with Jonathan's
configs?

Thanks,
Wangda


On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:

> Thanks for testing Rohith and Sunil
>
> Can you please confirm that this is not a config issue on your end?
> Jonathan and I just tested this on a fresh cluster (both automatic and
> manual) and were not able to reproduce it. I've updated the YARN-7453
> JIRA <https://issues.apache.org/jira/browse/YARN-7453> with details of
> the testing.
>
> Cheers
> -Arun/Subru
>
> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>
> > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > issue.
> >
> > - Rohith Sharma K S
> >
> > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> >
> >> Hi Subru and Arun.
> >>
> >> Thanks for driving 2.9 release. Great work!
> >>
> >> I installed a cluster built from source.
> >> - Ran a few MR jobs with application priority enabled. They ran fine.
> >> - Accessed the new UI, and it also seems fine.
> >>
> >> However, I am also hitting the same issue that Rohith reported.
> >> - Started an HA cluster
> >> - Pushed the RM to standby
> >> - Pushed the RM back to active, and then saw an exception.
> >>
> >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>
> >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >>
> >> Will check and post more details,
> >>
> >> - Sunil
> >>
> >>
> >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> >>
> >> > Thanks Subru/Arun for the great work!
> >> >
> >> > Downloaded the source and built from it. Deployed a non-secured RM HA
> >> > cluster along with the new YARN UI and ATSv2.
> >> >
> >> > I am facing a basic RM HA switch issue after the first successful
> >> > start. *Is anyone else facing this issue?*
> >> >
> >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> >> > switches to active successfully. The exception trace I see in the log is:
> >> >
> >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.
> ActiveStandbyElector:
> >> > Exception handling the winning of election
> >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition
> to
> >> > Active
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> >> torBasedElectorService.java:146)
> >> >     at
> >> >
> >> > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(Activ
> >> eStandbyElector.java:894)
> >> >     at
> >> >
> >> > org.apache.hadoop.ha.ActiveStandbyElector.processResult(Acti
> >> veStandbyElector.java:473)
> >> >     at
> >> >
> >> > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
> >> ClientCnxn.java:599)
> >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.
> >> java:498)
> >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
> >> > transitioning to Active mode
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> >> ransitionToActive(AdminService.java:325)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyE
> >> lectorBasedElectorService.becomeActive(ActiveStandbyElec
> >> torBasedElectorService.java:144)
> >> >     ... 4 more
> >> > Caused by: org.apache.hadoop.service.ServiceStateException:
> >> > org.apache.zookeeper.KeeperException$NoAuthException:
> KeeperErrorCode =
> >> > NoAuth
> >> >     at
> >> >
> >> > org.apache.hadoop.service.ServiceStateException.convert(Serv
> >> iceStateException.java:105)
> >> >     at
> >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> >> ice.java:205)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >> r.startActiveServices(ResourceManager.java:1131)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >> r$1.run(ResourceManager.java:1171)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >> r$1.run(ResourceManager.java:1167)
> >> >     at java.security.AccessController.doPrivileged(Native Method)
> >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >> >     at
> >> >
> >> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
> >> upInformation.java:1886)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >> r.transitionToActive(ResourceManager.java:1167)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.AdminService.t
> >> ransitionToActive(AdminService.java:320)
> >> >     ... 5 more
> >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> >> > KeeperErrorCode = NoAuth
> >> >     at
> >> > org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(
> ZooKeeper.java:949)
> >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >> >     at
> >> >
> >> > org.apache.curator.framework.imps.CuratorTransactionImpl.doO
> >> peration(CuratorTransactionImpl.java:159)
> >> >     at
> >> >
> >> > org.apache.curator.framework.imps.CuratorTransactionImpl.acc
> >> ess$200(CuratorTransactionImpl.java:44)
> >> >     at
> >> >
> >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> >> all(CuratorTransactionImpl.java:129)
> >> >     at
> >> >
> >> > org.apache.curator.framework.imps.CuratorTransactionImpl$2.c
> >> all(CuratorTransactionImpl.java:125)
> >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >> >     at
> >> >
> >> > org.apache.curator.framework.imps.CuratorTransactionImpl.com
> >> mit(CuratorTransactionImpl.java:122)
> >> >     at
> >> >
> >> > org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransact
> >> ion.commit(ZKCuratorManager.java:403)
> >> >     at
> >> >
> >> > org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(
> >> ZKCuratorManager.java:372)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMS
> >> tateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >> >     at
> >> >
> >> > org.apache.hadoop.yarn.server.resourcemanager.ResourceManage
> >> r$RMActiveServices.serviceStart(ResourceManager.java:754)
> >> >     at
> >> > org.apache.hadoop.service.AbstractService.start(AbstractServ
> >> ice.java:194)
> >> >     ... 13 more
> >> >
> >> > Thanks & Regards
> >> > Rohith Sharma K S
> >> >
> >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> >> >
> >> > > Hi folks,
> >> > >
> >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> >> > > will be the latest stable/production release for Apache Hadoop - it
> >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> >> > > fixes new fixed issues since 2.8.2 .
> >> > >
> >> > >       More information about the 2.9.0 release plan can be found here:
> >> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> >> > >
> >> > >       New RC is available at:
> >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >> > >
> >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >> > >
> >> > >       The maven artifacts are available via repository.apache.org at:
> >> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> >> > >
> >> > >       Please try the release and vote; the vote will run for the usual 5
> >> > > days, ending on 11/10/2017 4pm PST time.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Arun/Subru
> >> > >
> >> >
> >>
> >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Wangda Tan <wh...@gmail.com>.
Sunil / Rohith,

Could you check whether your configs are the same as the ones Jonathan posted?
https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693

And could you check whether the issue still reproduces with Jonathan's
configs?

Thanks,
Wangda


On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <as...@apache.org> wrote:

> Thanks for testing, Rohith and Sunil.
>
> Can you please confirm that it is not a config issue at your end?
> Jonathan and I just tried testing this on a fresh cluster (both automatic
> and manual) and we are not able to reproduce this. I've updated the
> YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
> with details of testing.
>
> Cheers
> -Arun/Subru
>
> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>
> > Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> > <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> > issue.
> >
> > - Rohith Sharma K S
> >
> > On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
> >
> >> Hi Subru and Arun.
> >>
> >> Thanks for driving the 2.9 release. Great work!
> >>
> >> I installed a cluster built from source.
> >> - Ran a few MR jobs with application priority enabled. Runs fine.
> >> - Accessed the new UI and it also seems fine.
> >>
> >> However, I am also getting the same issue that Rohith reported.
> >> - Started an HA cluster
> >> - Pushed RM to standby
> >> - Pushed RM back to active, then saw an exception.
> >>
> >> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >>
> >> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >> Will check and post more details,
> >>
> >> - Sunil
> >>
> >>
> >> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
> >>
> >> > Thanks Subru/Arun for the great work!
> >> >
> >> > Downloaded the source and built from it. Deployed a non-secured RM HA
> >> > cluster along with the new YARN UI and ATSv2.
> >> >
> >> > I am facing a basic RM HA switch issue after the first successful start.
> >> > *Is anyone else facing this issue?*
> >> >
> >> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
> >> > switches to active successfully. The exception trace I see in the log is:
> >> >
> >> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> >> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> >> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >> >     ... 4 more
> >> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >> >     at java.security.AccessController.doPrivileged(Native Method)
> >> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >> >     ... 5 more
> >> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> >> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> >> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> >> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> >> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> >> >     ... 13 more
> >> >
> >> > Thanks & Regards
> >> > Rohith Sharma K S
> >> >
> >> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> >> >
> >> > > Hi folks,
> >> > >
> >> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> >> > > will be the latest stable/production release for Apache Hadoop - it
> >> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> >> > > fixes new fixed issues since 2.8.2 .
> >> > >
> >> > >       More information about the 2.9.0 release plan can be found here:
> >> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> >> > >
> >> > >       New RC is available at:
> >> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >> > >
> >> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> >> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >> > >
> >> > >       The maven artifacts are available via repository.apache.org at:
> >> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> >> > >
> >> > >       Please try the release and vote; the vote will run for the usual 5
> >> > > days, ending on 11/10/2017 4pm PST time.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Arun/Subru
> >> > >
> >> >
> >>
> >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Arun Suresh <as...@apache.org>.
Thanks for testing, Rohith and Sunil.

Can you please confirm that it is not a config issue at your end?
Jonathan and I just tried testing this on a fresh cluster (both automatic
and manual) and we are not able to reproduce this. I've updated the
YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
with details of testing.

Cheers
-Arun/Subru

On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:

> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> issue.
>
> - Rohith Sharma K S
>
> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>
>> Hi Subru and Arun.
>>
>> Thanks for driving the 2.9 release. Great work!
>>
>> I installed a cluster built from source.
>> - Ran a few MR jobs with application priority enabled. Runs fine.
>> - Accessed the new UI and it also seems fine.
>>
>> However, I am also getting the same issue that Rohith reported.
>> - Started an HA cluster
>> - Pushed RM to standby
>> - Pushed RM back to active, then saw an exception.
>>
>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>
>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>
>> Will check and post more details,
>>
>> - Sunil
>>
>>
>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>
>> > Thanks Subru/Arun for the great work!
>> >
>> > Downloaded the source and built from it. Deployed a non-secured RM HA
>> > cluster along with the new YARN UI and ATSv2.
>> >
>> > I am facing a basic RM HA switch issue after the first successful start.
>> > *Is anyone else facing this issue?*
>> >
>> > When the RM is switched from ACTIVE to STANDBY to ACTIVE, it never
>> > switches to active successfully. The exception trace I see in the log is:
>> >
>> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>> >     ... 4 more
>> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>> >     ... 5 more
>> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>> >     ... 13 more
>> >
>> > Thanks & Regards
>> > Rohith Sharma K S
>> >
>> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>> >
>> > > Hi folks,
>> > >
>> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>> > > will be the latest stable/production release for Apache Hadoop - it
>> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>> > > fixes new fixed issues since 2.8.2 .
>> > >
>> > >       More information about the 2.9.0 release plan can be found here:
>> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>> > >
>> > >       New RC is available at:
>> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >
>> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >
>> > >       The maven artifacts are available via repository.apache.org at:
>> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>> > >
>> > >       Please try the release and vote; the vote will run for the usual 5
>> > > days, ending on 11/10/2017 4pm PST time.
>> > >
>> > > Thanks,
>> > >
>> > > Arun/Subru
>> > >
>> >
>>
>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Arun Suresh <as...@apache.org>.
Thanks for testing, Rohith and Sunil.

Can you please confirm that it is not a config issue at your end?
Jonathan and I just tried testing this on a fresh cluster (both automatic
and manual) and we are not able to reproduce this. I've updated the
YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
with details of testing.

Cheers
-Arun/Subru

On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org> wrote:

> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
> issue.
>
> - Rohith Sharma K S
>
> On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:
>
>> Hi Subru and Arun.
>>
>> Thanks for driving the 2.9 release. Great work!
>>
>> I installed a cluster built from source.
>> - Ran a few MR jobs with application priority enabled. Runs fine.
>> - Accessed the new UI and it also seems fine.
>>
>> However, I am also getting the same issue that Rohith reported.
>> - Started an HA cluster
>> - Pushed RM to standby
>> - Pushed RM back to active, then saw an exception.
>>
>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>
>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>
>> Will check and post more details,
>>
>> - Sunil
>>
>>
>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>>
>> > Thanks Subru/Arun for the great work!
>> >
>> > Downloaded the source and built from it. Deployed a non-secured RM HA cluster
>> > along with the new YARN UI and ATSv2.
>> >
>> > I am facing a basic RM HA switch issue after the first successful start.
>> > *Is anyone else facing this issue?*
>> >
>> > When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
>> > switches to active successfully. The exception trace I see in the log is:
>> >
>> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>> >     ... 4 more
>> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:422)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>> >     ... 5 more
>> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>> >     ... 13 more
>> >
>> > Thanks & Regards
>> > Rohith Sharma K S
>> >
>> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>> >
>> > > Hi folks,
>> > >
>> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>> > > will be the latest stable/production release for Apache Hadoop - it
>> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>> > > fixes new fixed issues since 2.8.2 .
>> > >
>> > >       More information about the 2.9.0 release plan can be found here:
>> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>> > >
>> > >       New RC is available at:
>> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>> > >
>> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>> > >
>> > >       The maven artifacts are available via repository.apache.org at:
>> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>> > >
>> > >       Please try the release and vote; the vote will run for the usual 5
>> > > days, ending on 11/10/2017 4pm PST time.
>> > >
>> > > Thanks,
>> > >
>> > > Arun/Subru
>> > >
>> >
>>
>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Rohith Sharma K S <ro...@apache.org>.
Thanks Sunil for confirmation. Btw, I have raised YARN-7453
<https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this issue.

- Rohith Sharma K S

On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:

> Hi Subru and Arun.
>
> Thanks for driving the 2.9 release. Great work!
>
> I installed a cluster built from source.
> - Ran a few MR jobs with application priority enabled. They run fine.
> - Accessed the new UI, and it also seems fine.
>
> However, I am also hitting the same issue that Rohith reported.
> - Started an HA cluster
> - Pushed the RM to standby
> - Pushed the RM back to active, and then saw an exception.
>
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>         at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>         at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>
> Will check and post more details,
>
> - Sunil
>
>
> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org> wrote:
>
> > Thanks Subru/Arun for the great work!
> >
> > Downloaded the source and built from it. Deployed a non-secured RM HA cluster
> > along with the new YARN UI and ATSv2.
> >
> > I am facing a basic RM HA switch issue after the first successful start.
> > *Is anyone else facing this issue?*
> >
> > When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
> > switches to active successfully. The exception trace I see in the log is:
> >
> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >     ... 4 more
> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >     ... 5 more
> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> >     ... 13 more
> >
> > Thanks & Regards
> > Rohith Sharma K S
> >
> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> >
> > > Hi folks,
> > >
> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line
> > and
> > > will be the latest stable/production release for Apache Hadoop - it
> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > > fixes new fixed issues since 2.8.2 .
> > >
> > >       More information about the 2.9.0 release plan can be found here:
> > > *https://cwiki.apache.org/confluence/display/HADOOP/
> > > Roadmap#Roadmap-Version2.9
> > > <https://cwiki.apache.org/confluence/display/HADOOP/
> > > Roadmap#Roadmap-Version2.9>*
> > >
> > >       New RC is available at:
> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > >
> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id
> > is:
> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > >
> > >       The maven artifacts are available via repository.apache.org at:
> > > *
> > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > > <
> > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > > >*
> > >
> > >       Please try the release and vote; the vote will run for the usual
> 5
> > > days, ending on 11/10/2017 4pm PST time.
> > >
> > > Thanks,
> > >
> > > Arun/Subru
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Rohith Sharma K S <ro...@apache.org>.
Thanks Sunil for confirmation. Btw, I have raised YARN-7453
<https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this issue.

- Rohith Sharma K S

On 7 November 2017 at 16:44, Sunil G <su...@apache.org> wrote:

> Hi Subru and Arun.
>
> Thanks for driving 2.9 release. Great work!
>
> I installed cluster built from source.
> - Ran few MR jobs with application priority enabled. Runs fine.
> - Accessed new UI and it also seems fine.
>
> However I am also getting same issue as Rohith reported.
> - Started an HA cluster
> - Pushed RM to standby
> - Pushed back RM to active then seeing an exception.
>
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
> Active
>         at
> org.apache.hadoop.yarn.server.resourcemanager.
> ActiveStandbyElectorBasedElectorServic
>     e.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>         at
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(
> ActiveStandbyElector.java:894
>     )
>
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> KeeperErrorCode = NoAuth
>         at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>         at org.apache.zookeeper.ZooKeeper.multiInternal(
> ZooKeeper.java:949)
>
> Will check and post more details,
>
> - Sunil
>
>
> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <
> rohithsharmaks@apache.org>
> wrote:
>
> > Thanks Subru/Arun for the great work!
> >
> > Downloaded source and built from it. Deployed RM HA non-secured cluster
> > along with new YARN UI and ATSv2.
> >
> > I am facing basic RM HA switch issue after first time successful start.
> > *Can
> > anyone else is facing this issue?*
> >
> > When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switch to
> > active successfully. Exception trace I see from the log is
> >
> > 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> > org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
> >     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
> >     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> >     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> > Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> >     ... 4 more
> > Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
> >     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
> >     ... 5 more
> > Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
> >     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
> >     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
> >     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> >     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
> >     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
> >     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
> >     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
> >     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> >     ... 13 more
> >
> > Thanks & Regards
> > Rohith Sharma K S
> >
> > On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
> >
> > > Hi folks,
> > >
> > >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> > > will be the latest stable/production release for Apache Hadoop - it
> > > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > > fixes new fixed issues since 2.8.2 .
> > >
> > >       More information about the 2.9.0 release plan can be found here:
> > > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> > >
> > >       New RC is available at:
> > > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> > >
> > >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> > >
> > >       The maven artifacts are available via repository.apache.org at:
> > > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> > >
> > >       Please try the release and vote; the vote will run for the usual 5
> > > days, ending on 11/10/2017 4pm PST time.
> > >
> > > Thanks,
> > >
> > > Arun/Subru
> > >
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Sunil G <su...@apache.org>.
Hi Subru and Arun.

Thanks for driving the 2.9 release. Great work!

I installed a cluster built from source.
- Ran a few MR jobs with application priority enabled. They run fine.
- Accessed the new UI, and it also seems fine.

However, I am also hitting the same issue that Rohith reported.
- Started an HA cluster
- Pushed the RM to standby
- Pushed the RM back to active, and then saw an exception:

org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)

Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)

Will check and post more details,

- Sunil


On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <ro...@apache.org>
wrote:

> Thanks Subru/Arun for the great work!
>
> Downloaded the source and built from it. Deployed a non-secured RM HA
> cluster along with the new YARN UI and ATSv2.
>
> I am facing a basic RM HA switch issue after the first successful start.
> *Is anyone else facing this issue?*
>
> When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
> becomes active successfully. The exception trace I see in the log is:
>
> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     ... 5 more
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     ... 13 more
>
> Thanks & Regards
> Rohith Sharma K S
>
> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>
> > Hi folks,
> >
> >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> > will be the latest stable/production release for Apache Hadoop - it
> > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > fixes new fixed issues since 2.8.2 .
> >
> >       More information about the 2.9.0 release plan can be found here:
> > *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> >
> >       New RC is available at:
> > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >
> >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >
> >       The maven artifacts are available via repository.apache.org at:
> > *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> > <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> >
> >       Please try the release and vote; the vote will run for the usual 5
> > days, ending on 11/10/2017 4pm PST time.
> >
> > Thanks,
> >
> > Arun/Subru
> >
>
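
A note on reading the trace quoted in this message: each "Caused by" line is one link in the exception's cause chain, and the innermost link (here the ZooKeeper NoAuthException) is usually the best place to start looking. Below is a minimal Java sketch of walking such a chain. The class name CauseChain and the stand-in NoAuthException are hypothetical, written only for illustration; they are not Hadoop or ZooKeeper code, and the real classes are not needed to run it.

```java
import java.util.ArrayList;
import java.util.List;

public class CauseChain {

    /** Hypothetical stand-in for ZooKeeper's NoAuthException (illustration only). */
    static class NoAuthException extends Exception {
        NoAuthException() {
            super("KeeperErrorCode = NoAuth");
        }
    }

    /** Returns the throwable followed by each nested cause, outermost first. */
    static List<Throwable> chain(Throwable t) {
        List<Throwable> links = new ArrayList<>();
        // Stop on null, and also if a cause refers back to an earlier link.
        while (t != null && !links.contains(t)) {
            links.add(t);
            t = t.getCause();
        }
        return links;
    }

    /** Returns the innermost cause, or null when given null. */
    static Throwable rootCause(Throwable t) {
        List<Throwable> links = chain(t);
        return links.isEmpty() ? null : links.get(links.size() - 1);
    }

    public static void main(String[] args) {
        // Rebuild the nesting seen in the log with plain JDK exceptions.
        Throwable failure = new RuntimeException(
            "RM could not transition to Active",
            new RuntimeException(
                "Error when transitioning to Active mode",
                new IllegalStateException(new NoAuthException())));

        for (Throwable link : chain(failure)) {
            System.out.println(link.getClass().getSimpleName() + ": " + link.getMessage());
        }
        System.out.println("root cause: " + rootCause(failure).getMessage());
    }
}
```

The sketch only shows how the nesting in the log maps onto Throwable.getCause(); it says nothing about why ZooKeeper rejected the write, which is what YARN-7453 tracks.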

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Sunil G <su...@apache.org>.
Hi Subru and Arun.

Thanks for driving 2.9 release. Great work!

I installed cluster built from source.
- Ran few MR jobs with application priority enabled. Runs fine.
- Accessed new UI and it also seems fine.

However I am also getting same issue as Rohith reported.
- Started an HA cluster
- Pushed RM to standby
- Pushed back RM to active then seeing an exception.

org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
Active
        at
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorServic
    e.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
        at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894
    )

Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
KeeperErrorCode = NoAuth
        at
org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)

Will check and post more details,

- Sunil


On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <ro...@apache.org>
wrote:

> Thanks Subru/Arun for the great work!
>
> Downloaded source and built from it. Deployed RM HA non-secured cluster
> along with new YARN UI and ATSv2.
>
> I am facing basic RM HA switch issue after first time successful start.
> *Can
> anyone else is facing this issue?*
>
> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switch to
> active successfully. Exception trace I see from the log is
>
> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
> Active
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>     at
>
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>     at
>
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>     at
>
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
> transitioning to Active mode
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException:
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode =
> NoAuth
>     at
>
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>     at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     ... 5 more
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
> KeeperErrorCode = NoAuth
>     at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>     at
>
> org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>     at
>
> org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>     at
>
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>     at
>
> org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at
>
> org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>     at
>
> org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>     at
>
> org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>     at
>
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>     at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     ... 13 more
>
> Thanks & Regards
> Rohith Sharma K S
>
> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>
> > Hi folks,
> >
> >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> > will be the latest stable/production release for Apache Hadoop - it
> > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > fixes new fixed issues since 2.8.2 .
> >
> >       More information about the 2.9.0 release plan can be found here:
> > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >
> >       New RC is available at:
> > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >
> >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >
> >       The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >
> >       Please try the release and vote; the vote will run for the usual 5
> > days, ending on 11/10/2017 4pm PST time.
> >
> > Thanks,
> >
> > Arun/Subru
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Sunil G <su...@apache.org>.
Hi Subru and Arun.

Thanks for driving 2.9 release. Great work!

I installed a cluster built from source.
- Ran a few MR jobs with application priority enabled. They run fine.
- Accessed the new UI and it also seems fine.

However, I am also getting the same issue that Rohith reported.
- Started an HA cluster
- Pushed the RM to standby
- Pushed the RM back to active, and then saw an exception:

org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)

Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
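
For anyone who wants to retry the same sequence, the round trip can be
sketched with the yarn rmadmin HA subcommands. This is only a sketch under
assumptions: the RM id rm1 is a placeholder for one of the ids listed in
yarn.resourcemanager.ha.rm-ids, the helper name is made up for illustration,
and --forcemanual is only relevant when automatic failover is enabled. The
report above does not say how the switch was actually triggered.

```python
import shlex


def ha_switch_commands(rm_id, force_manual=False):
    """Return the yarn rmadmin invocations for an ACTIVE -> STANDBY -> ACTIVE round trip.

    rm_id is a hypothetical RM id such as "rm1"; force_manual adds
    --forcemanual, which rmadmin expects when automatic failover is on.
    """
    force = ["--forcemanual"] if force_manual else []
    return [
        ["yarn", "rmadmin", "-getServiceState", rm_id],
        ["yarn", "rmadmin", "-transitionToStandby", *force, rm_id],
        ["yarn", "rmadmin", "-transitionToActive", *force, rm_id],
        ["yarn", "rmadmin", "-getServiceState", rm_id],
    ]


# Print the commands instead of running them, so nothing touches a cluster.
for argv in ha_switch_commands("rm1"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The script only prints the commands; they are meant to be run by hand on a
test cluster while watching the RM log for the exception.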

Will check and post more details,

- Sunil


On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <ro...@apache.org>
wrote:

> Thanks Subru/Arun for the great work!
>
> Downloaded the source and built from it. Deployed a non-secured RM HA cluster
> along with the new YARN UI and ATSv2.
>
> I am facing a basic RM HA switch issue after the first successful start.
> *Is anyone else facing this issue?*
>
> When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
> switches to active successfully. The exception trace I see in the log is:
>
> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>     at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>     at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>     at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>     ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>     at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>     ... 5 more
> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>     at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>     at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>     at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>     ... 13 more
>
> Thanks & Regards
> Rohith Sharma K S
>
> On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:
>
> > Hi folks,
> >
> >      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> > will be the latest stable/production release for Apache Hadoop - it
> > includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> > fixes new fixed issues since 2.8.2 .
> >
> >       More information about the 2.9.0 release plan can be found here:
> > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >
> >       New RC is available at:
> > http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >
> >       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> > 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >
> >       The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >
> >       Please try the release and vote; the vote will run for the usual 5
> > days, ending on 11/10/2017 4pm PST time.
> >
> > Thanks,
> >
> > Arun/Subru
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Rohith Sharma K S <ro...@apache.org>.
Thanks Subru/Arun for the great work!

Downloaded the source and built from it. Deployed a non-secured RM HA cluster
along with the new YARN UI and ATSv2.

I am facing a basic RM HA switch issue after the first successful start.
*Is anyone else facing this issue?*

When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
switches to active successfully. The exception trace I see in the log is:

2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    ... 5 more
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    ... 13 more
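
Read bottom-up, the chain ends in the ZooKeeper NoAuthException raised while
ZKRMStateStore.getAndIncrementEpoch commits a Curator transaction. For
comparing RM logs across nodes, the "Caused by:" chain can be pulled out with
a few lines of Python. This is a rough sketch only: the function name is made
up for illustration, and the sample is abbreviated from the trace above.

```python
def caused_by_chain(trace):
    """Return the exception class named on each 'Caused by:' line, outermost first."""
    prefix = "Caused by: "
    chain = []
    for line in trace.splitlines():
        # Drop mail quote markers ("> ") and indentation before matching.
        line = line.lstrip("> ").strip()
        if line.startswith(prefix):
            # The class name is everything up to the first colon after the prefix.
            chain.append(line[len(prefix):].split(":", 1)[0].strip())
    return chain


# Abbreviated copy of the trace above, kept only to exercise the function.
SAMPLE = """\
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    ... 5 more
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    ... 13 more
"""

chain = caused_by_chain(SAMPLE)
print("root cause:", chain[-1])
```

The last entry of the chain is the innermost cause, which is the one worth
searching for when checking whether other nodes hit the same failure.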

Thanks & Regards
Rohith Sharma K S

On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:

> Hi folks,
>
>      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> will be the latest stable/production release for Apache Hadoop - it
> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> fixes new fixed issues since 2.8.2 .
>
>       More information about the 2.9.0 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>
>       New RC is available at:
> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>
>       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>
>       The maven artifacts are available via repository.apache.org at:
> https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>
>       Please try the release and vote; the vote will run for the usual 5
> days, ending on 11/10/2017 4pm PST time.
>
> Thanks,
>
> Arun/Subru
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Carlo Aldo Curino <ca...@gmail.com>.
+1 from me.

I have:
1) set up a small cluster
2) enabled the reservation system
3) submitted a reservation through REST
4) ran a job within the reservation
5) did a few negative tests (what happens if the reservation system is
disabled and you try to reserve, what happens if a job is submitted with an
invalid reservation id, etc.)

Everything I tested worked as expected.
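
For reference, steps 2) and 3) can be approximated with a short script
against the RM REST reservation endpoints. This is a sketch, not the exact
calls used in the test above: the RM address, queue name, reservation name,
placeholder id and helper names are all assumptions, and the JSON field names
follow my reading of the ResourceManager REST documentation, so they should
be checked against the docs for the version under test.

```python
import json
import time
import urllib.request

RM_BASE = "http://localhost:8088"  # assumption: RM web address of a local test cluster


def build_reservation_request(reservation_id, queue, arrival_ms, deadline_ms,
                              duration_ms, num_containers,
                              memory_mb=1024, vcores=1):
    """Build the JSON body for the reservation submit call (field names per the RM REST docs)."""
    # Local sanity check only; the RM applies its own validation.
    if deadline_ms - arrival_ms < duration_ms:
        raise ValueError("deadline must leave room for the requested duration")
    return {
        "queue": queue,
        "reservation-id": reservation_id,
        "reservation-definition": {
            "arrival": arrival_ms,
            "deadline": deadline_ms,
            "reservation-name": "rc0-smoke-test",  # hypothetical name
            "reservation-requests": {
                "reservation-request-interpreter": 0,
                "reservation-request": [{
                    "duration": duration_ms,
                    "num-containers": num_containers,
                    "min-concurrency": 1,
                    "capability": {"memory": memory_mb, "vCores": vcores},
                }],
            },
        },
    }


def post_json(url, payload=None):
    """POST an optional JSON body; return the decoded JSON reply, or {} if the reply is empty."""
    data = b"" if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, method="POST",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body.strip() else {}


def submit_reservation(queue, duration_ms, num_containers):
    """Ask the RM for a new reservation id, then submit a definition under it."""
    reply = post_json(RM_BASE + "/ws/v1/cluster/reservation/new-reservation")
    reservation_id = reply["reservation-id"]
    start_ms = int(time.time() * 1000) + 60000
    body = build_reservation_request(reservation_id, queue, start_ms,
                                     start_ms + 2 * duration_ms,
                                     duration_ms, num_containers)
    post_json(RM_BASE + "/ws/v1/cluster/reservation/submit", body)
    return reservation_id


# Offline preview of the request body; the id below is a placeholder, not a real one.
preview = build_reservation_request("reservation_0000000000000_0001",
                                    "dedicated", 0, 120000, 60000, 2)
print(json.dumps(preview, indent=2))
```

Only the preview at the bottom runs on its own; submit_reservation needs a
running RM with the reservation system enabled on the chosen queue.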

Cheers,
Carlo

On Nov 6, 2017 7:06 AM, "Arun Suresh" <as...@apache.org> wrote:

> Here is my +1 to start.
>
> - Setup a small 4 node cluster
> - Verified some basic HDFS commands
> - Ran Pi / sleep jobs (with some mix of Opportunistic containers - both
> distributed and centralized scheduling)
>
> Cheers
> -Arun
>
>
> On Fri, Nov 3, 2017 at 4:38 PM, Arun Suresh <as...@apache.org> wrote:
>
> > Hey Vinod,
> >
> > I've cleaned up the RC directory as you requested.
> >
> > Cheers
> > -Arun
> >
> > On Fri, Nov 3, 2017 at 4:09 PM, Vinod Kumar Vavilapalli <
> > vinodkv@apache.org> wrote:
> >
> >> Arun / Subru,
> >>
> >> Thanks for the great work!
> >>
> >> Few quick comments
> >>  - Can you cleanup the RC folder to only have tar.gz and src.tar.gz and
> >> their signatures and delete everything else? So that it's easy to pick up
> >> the important bits for the voters. For e.g, like this
> >> http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/
> >>  - Can you put the generated CHANGES.html and releasenotes.html instead
> >> of the md files, for quicker perusal?
> >>
> >> Thanks
> >> +Vinod
> >>
> >> On Nov 3, 2017, at 3:50 PM, Arun Suresh <as...@apache.org> wrote:
> >>
> >> Hi folks,
> >>
> >>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> >> will be the latest stable/production release for Apache Hadoop - it
> >> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> >> fixes new fixed issues since 2.8.2 .
> >>
> >>      More information about the 2.9.0 release plan can be found here:
> >> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> >>
> >>      New RC is available at:
> >> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> >>
> >>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> >> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> >>
> >>      The maven artifacts are available via repository.apache.org at:
> >> https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >>
> >>      Please try the release and vote; the vote will run for the usual 5
> >> days, ending on 11/10/2017 4pm PST time.
> >>
> >> Thanks,
> >>
> >> Arun/Subru
> >>
> >>
> >>
> >
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Arun Suresh <as...@apache.org>.
Here is my +1 to start.

- Set up a small 4-node cluster
- Verified some basic HDFS commands
- Ran Pi / sleep jobs (with some mix of Opportunistic containers - both
distributed and centralized scheduling)

Cheers
-Arun


On Fri, Nov 3, 2017 at 4:38 PM, Arun Suresh <as...@apache.org> wrote:

> Hey Vinod,
>
> I've cleaned up the RC directory as you requested.
>
> Cheers
> -Arun
>
> On Fri, Nov 3, 2017 at 4:09 PM, Vinod Kumar Vavilapalli <
> vinodkv@apache.org> wrote:
>
>> Arun / Subru,
>>
>> Thanks for the great work!
>>
>> Few quick comments
>>  - Can you cleanup the RC folder to only have tar.gz and src.tar.gz and
>> their signatures and delete everything else? So that it's easy to pick up
>> the important bits for the voters. For e.g, like this
>> http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/
>>  - Can you put the generated CHANGES.html and releasenotes.html instead
>> of the md files, for quicker perusal?
>>
>> Thanks
>> +Vinod
>>
>> On Nov 3, 2017, at 3:50 PM, Arun Suresh <as...@apache.org> wrote:
>>
>> Hi folks,
>>
>>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
>> will be the latest stable/production release for Apache Hadoop - it
>> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
>> fixes new fixed issues since 2.8.2 .
>>
>>      More information about the 2.9.0 release plan can be found here:
>> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#
>> Roadmap-Version2.9
>> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#
>> Roadmap-Version2.9>*
>>
>>      New RC is available at:
>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>
>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>
>>      The maven artifacts are available via repository.apache.org at:
>> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>> >*
>>
>>      Please try the release and vote; the vote will run for the usual 5
>> days, ending on 11/10/2017 4pm PST time.
>>
>> Thanks,
>>
>> Arun/Subru
>>
>>
>>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Arun Suresh <as...@apache.org>.
Hey Vinod,

I've cleaned up the RC directory as you requested.

Cheers
-Arun

On Fri, Nov 3, 2017 at 4:09 PM, Vinod Kumar Vavilapalli <vi...@apache.org>
wrote:

> Arun / Subru,
>
> Thanks for the great work!
>
> Few quick comments
>  - Can you cleanup the RC folder to only have tar.gz and src.tar.gz and
> their signatures and delete everything else? So that it's easy to pick up
> the important bits for the voters. For e.g, like this
> http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/
>  - Can you put the generated CHANGES.html and releasenotes.html instead of
> the md files, for quicker perusal?
>
> Thanks
> +Vinod
>
> On Nov 3, 2017, at 3:50 PM, Arun Suresh <as...@apache.org> wrote:
>
> Hi folks,
>
>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> will be the latest stable/production release for Apache Hadoop - it
> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> fixes new fixed issues since 2.8.2 .
>
>      More information about the 2.9.0 release plan can be found here:
> *https://cwiki.apache.org/confluence/display/HADOOP/
> Roadmap#Roadmap-Version2.9
> <https://cwiki.apache.org/confluence/display/HADOOP/
> Roadmap#Roadmap-Version2.9>*
>
>      New RC is available at:
> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>
>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>
>      The maven artifacts are available via repository.apache.org at:
> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >*
>
>      Please try the release and vote; the vote will run for the usual 5
> days, ending on 11/10/2017 4pm PST time.
>
> Thanks,
>
> Arun/Subru
>
>
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
Arun / Subru,

Thanks for the great work!

A few quick comments:
 - Can you clean up the RC folder so that it has only the tar.gz and src.tar.gz files and their signatures, and delete everything else? That makes it easy for voters to pick up the important bits. For example: http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/
 - Can you put up the generated CHANGES.html and releasenotes.html instead of the md files, for quicker perusal?

Thanks
+Vinod

> On Nov 3, 2017, at 3:50 PM, Arun Suresh <as...@apache.org> wrote:
> 
> Hi folks,
> 
>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> will be the latest stable/production release for Apache Hadoop - it
> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> fixes new fixed issues since 2.8.2 .
> 
>      More information about the 2.9.0 release plan can be found here:
> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> 
>      New RC is available at:
> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> 
>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> 
>      The maven artifacts are available via repository.apache.org at:
> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> 
>      Please try the release and vote; the vote will run for the usual 5
> days, ending on 11/10/2017 4pm PST time.
> 
> Thanks,
> 
> Arun/Subru


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Rohith Sharma K S <ro...@apache.org>.
Thanks Subru/Arun for the great work!

I downloaded the source and built from it, then deployed a non-secure RM HA
cluster along with the new YARN UI and ATSv2.

I am hitting a basic RM HA switch issue after the first successful start.
*Is anyone else facing this issue?*

When the RM is switched from ACTIVE to STANDBY and back to ACTIVE, it never
becomes active successfully. The exception trace I see in the log is:

2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    ... 5 more
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    ... 13 more
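
For anyone triaging a trace like the one above: the innermost "Caused by:"
entry is usually the place to start, and here it is the ZooKeeper NoAuth
error raised under ZKRMStateStore.getAndIncrementEpoch. Below is a minimal
Python sketch for pulling that chain out of a pasted log. The function name
caused_by_chain and the shortened sample are illustrative assumptions, not
part of Hadoop or of the original report. The sketch tolerates mail-style
wrapping, where "at" ends up alone on a line and exception messages spill
onto the next line.

```python
def caused_by_chain(trace):
    """Return the 'Caused by:' headers of a Java stack trace, outermost first."""
    chain = []
    in_header = False  # True while reading a (possibly wrapped) header
    for raw in trace.splitlines():
        line = raw.strip()
        if line.startswith("Caused by:"):
            chain.append(line[len("Caused by:"):].strip())
            in_header = True
        elif not line or line == "at" or line.startswith(("at ", "...")):
            # A stack frame, an elision marker, or a blank line ends the header.
            in_header = False
        elif in_header:
            # The header was wrapped by the mail client; rejoin it.
            chain[-1] += " " + line
        # Anything else is a frame wrapped under a lone "at"; ignore it.
    return chain


# Shortened, hypothetical excerpt in the same shape as the log above.
sample = """org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
transitioning to Active mode
    at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
    ... 4 more
Caused by: org.apache.zookeeper.KeeperException$NoAuthException:
KeeperErrorCode = NoAuth
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
"""

chain = caused_by_chain(sample)
print(chain[-1])  # the root cause: the NoAuthException header
```

The last element of the returned list is the root cause; earlier elements
are the wrappers added on the way up the call stack.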

Thanks & Regards
Rohith Sharma K S

On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:

> Hi folks,
>
>      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> will be the latest stable/production release for Apache Hadoop - it
> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> fixes new fixed issues since 2.8.2 .
>
>       More information about the 2.9.0 release plan can be found here:
> *https://cwiki.apache.org/confluence/display/HADOOP/
> Roadmap#Roadmap-Version2.9
> <https://cwiki.apache.org/confluence/display/HADOOP/
> Roadmap#Roadmap-Version2.9>*
>
>       New RC is available at:
> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>
>       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>
>       The maven artifacts are available via repository.apache.org at:
> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> >*
>
>       Please try the release and vote; the vote will run for the usual 5
> days, ending on 11/10/2017 4pm PST time.
>
> Thanks,
>
> Arun/Subru
>

Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
Arun / Subru,

Thanks for the great work!

Few quick comments
 - Can you cleanup the RC folder to only have tar.gz and src.tar.gz and their signatures and delete everything else? So that it's easy to pick up the important bits for the voters. For e.g, like this http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/ <http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/>
 - Can you put the generated CHANGES.html and releasenotes.html instead of the md files, for quicker perusal?

Thanks
+Vinod

> On Nov 3, 2017, at 3:50 PM, Arun Suresh <as...@apache.org> wrote:
> 
> Hi folks,
> 
>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> will be the latest stable/production release for Apache Hadoop - it
> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> fixes new fixed issues since 2.8.2 .
> 
>      More information about the 2.9.0 release plan can be found here:
> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> 
>      New RC is available at:
> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> 
>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> 
>      The maven artifacts are available via repository.apache.org at:
> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> 
>      Please try the release and vote; the vote will run for the usual 5
> days, ending on 11/10/2017 4pm PST time.
> 
> Thanks,
> 
> Arun/Subru


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
Arun / Subru,

Thanks for the great work!

Few quick comments
 - Can you cleanup the RC folder to only have tar.gz and src.tar.gz and their signatures and delete everything else? So that it's easy to pick up the important bits for the voters. For e.g, like this http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/ <http://people.apache.org/~vinodkv/hadoop-2.8.1-RC3/>
 - Can you put the generated CHANGES.html and releasenotes.html instead of the md files, for quicker perusal?

Thanks
+Vinod

> On Nov 3, 2017, at 3:50 PM, Arun Suresh <as...@apache.org> wrote:
> 
> Hi folks,
> 
>     Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> will be the latest stable/production release for Apache Hadoop - it
> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> fixes new fixed issues since 2.8.2 .
> 
>      More information about the 2.9.0 release plan can be found here:
> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
> 
>      New RC is available at:
> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
> 
>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
> 
>      The maven artifacts are available via repository.apache.org at:
> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
> 
>      Please try the release and vote; the vote will run for the usual 5
> days, ending on 11/10/2017 4pm PST time.
> 
> Thanks,
> 
> Arun/Subru


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)

Posted by Rohith Sharma K S <ro...@apache.org>.
Thanks Subru/Arun for the great work!

Downloaded source and built from it. Deployed RM HA non-secured cluster
along with new YARN UI and ATSv2.

I am facing basic RM HA switch issue after first time successful start. *Can
anyone else is facing this issue?*

When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switch to
active successfully. Exception trace I see from the log is

2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector:
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to
Active
    at
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
    at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
    at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
    at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when
transitioning to Active mode
    at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
    at
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException:
org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
    ... 5 more
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
    ... 13 more

Thanks & Regards
Rohith Sharma K S

On 4 November 2017 at 04:20, Arun Suresh <as...@apache.org> wrote:

> Hi folks,
>
>      Apache Hadoop 2.9.0 is the first stable release of Hadoop 2.9 line and
> will be the latest stable/production release for Apache Hadoop - it
> includes 30 New Features with 500+ subtasks, 407 Improvements, 787 Bug
> fixes new fixed issues since 2.8.2 .
>
>       More information about the 2.9.0 release plan can be found here:
> *https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> <https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9>*
>
>       New RC is available at:
> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>
>       The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>
>       The maven artifacts are available via repository.apache.org at:
> *https://repository.apache.org/content/repositories/orgapachehadoop-1065/
> <https://repository.apache.org/content/repositories/orgapachehadoop-1065/>*
>
>       Please try the release and vote; the vote will run for the usual 5
> days, ending on 11/10/2017 4pm PST time.
>
> Thanks,
>
> Arun/Subru
>
