Posted to dev@ambari.apache.org by Alejandro Fernandez <af...@hortonworks.com> on 2015/06/02 05:20:53 UTC
Review Request 34920: Restarting HistoryServer fails during RU because NameNode is in safemode
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34920/
-----------------------------------------------------------
Review request for Ambari, Andrew Onischuk, Dmitro Lisnichenko, Jonathan Hurley, and Nate Cole.
Bugs: AMBARI-11605
https://issues.apache.org/jira/browse/AMBARI-11605
Repository: ambari
Description
-------
When restarting HistoryServer for the first time during the Core Masters rolling upgrade, the restart fails because one of the NameNodes is still in safemode.
It turns out that now that the HDFS commands run faster, the standby NameNode can still be in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before proceeding to any other services or Service Checks.
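As a rough illustration of the wait described above (not the code in this patch), the check could be sketched as a polling loop over both NameNodes. The helper names, retry count, delay, and the NameNode addresses used in the usage note are assumptions made for this example; the only external dependency is the `hdfs dfsadmin -safemode get` command, assumed to be on the PATH and to report "Safe mode is OFF" once a NameNode has left safemode.

```python
import subprocess
import time


def safemode_is_off(output):
    """Return True if `hdfs dfsadmin -safemode get` output reports safemode OFF."""
    return "safe mode is off" in output.lower()


def query_safemode(namenode_address):
    """Ask a single NameNode for its safemode state (assumes `hdfs` is on PATH)."""
    cmd = ["hdfs", "dfsadmin", "-fs", "hdfs://" + namenode_address,
           "-safemode", "get"]
    return subprocess.check_output(cmd).decode("utf-8", "replace")


def wait_for_safemode_off(namenodes, query=query_safemode, retries=30,
                          delay=10, sleep=time.sleep):
    """Poll until every NameNode reports safemode OFF.

    `retries` and `delay` are illustrative defaults, not values from the patch.
    Raises RuntimeError if any NameNode is still in safemode after all rounds.
    """
    pending = list(namenodes)
    for attempt in range(retries):
        still_in_safemode = []
        for namenode in pending:
            try:
                off = safemode_is_off(query(namenode))
            except (subprocess.CalledProcessError, OSError):
                # Treat an unreachable NameNode as "not ready yet".
                off = False
            if not off:
                still_in_safemode.append(namenode)
        pending = still_in_safemode
        if not pending:
            return
        if attempt < retries - 1:
            sleep(delay)
    raise RuntimeError("NameNodes still in safemode: " + ", ".join(pending))
```

With hypothetical addresses, a call such as `wait_for_safemode_off(["nn1.example.com:8020", "nn2.example.com:8020"])` would block until both the active and the standby NameNode have left safemode, and only then would the upgrade move on to other services or Service Checks.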
Diffs
-----
ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py 5e824d0
ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py 864961e
ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py 38270e8
ambari-server/src/main/resources/common-services/HIVE/0.12.0.2.0/package/scripts/params_linux.py 6e12dd0
ambari-server/src/test/python/stacks/2.0.6/HDFS/test_namenode.py b7126fd
Diff: https://reviews.apache.org/r/34920/diff/
Testing
-------
Deployed a cluster and copied the patched files, then enabled NameNode HA, and performed a successful RU.
----------------------------------------------------------------------
Total run:744
Total errors:0
Total failures:0
OK
Thanks,
Alejandro Fernandez
Re: Review Request 34920: Restarting HistoryServer fails during RU because NameNode is in safemode
Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34920/#review86249
-----------------------------------------------------------
Ship it!
- Jonathan Hurley