Posted to dev@ambari.apache.org by Alejandro Fernandez <af...@hortonworks.com> on 2015/06/02 05:20:53 UTC

Review Request 34920: Restarting HistoryServer fails during RU because NameNode is in safemode

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34920/
-----------------------------------------------------------

Review request for Ambari, Andrew Onischuk, Dmitro Lisnichenko, Jonathan Hurley, and Nate Cole.


Bugs: AMBARI-11605
    https://issues.apache.org/jira/browse/AMBARI-11605


Repository: ambari


Description
-------

When restarting HistoryServer for the first time during the Core Masters rolling upgrade, the restart fails because one of the NameNodes is still in safemode.

It turns out that, now that the HDFS commands run faster, the standby NameNode can still be in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before proceeding to any other services or Service Checks.
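
To illustrate the wait described above, here is a minimal sketch, not the code in the attached diff. It assumes each NameNode can be queried individually with `hdfs dfsadmin -fs hdfs://<host:port> -safemode get` and that the output contains "Safe mode is OFF" once the NameNode has left safemode. The function names, retry counts, and host addresses are hypothetical and chosen only for illustration.

```python
import subprocess
import time


def is_safemode_off(namenode_address, run=subprocess.check_output):
    """Return True if the given NameNode reports that safemode is OFF.

    namenode_address is a hypothetical "host:port" RPC address.
    """
    cmd = ["hdfs", "dfsadmin", "-fs", "hdfs://" + namenode_address,
           "-safemode", "get"]
    try:
        output = run(cmd)
    except (subprocess.CalledProcessError, OSError):
        # Treat an unreachable NameNode the same as one still in safemode.
        return False
    if isinstance(output, bytes):
        output = output.decode("utf-8", "replace")
    return "Safe mode is OFF" in output


def wait_for_safemode_off(namenode_addresses, retries=30, delay=10,
                          run=subprocess.check_output, sleep=time.sleep):
    """Poll until every NameNode has left safemode or retries run out.

    Returns True if all NameNodes are out of safemode, False otherwise.
    """
    pending = list(namenode_addresses)
    for attempt in range(retries):
        # Keep only the NameNodes that are still in safemode.
        pending = [nn for nn in pending if not is_safemode_off(nn, run)]
        if not pending:
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

A caller would run something like `wait_for_safemode_off(["nn1.example.com:8020", "nn2.example.com:8020"])` after restarting the NameNodes and only move on to the next service if it returns True. The `run` and `sleep` parameters exist so the loop can be exercised without a cluster.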


Diffs
-----

  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py 5e824d0 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py 864961e 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py 38270e8 
  ambari-server/src/main/resources/common-services/HIVE/0.12.0.2.0/package/scripts/params_linux.py 6e12dd0 
  ambari-server/src/test/python/stacks/2.0.6/HDFS/test_namenode.py b7126fd 

Diff: https://reviews.apache.org/r/34920/diff/


Testing
-------

Deployed a cluster and copied the patched files, then enabled NameNode HA, and performed a successful RU.

----------------------------------------------------------------------
Total run:744
Total errors:0
Total failures:0
OK


Thanks,

Alejandro Fernandez

Re: Review Request 34920: Restarting HistoryServer fails during RU because NameNode is in safemode

Posted by Jonathan Hurley <jh...@hortonworks.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34920/#review86249
-----------------------------------------------------------

Ship It!

- Jonathan Hurley


On June 1, 2015, 11:20 p.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34920/
> -----------------------------------------------------------
> 
> (Updated June 1, 2015, 11:20 p.m.)
> 
> 
> Review request for Ambari, Andrew Onischuk, Dmitro Lisnichenko, Jonathan Hurley, and Nate Cole.
> 
> 
> Bugs: AMBARI-11605
>     https://issues.apache.org/jira/browse/AMBARI-11605
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When restarting HistoryServer for the first time during the Core Masters rolling upgrade, the restart fails because one of the NameNodes is still in safemode.
> 
> It turns out that, now that the HDFS commands run faster, the standby NameNode can still be in safemode by the time the HistoryServer is restarted.
> For this reason, we must wait for both NameNodes to come out of safemode before proceeding to any other services or Service Checks.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py 5e824d0 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py 864961e 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py 38270e8 
>   ambari-server/src/main/resources/common-services/HIVE/0.12.0.2.0/package/scripts/params_linux.py 6e12dd0 
>   ambari-server/src/test/python/stacks/2.0.6/HDFS/test_namenode.py b7126fd 
> 
> Diff: https://reviews.apache.org/r/34920/diff/
> 
> 
> Testing
> -------
> 
> Deployed a cluster and copied the patched files, then enabled NameNode HA, and performed a successful RU.
> 
> ----------------------------------------------------------------------
> Total run:744
> Total errors:0
> Total failures:0
> OK
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>