Posted to reviews@ambari.apache.org by Jonathan Hurley <jh...@hortonworks.com> on 2017/01/10 01:55:52 UTC

Review Request 55364: NodeManager restart fails during HOU if it is on same host as RM

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/
-----------------------------------------------------------

Review request for Ambari, Dmitro Lisnichenko, Nate Cole, and Robert Levas.


Bugs: AMBARI-19435
    https://issues.apache.org/jira/browse/AMBARI-19435


Repository: ambari


Description
-------

Host Ordered Upgrades (HOU) attempt to start all components on a host in parallel within a single stage. During upgrades, there are post-START conditions which must be met, such as ensuring that the NodeManager (NM) has rejoined the ResourceManager (RM) before the START is considered successful.

However, when an NM and an RM are co-located, this is not possible because the NM is started before the RM.

It seems like there are two possible solutions to this:
- Instead of creating a single START stage for every host, we can create two stages: one for masters and one for non-masters. This, however, causes a problem for components like ZKFC, which is not a master but must be started before the NameNode.

- Use the RoleCommandOrder to create the correct number of stages per host, grouping as many components together as the dependencies will allow.

This review is for the second option: using the RoleCommandOrder to determine how to create stages for a HOU.
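
As a rough illustration of the second option, the sketch below groups the components on one host into as few START stages as an ordering allows. It is not the code in this patch and does not use Ambari's RoleGraph or RoleCommandOrder classes; the class name StagePlanSketch, the method planStages, and the shape of the startAfter map are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch only; not Ambari's RoleGraph or RoleCommandOrder API. */
public class StagePlanSketch {

  /**
   * Groups the components on a single host into ordered START stages.
   *
   * @param components the components installed on the host
   * @param startAfter maps a component to the components which must start before it
   * @return the stages in execution order; components within a stage may start in parallel
   */
  public static List<List<String>> planStages(Set<String> components,
      Map<String, Set<String>> startAfter) {
    // sorted so that the order within a stage is stable
    Set<String> remaining = new TreeSet<>(components);
    List<List<String>> stages = new ArrayList<>();

    while (!remaining.isEmpty()) {
      List<String> stage = new ArrayList<>();
      for (String component : remaining) {
        boolean blocked = false;
        Set<String> dependencies =
            startAfter.getOrDefault(component, Collections.<String>emptySet());
        for (String dependency : dependencies) {
          // a dependency only blocks if it is on this host and has not
          // already been placed in an earlier stage
          if (remaining.contains(dependency)) {
            blocked = true;
            break;
          }
        }
        if (!blocked) {
          stage.add(component);
        }
      }

      if (stage.isEmpty()) {
        throw new IllegalStateException("Cyclic ordering among " + remaining);
      }

      remaining.removeAll(stage);
      stages.add(stage);
    }

    return stages;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> startAfter = new HashMap<>();
    startAfter.put("NODEMANAGER", Collections.singleton("RESOURCEMANAGER"));
    startAfter.put("NAMENODE", Collections.singleton("ZKFC"));

    Set<String> host = new HashSet<>(
        Arrays.asList("NODEMANAGER", "RESOURCEMANAGER", "NAMENODE", "ZKFC"));

    // prints [[RESOURCEMANAGER, ZKFC], [NAMENODE, NODEMANAGER]]
    System.out.println(planStages(host, startAfter));
  }
}
```

In this sketch a host with only a NodeManager still gets a single stage, because a dependency that is not on the host does not block anything. A fixed master/non-master split would not handle the ZKFC case, since ZKFC has to land in the first stage even though it is not a master.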


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/controller/ControllerModule.java 9c93c60 
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 5191e83 
  ambari-server/src/main/java/org/apache/ambari/server/stageplanner/RoleGraph.java 404e4ff 
  ambari-server/src/main/java/org/apache/ambari/server/state/Cluster.java c6ae050 
  ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContext.java 8e7e5de 
  ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContextFactory.java PRE-CREATION 
  ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 46e2f8e 
  ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/HostOrderGrouping.java 5d723f5 
  ambari-server/src/test/java/org/apache/ambari/server/metadata/RoleGraphTest.java 53686aa 

Diff: https://reviews.apache.org/r/55364/diff/


Testing
-------

Successfully completed a HOU where dependent components, such as RM and NM, were co-located.

Tests PENDING...


Thanks,

Jonathan Hurley


Re: Review Request 55364: NodeManager restart fails during HOU if it is on same host as RM

Posted by Robert Levas <rl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161071
-----------------------------------------------------------


Ship it!

- Robert Levas




Re: Review Request 55364: NodeManager restart fails during HOU if it is on same host as RM

Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161061
-----------------------------------------------------------


Ship it!

- Nate Cole




Re: Review Request 55364: NodeManager restart fails during HOU if it is on same host as RM

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161026
-----------------------------------------------------------




ambari-server/src/main/java/org/apache/ambari/server/stageplanner/RoleGraph.java (lines 222 - 224)
<https://reviews.apache.org/r/55364/#comment232296>

    RoleGraph is tightly coupled to Stages. Here, we're reusing the same graph-building logic, but instead of creating Stages, we're just building up a List of Host/HostRoleCommand mappings.



ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/HostOrderGrouping.java (lines 209 - 246)
<https://reviews.apache.org/r/55364/#comment232298>

    This uses RoleGraph to build a List where every element represents a "Stage" (each stage holds the RoleCommands to execute).

    The work of actually building Stages and Tasks is done elsewhere, in the UpgradeResourceProvider. The job of HostOrderGrouping is merely to create the correct stage/task wrappers, which will later be turned into Stage and HostRoleCommand instances.
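
    To make the shape of that data concrete, here is a minimal sketch, assuming each ordered stage is a map of host name to the components to start on that host. The classes WrapperSketch, StageSketch and TaskSketch are hypothetical stand-ins invented for this example; they are not Ambari's StageWrapper or TaskWrapper and do not reflect their constructors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; the nested classes are not Ambari's wrappers. */
public class WrapperSketch {

  /** Stand-in for a task wrapper: one START command for one component on one host. */
  public static final class TaskSketch {
    public final String host;
    public final String component;

    public TaskSketch(String host, String component) {
      this.host = host;
      this.component = component;
    }

    @Override
    public String toString() {
      return "START " + component + " on " + host;
    }
  }

  /** Stand-in for a stage wrapper: a named group of tasks which may run in parallel. */
  public static final class StageSketch {
    public final String text;
    public final List<TaskSketch> tasks;

    public StageSketch(String text, List<TaskSketch> tasks) {
      this.text = text;
      this.tasks = Collections.unmodifiableList(tasks);
    }
  }

  /**
   * Converts the ordered list produced by the graph step into wrappers.
   * Each list element is one stage, keyed by host name.
   */
  public static List<StageSketch> toWrappers(List<Map<String, List<String>>> orderedStages) {
    List<StageSketch> wrappers = new ArrayList<>();
    for (Map<String, List<String>> stage : orderedStages) {
      List<TaskSketch> tasks = new ArrayList<>();
      for (Map.Entry<String, List<String>> entry : stage.entrySet()) {
        for (String component : entry.getValue()) {
          tasks.add(new TaskSketch(entry.getKey(), component));
        }
      }

      // skip stages which ended up with nothing to run
      if (!tasks.isEmpty()) {
        wrappers.add(new StageSketch("Start stage " + (wrappers.size() + 1), tasks));
      }
    }
    return wrappers;
  }
}
```

    The point of the split is the same as in the comment above: the grouping step only decides what goes into which wrapper and in what order, while the creation of real Stage and HostRoleCommand objects stays in one place.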


- Jonathan Hurley




Re: Review Request 55364: NodeManager restart fails during HOU if it is on same host as RM

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161074
-----------------------------------------------------------


Ship it!

- Dmitro Lisnichenko




Re: Review Request 55364: NodeManager restart fails during HOU if it is on same host as RM

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/
-----------------------------------------------------------

(Updated Jan. 10, 2017, 10:41 a.m.)


Review request for Ambari, Dmitro Lisnichenko, Nate Cole, and Robert Levas.


Bugs: AMBARI-19435
    https://issues.apache.org/jira/browse/AMBARI-19435


Repository: ambari


Description
-------

Host Ordered Upgrades (HOU) attempt to start all components on a host in parallel within a single stage. During upgrades, there are post-START conditions which must be met, such as ensuring that the NodeManager (NM) has rejoined the ResourceManager (RM) before the START is considered successful.

However, when an NM and an RM are co-located, this is not possible because the NM is started before the RM.

It seems like there are two possible solutions to this:
- Instead of creating a single START stage for every host, we can create two stages: one for masters and one for non-masters. This, however, causes a problem for components like ZKFC, which is not a master but must be started before the NameNode.

- Use the RoleCommandOrder to create the correct number of stages per host, grouping as many components together as the dependencies will allow.

This review is for the second option: using the RoleCommandOrder to determine how to create stages for a HOU.


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/controller/ControllerModule.java 9c93c60 
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 5191e83 
  ambari-server/src/main/java/org/apache/ambari/server/stageplanner/RoleGraph.java 404e4ff 
  ambari-server/src/main/java/org/apache/ambari/server/state/Cluster.java c6ae050 
  ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContext.java 8e7e5de 
  ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContextFactory.java PRE-CREATION 
  ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 46e2f8e 
  ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/HostOrderGrouping.java 5d723f5 
  ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/StageWrapper.java 5ec7ddb 
  ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/TaskWrapper.java 5fdf91c 
  ambari-server/src/test/java/org/apache/ambari/server/agent/AgentResourceTest.java 674025c 
  ambari-server/src/test/java/org/apache/ambari/server/controller/KerberosHelperTest.java 8a70f0c 
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/ActiveWidgetLayoutResourceProviderTest.java 4b3782f 
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/StackUpgradeConfigurationMergeTest.java 27d3d7b 
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UserAuthorizationResourceProviderTest.java 37c48c3 
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UserResourceProviderTest.java b8e027f 
  ambari-server/src/test/java/org/apache/ambari/server/metadata/RoleGraphTest.java 53686aa 
  ambari-server/src/test/java/org/apache/ambari/server/state/UpgradeHelperTest.java ea1f18a 
  ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterEffectiveVersionTest.java 8ba891a 
  ambari-server/src/test/resources/stacks/HDP/2.1.1/services/HBASE/metainfo.xml PRE-CREATION 

Diff: https://reviews.apache.org/r/55364/diff/


Testing (updated)
-------

Successfully completed a HOU where dependent components, such as RM and NM, were co-located.

[INFO] Starting audit...
Audit done.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 25:21 min
[INFO] Finished at: 2017-01-10T09:30:19-05:00
[INFO] Final Memory: 62M/746M
[INFO] ------------------------------------------------------------------------


Thanks,

Jonathan Hurley