Posted to reviews@ambari.apache.org by Jonathan Hurley <jh...@hortonworks.com> on 2017/01/10 01:55:52 UTC
Review Request 55364: NodeManager restart fails during HOU if it is
on same host as RM
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/
-----------------------------------------------------------
Review request for Ambari, Dmitro Lisnichenko, Nate Cole, and Robert Levas.
Bugs: AMBARI-19435
https://issues.apache.org/jira/browse/AMBARI-19435
Repository: ambari
Description
-------
Host Ordered Upgrades attempt to start all components on a host in parallel within a single stage. During upgrades, there are post-START conditions which must be met, such as ensuring that the NodeManager has rejoined the ResourceManager before considering the START as successful.
However, when an NM and RM are co-located, this is not possible since the NM is started before the RM.
It seems like there are two possible solutions to this:
- Instead of creating a single START stage for every host, we can create two stages: one for masters and one for non-masters. This, however, leads to a problem with components like ZKFC, which is not a master but must be started before NameNode.
- Use the RoleCommandOrder to create the correct number of stages per host, grouping as many together as the dependencies will allow.
This review is for the second option: using the RoleCommandOrder to determine how to create stages for a HOU.
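The second option can be pictured as a layering over start-order dependencies. What follows is a minimal sketch of that idea, not the code in this patch: the class and method names (StageLayeringSketch, buildStages) and the startAfter map are hypothetical, and the map is a simplified stand-in for the ordering rules that RoleCommandOrder supplies. Each component goes into the earliest stage in which all of its prerequisites on the same host have already been started.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Ambari.
public class StageLayeringSketch {

  /**
   * Groups the components on one host into ordered stages.
   *
   * @param componentsOnHost components that must be started on this host
   * @param startAfter       component -> components it must be started after
   *                         (assumed, simplified form of the ordering rules)
   */
  public static List<List<String>> buildStages(Set<String> componentsOnHost,
      Map<String, Set<String>> startAfter) {
    List<List<String>> stages = new ArrayList<>();
    Set<String> started = new LinkedHashSet<>();
    Set<String> remaining = new LinkedHashSet<>(componentsOnHost);

    while (!remaining.isEmpty()) {
      List<String> stage = new ArrayList<>();
      for (String component : remaining) {
        boolean ready = true;
        for (String prerequisite : startAfter.getOrDefault(component, Set.of())) {
          // Prerequisites that are not on this host do not constrain this host's stages.
          if (componentsOnHost.contains(prerequisite) && !started.contains(prerequisite)) {
            ready = false;
            break;
          }
        }
        if (ready) {
          stage.add(component);
        }
      }
      if (stage.isEmpty()) {
        throw new IllegalStateException("Cyclic ordering among " + remaining);
      }
      // Only mark components as started once the whole stage has been chosen,
      // so that a component never shares a stage with its own prerequisite.
      started.addAll(stage);
      remaining.removeAll(stage);
      stages.add(stage);
    }
    return stages;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> startAfter = new LinkedHashMap<>();
    startAfter.put("NODEMANAGER", Set.of("RESOURCEMANAGER"));
    startAfter.put("NAMENODE", Set.of("ZKFC"));

    Set<String> host = new LinkedHashSet<>(
        List.of("NODEMANAGER", "RESOURCEMANAGER", "ZKFC", "NAMENODE"));

    // prints [[RESOURCEMANAGER, ZKFC], [NODEMANAGER, NAMENODE]]
    System.out.println(buildStages(host, startAfter));
  }
}
```

With the two orderings named in the description (NM after RM, NameNode after ZKFC), the sketch yields two stages for a host carrying all four components, which a masters/non-masters split would not get right for ZKFC.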
Diffs
-----
ambari-server/src/main/java/org/apache/ambari/server/controller/ControllerModule.java 9c93c60
ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 5191e83
ambari-server/src/main/java/org/apache/ambari/server/stageplanner/RoleGraph.java 404e4ff
ambari-server/src/main/java/org/apache/ambari/server/state/Cluster.java c6ae050
ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContext.java 8e7e5de
ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContextFactory.java PRE-CREATION
ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 46e2f8e
ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/HostOrderGrouping.java 5d723f5
ambari-server/src/test/java/org/apache/ambari/server/metadata/RoleGraphTest.java 53686aa
Diff: https://reviews.apache.org/r/55364/diff/
Testing
-------
Successfully completed a HOU where dependent components, such as RM and NM, were co-located.
Tests PENDING...
Thanks,
Jonathan Hurley
Re: Review Request 55364: NodeManager restart fails during HOU if it
is on same host as RM
Posted by Robert Levas <rl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161071
-----------------------------------------------------------
Ship it!
- Robert Levas
Re: Review Request 55364: NodeManager restart fails during HOU if it
is on same host as RM
Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161061
-----------------------------------------------------------
Ship it!
- Nate Cole
Re: Review Request 55364: NodeManager restart fails during HOU if it
is on same host as RM
Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161026
-----------------------------------------------------------
ambari-server/src/main/java/org/apache/ambari/server/stageplanner/RoleGraph.java (lines 222 - 224)
<https://reviews.apache.org/r/55364/#comment232296>
RoleGraph is tightly coupled to Stages. Here, we're reusing the same graph-building logic, but instead of creating stages, we're just building up a List of Host/HostRoleCommand mappings.
ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/HostOrderGrouping.java (lines 209 - 246)
<https://reviews.apache.org/r/55364/#comment232298>
This uses RoleGraph to build a List where every element represents a "Stage" (each stage contains the RoleCommands to execute).
The work of actually building Stages and Tasks is done elsewhere, in the UpgradeResourceProvider. The job of HostOrderGrouping is merely to create the correct stage/task wrappers, which will be turned into Stage and HostRoleCommand instances.
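As a rough illustration of that division of labour, the sketch below turns an ordered list of per-stage host/component mappings into lightweight wrapper objects and stops there. It is an assumption-laden sketch, not the patch: WrapperSketch, StageSketch, TaskSketch and toWrappers are hypothetical names and do not mirror the real StageWrapper or TaskWrapper classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are Ambari classes.
public class WrapperSketch {

  /** Stand-in for a task wrapper: one command for one component on one host. */
  public static final class TaskSketch {
    public final String host;
    public final String component;
    public final String command;

    TaskSketch(String host, String component, String command) {
      this.host = host;
      this.component = component;
      this.command = command;
    }
  }

  /** Stand-in for a stage wrapper: a label plus the tasks that may run together. */
  public static final class StageSketch {
    public final String label;
    public final List<TaskSketch> tasks;

    StageSketch(String label, List<TaskSketch> tasks) {
      this.label = label;
      this.tasks = tasks;
    }
  }

  /**
   * Converts ordered stages (each a host -> components mapping) into wrappers.
   * Creating real stages and commands is left to whatever consumes the wrappers.
   */
  public static List<StageSketch> toWrappers(
      List<Map<String, List<String>>> orderedStages, String command) {
    List<StageSketch> wrappers = new ArrayList<>();
    int index = 1;
    for (Map<String, List<String>> hostToComponents : orderedStages) {
      List<TaskSketch> tasks = new ArrayList<>();
      for (Map.Entry<String, List<String>> entry : hostToComponents.entrySet()) {
        for (String component : entry.getValue()) {
          tasks.add(new TaskSketch(entry.getKey(), component, command));
        }
      }
      wrappers.add(new StageSketch(command + " stage " + index, tasks));
      index++;
    }
    return wrappers;
  }

  public static void main(String[] args) {
    Map<String, List<String>> first = new LinkedHashMap<>();
    first.put("host1", List.of("RESOURCEMANAGER"));
    Map<String, List<String>> second = new LinkedHashMap<>();
    second.put("host1", List.of("NODEMANAGER"));

    for (StageSketch stage : toWrappers(List.of(first, second), "START")) {
      System.out.println(stage.label + ": " + stage.tasks.size() + " task(s)");
    }
  }
}
```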
- Jonathan Hurley
Re: Review Request 55364: NodeManager restart fails during HOU if it
is on same host as RM
Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/#review161074
-----------------------------------------------------------
Ship it!
- Dmitro Lisnichenko
Re: Review Request 55364: NodeManager restart fails during HOU if it
is on same host as RM
Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55364/
-----------------------------------------------------------
(Updated Jan. 10, 2017, 10:41 a.m.)
Review request for Ambari, Dmitro Lisnichenko, Nate Cole, and Robert Levas.
Bugs: AMBARI-19435
https://issues.apache.org/jira/browse/AMBARI-19435
Repository: ambari
Description
-------
Host Ordered Upgrades attempt to start all components on a host in parallel within a single stage. During upgrades, there are post-START conditions which must be met, such as ensuring that the NodeManager has rejoined the ResourceManager before considering the START as successful.
However, when an NM and RM are co-located, this is not possible since the NM is started before the RM.
It seems like there are two possible solutions to this:
- Instead of creating a single START stage for every host, we can create two stages: one for masters and one for non-masters. This, however, leads to a problem with components like ZKFC, which is not a master but must be started before NameNode.
- Use the RoleCommandOrder to create the correct number of stages per host, grouping as many together as the dependencies will allow.
This review is for the second option: using the RoleCommandOrder to determine how to create stages for a HOU.
Diffs (updated)
-----
ambari-server/src/main/java/org/apache/ambari/server/controller/ControllerModule.java 9c93c60
ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java 5191e83
ambari-server/src/main/java/org/apache/ambari/server/stageplanner/RoleGraph.java 404e4ff
ambari-server/src/main/java/org/apache/ambari/server/state/Cluster.java c6ae050
ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContext.java 8e7e5de
ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeContextFactory.java PRE-CREATION
ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 46e2f8e
ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/HostOrderGrouping.java 5d723f5
ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/StageWrapper.java 5ec7ddb
ambari-server/src/main/java/org/apache/ambari/server/state/stack/upgrade/TaskWrapper.java 5fdf91c
ambari-server/src/test/java/org/apache/ambari/server/agent/AgentResourceTest.java 674025c
ambari-server/src/test/java/org/apache/ambari/server/controller/KerberosHelperTest.java 8a70f0c
ambari-server/src/test/java/org/apache/ambari/server/controller/internal/ActiveWidgetLayoutResourceProviderTest.java 4b3782f
ambari-server/src/test/java/org/apache/ambari/server/controller/internal/StackUpgradeConfigurationMergeTest.java 27d3d7b
ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UserAuthorizationResourceProviderTest.java 37c48c3
ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UserResourceProviderTest.java b8e027f
ambari-server/src/test/java/org/apache/ambari/server/metadata/RoleGraphTest.java 53686aa
ambari-server/src/test/java/org/apache/ambari/server/state/UpgradeHelperTest.java ea1f18a
ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterEffectiveVersionTest.java 8ba891a
ambari-server/src/test/resources/stacks/HDP/2.1.1/services/HBASE/metainfo.xml PRE-CREATION
Diff: https://reviews.apache.org/r/55364/diff/
Testing (updated)
-------
Successfully completed a HOU where dependent components, such as RM and NM, were co-located.
[INFO] Starting audit...
Audit done.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 25:21 min
[INFO] Finished at: 2017-01-10T09:30:19-05:00
[INFO] Final Memory: 62M/746M
[INFO] ------------------------------------------------------------------------
Thanks,
Jonathan Hurley