Posted to reviews@ambari.apache.org by Sebastian Toader <st...@hortonworks.com> on 2016/04/21 17:19:49 UTC

Review Request 46496: Host_status stuck in UNKNOWN status after blueprint deploy with host in heartbeat-lost

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/
-----------------------------------------------------------

Review request for Ambari, Daniel Gergely, Laszlo Puskas, Sandor Magyari, Sumit Mohanty, and Sid Wagle.


Bugs: AMBARI-16013
    https://issues.apache.org/jira/browse/AMBARI-16013


Repository: ambari


Description
-------

When hosts register with the Ambari server, `TopologyManager` adds them to its `availableHosts` collection. When a cluster is provisioned using Blueprints, `TopologyManager` tries to allocate the required hosts to host groups from this collection. Hosts that transitioned to the HEARTBEAT_LOST state were not removed from `availableHosts`, so logical tasks were scheduled to unreachable hosts. When these hosts become reachable again, they re-register with the Ambari server. Because the server has already scheduled logical tasks for them, it does not try again and therefore never creates role commands for the hosts to execute.

`TopologyManager` is now hooked into the HEARTBEAT_LOST state transition and removes the affected host from its internal `availableHosts` collection.
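
The intended behavior can be pictured with a minimal sketch, assuming a much-simplified model of the host bookkeeping. `AvailableHostsSketch` and its method names are hypothetical and are not taken from the patch; only `availableHosts` and the HEARTBEAT_LOST transition come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for the bookkeeping described above.
 * This is not the real TopologyManager; all method names are assumptions.
 */
final class AvailableHostsSketch {
    // Hosts that have registered and have not yet been allocated to a host group.
    private final List<String> availableHosts = new ArrayList<>();

    /** Called when a host registers (or re-registers) with the server. */
    synchronized void onHostRegistered(String hostName) {
        if (!availableHosts.contains(hostName)) {
            availableHosts.add(hostName);
        }
    }

    /** Called on the HEARTBEAT_LOST transition: stop offering the host. */
    synchronized void onHostHeartbeatLost(String hostName) {
        availableHosts.remove(hostName);
    }

    /** Hands out one available host, or null if none is left. */
    synchronized String allocateHost() {
        return availableHosts.isEmpty() ? null : availableHosts.remove(0);
    }

    public static void main(String[] args) {
        AvailableHostsSketch sketch = new AvailableHostsSketch();
        sketch.onHostRegistered("host1");
        sketch.onHostRegistered("host2");

        // host1 becomes unreachable before the cluster is provisioned.
        sketch.onHostHeartbeatLost("host1");
        System.out.println("allocated: " + sketch.allocateHost());

        // host1 comes back, re-registers, and is offered again.
        sketch.onHostRegistered("host1");
        System.out.println("allocated: " + sketch.allocateHost());
    }
}
```

In this model an unreachable host is never handed out, and it becomes eligible again only once it re-registers.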


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/state/host/HostImpl.java d221112 
  ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java 5a0aca0 

Diff: https://reviews.apache.org/r/46496/diff/


Testing
-------

Manual testing with a 5-node cluster using Blueprints.

Unit tests:
Results :

Tests run: 3561, Failures: 0, Errors: 0, Skipped: 36


Thanks,

Sebastian Toader


Re: Review Request 46496: Host_status stuck in UNKNOWN status after blueprint deploy with host in heartbeat-lost

Posted by Sandor Magyari <sm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/#review129910
-----------------------------------------------------------


Ship it!

- Sandor Magyari


On April 21, 2016, 3:19 p.m., Sebastian Toader wrote:
> (Updated April 21, 2016, 3:19 p.m.)


Re: Review Request 46496: Host_status stuck in UNKNOWN status after blueprint deploy with host in heartbeat-lost

Posted by Laszlo Puskas <lp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/#review129906
-----------------------------------------------------------




ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java (line 395)
<https://reviews.apache.org/r/46496/#comment193458>

    Add the host to the log message


- Laszlo Puskas


On April 21, 2016, 3:19 p.m., Sebastian Toader wrote:
> (Updated April 21, 2016, 3:19 p.m.)


Re: Review Request 46496: Host_status stuck in UNKNOWN status after blueprint deploy with host in heartbeat-lost

Posted by Sid Wagle <sw...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/#review129926
-----------------------------------------------------------


Ship it!

- Sid Wagle


On April 21, 2016, 3:51 p.m., Sebastian Toader wrote:
> (Updated April 21, 2016, 3:51 p.m.)


Re: Review Request 46496: Host_status stuck in UNKNOWN status after blueprint deploy with host in heartbeat-lost

Posted by Sebastian Toader <st...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/
-----------------------------------------------------------

(Updated April 21, 2016, 5:51 p.m.)


Review request for Ambari, Daniel Gergely, Laszlo Puskas, Sandor Magyari, Sumit Mohanty, and Sid Wagle.


Changes
-------

Address review comments.


Bugs: AMBARI-16013
    https://issues.apache.org/jira/browse/AMBARI-16013


Repository: ambari


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/state/host/HostImpl.java d221112 
  ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java 5a0aca0 

Diff: https://reviews.apache.org/r/46496/diff/


Thanks,

Sebastian Toader


Re: Review Request 46496: Host_status stuck in UNKNOWN status after blueprint deploy with host in heartbeat-lost

Posted by Daniel Gergely <dg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/#review129905
-----------------------------------------------------------


Fix it, then Ship it!





ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java (line 51)
<https://reviews.apache.org/r/46496/#comment193457>

    I cannot see where HostState is used.



ambari-server/src/main/java/org/apache/ambari/server/topology/TopologyManager.java (lines 394 - 397)
<https://reviews.apache.org/r/46496/#comment193456>

    The isEmpty check is not necessary here; contains is enough.
    In fact, the whole if statement is needed only for logging.
    So the remove operation can be executed directly, provided the log message is not misleading when availableHosts does not contain the host. (Is that possible?)
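
A minimal sketch of that simplification, assuming `availableHosts` is a `java.util.Collection` of host names. `RemoveHostSketch`, `removeLostHost` and the use of `System.out` instead of a logger are hypothetical and are not taken from the patch.

```java
import java.util.Collection;

/** Hypothetical helper illustrating the suggestion; not the code under review. */
final class RemoveHostSketch {
    /**
     * Removes hostName from availableHosts and reports whether it was present.
     * Collection.remove already returns false for an empty collection or a
     * missing element, so no isEmpty()/contains() guard is needed.
     */
    static boolean removeLostHost(Collection<String> availableHosts, String hostName) {
        boolean removed = availableHosts.remove(hostName);
        if (removed) {
            // Name the host so the message identifies which one was dropped.
            System.out.println("Removed host " + hostName
                + " from available hosts on heartbeat lost");
        }
        return removed;
    }
}
```

Logging only when `remove` returns true avoids a misleading message when the host was not in the collection.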


- Daniel Gergely


On April 21, 2016, 3:19 p.m., Sebastian Toader wrote:
> (Updated April 21, 2016, 3:19 p.m.)


Re: Review Request 46496: Host_status stuck in UNKNOWN status after blueprint deploy with host in heartbeat-lost

Posted by Laszlo Puskas <lp...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46496/#review129907
-----------------------------------------------------------


Ship it!

- Laszlo Puskas


On April 21, 2016, 3:19 p.m., Sebastian Toader wrote:
> (Updated April 21, 2016, 3:19 p.m.)