Posted to reviews@ambari.apache.org by Sebastian Toader <st...@hortonworks.com> on 2017/01/13 10:50:04 UTC

Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/
-----------------------------------------------------------

Review request for Ambari, Attila Doroszlai, Andrew Onischuk, and Sandor Magyari.


Bugs: AMBARI-19520
    https://issues.apache.org/jira/browse/AMBARI-19520


Repository: ambari


Description
-------

Problem:

When the Ambari server is restarted, it asks the agents to re-register with it.
Once an agent has successfully re-registered with the server, it should transition out of the heartbeat lost state. However, in some cases it takes a while for agents to leave the heartbeat lost state, so the server may ask the agent to re-register again.

Solution:
Ensure that, upon agent re-registration, the StatusCommandExecutor child process is spawned before the status commands received from the server (in the response to the registration) are added to the status command queue.
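
For illustration only, the ordering described above can be sketched as follows. This is not the code from the diff: the class name StatusCommandRunner, its methods, and the use of a thread in place of the real child process are all assumptions made to keep the sketch self-contained.

```python
import queue
import threading


class StatusCommandRunner:
    """Hypothetical stand-in for the agent's status command handling."""

    def __init__(self):
        self.status_queue = queue.Queue()
        self.executor = None   # a thread here; a child process in the agent
        self.processed = []

    def _drain(self):
        # Consume status commands until a None sentinel arrives.
        while True:
            command = self.status_queue.get()
            if command is None:
                break
            self.processed.append(command)

    def ensure_executor_running(self):
        # Spawn the consumer only if it is missing or has died.
        if self.executor is None or not self.executor.is_alive():
            self.executor = threading.Thread(target=self._drain, daemon=True)
            self.executor.start()

    def on_registration_response(self, status_commands):
        # Order matters: start the consumer first, then enqueue the
        # status commands that came back with the registration response.
        self.ensure_executor_running()
        for command in status_commands:
            self.status_queue.put(command)

    def stop(self):
        # Ask the consumer to exit and wait for it to finish.
        if self.executor is not None:
            self.status_queue.put(None)
            self.executor.join()
```

Calling on_registration_response() with the commands from a registration response guarantees a live consumer exists before anything is queued, so the commands are not left waiting for an executor that has yet to be started.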


Diffs
-----

  ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196 
  ambari-agent/src/main/python/ambari_agent/main.py 2e0517b 

Diff: https://reviews.apache.org/r/55494/diff/


Testing
-------

Manually tested covering:
1. Restart agent
2. Restart ambari-server with agents up and running
3. Kill StatusCommandExecutor child process

Unit test results:

Total run:1158
Total errors:0
Total failures:0
OK

Ran 452 tests in 20.976s

OK

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Ambari Main ....................................... SUCCESS [10.142s]
[INFO] Apache Ambari Project POM ......................... SUCCESS [0.029s]
[INFO] Ambari Views ...................................... SUCCESS [1.707s]
[INFO] utility ........................................... SUCCESS [1.189s]
[INFO] ambari-metrics .................................... SUCCESS [0.473s]
[INFO] Ambari Metrics Common ............................. SUCCESS [1.012s]
[INFO] Ambari Server ..................................... SUCCESS [1:45.492s]
[INFO] Ambari Agent ...................................... SUCCESS [25.860s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS


Thanks,

Sebastian Toader


Re: Review Request 55494: Ambari agents not recovering from heart beat lost state immediately after successful re-registering with server

Posted by Attila Doroszlai <ad...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161502
-----------------------------------------------------------


Ship it!

- Attila Doroszlai


On Jan. 13, 2017, 11:50 a.m., Sebastian Toader wrote:
> (Updated Jan. 13, 2017, 11:50 a.m.)