Posted to reviews@ambari.apache.org by Sebastian Toader <st...@hortonworks.com> on 2017/01/13 10:50:04 UTC
Review Request 55494: Ambari agents not recovering from heart beat
lost state immediately after successful re-registering with server
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/
-----------------------------------------------------------
Review request for Ambari, Attila Doroszlai, Andrew Onischuk, and Sandor Magyari.
Bugs: AMBARI-19520
https://issues.apache.org/jira/browse/AMBARI-19520
Repository: ambari
Description
-------
Problem:
When the Ambari server is restarted, it asks the agents to re-register with it once it is back up.
Once an agent has successfully re-registered with the server, it should transition out of the heartbeat lost state. However, in some cases it takes a while for agents to leave the heartbeat lost state, so the server may ask the agent to re-register again.
Solution:
Ensure that, upon agent re-registration, the StatusCommandExecutor child process is spawned before the status commands received from the server (in the response to the registration) are added to the status command queue.
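The ordering described above can be illustrated with a minimal sketch. It is not the code from the diff: the class name StatusCommandExecutorSketch, the function handle_registration_response, the "statusCommands" key and the placeholder command strings are assumptions made for illustration, and a thread stands in for the child process to keep the example short.

```python
import queue
import threading


class StatusCommandExecutorSketch:
    """Hypothetical stand-in for the StatusCommandExecutor child process.

    Assumption: a thread replaces the real child process for brevity.
    """

    def __init__(self):
        self.commands = queue.Queue()   # the status command queue
        self.processed = []             # commands the consumer has handled
        self._worker = None

    def is_running(self):
        return self._worker is not None and self._worker.is_alive()

    def spawn(self):
        # Start the consumer only if it is not already running.
        if not self.is_running():
            self._worker = threading.Thread(target=self._run, daemon=True)
            self._worker.start()

    def _run(self):
        while True:
            command = self.commands.get()
            if command is None:         # sentinel: stop the consumer
                break
            self.processed.append(command)

    def stop(self):
        if self.is_running():
            self.commands.put(None)
            self._worker.join()


def handle_registration_response(executor, response):
    """Spawn the executor first, then queue the status commands."""
    executor.spawn()
    # Assumption: the response carries its commands under "statusCommands".
    for command in response.get("statusCommands", []):
        executor.commands.put(command)


executor = StatusCommandExecutorSketch()
handle_registration_response(
    executor, {"statusCommands": ["DATANODE", "NAMENODE"]}
)
executor.stop()
print(executor.processed)
```

Because the consumer is already running when the commands are queued, nothing sits in the queue waiting for a later spawn, which is the delay the problem statement attributes to the heartbeat lost state.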
Diffs
-----
ambari-agent/src/main/python/ambari_agent/Controller.py 6b1b196
ambari-agent/src/main/python/ambari_agent/main.py 2e0517b
Diff: https://reviews.apache.org/r/55494/diff/
Testing
-------
Manually tested covering:
1. Restart agent
2. Restart ambari-server with agents being up and running
3. Kill StatusCommandExecutor child process
Unit test results:
Total run:1158
Total errors:0
Total failures:0
OK
Ran 452 tests in 20.976s
OK
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Ambari Main ....................................... SUCCESS [10.142s]
[INFO] Apache Ambari Project POM ......................... SUCCESS [0.029s]
[INFO] Ambari Views ...................................... SUCCESS [1.707s]
[INFO] utility ........................................... SUCCESS [1.189s]
[INFO] ambari-metrics .................................... SUCCESS [0.473s]
[INFO] Ambari Metrics Common ............................. SUCCESS [1.012s]
[INFO] Ambari Server ..................................... SUCCESS [1:45.492s]
[INFO] Ambari Agent ...................................... SUCCESS [25.860s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
Thanks,
Sebastian Toader
Re: Review Request 55494: Ambari agents not recovering from heart beat
lost state immediately after successful re-registering with server
Posted by Attila Doroszlai <ad...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55494/#review161502
-----------------------------------------------------------
Ship it!
- Attila Doroszlai