Posted to reviews@ambari.apache.org by Sebastian Toader <st...@hortonworks.com> on 2017/01/08 09:39:29 UTC
Review Request 55325: Ambari agents remain in heartbeat lost state
after ambari server restart
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/
-----------------------------------------------------------
Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Myroslav Papirkovskyy, Sandor Magyari, and Sid Wagle.
Bugs: AMBARI-19416
https://issues.apache.org/jira/browse/AMBARI-19416
Repository: ambari
Description
-------
Re-create ```self.actionQueue.statusCommandQueue``` when the status command executor child process is re-spawned.
Diffs
-----
ambari-agent/src/main/python/ambari_agent/Controller.py f6bda1e
ambari-agent/src/main/python/ambari_agent/main.py f812226
ambari-agent/src/test/python/ambari_agent/TestMain.py 6f38410
Diff: https://reviews.apache.org/r/55325/diff/
Testing
-------
Manual testing:
1. Restarting the Ambari server multiple times and checking that all agents reconnect fine and continue executing status commands
2. Restarting agents
Unit tests:
-----------------------------------------------------------------------
Ran 452 tests in 107.184s
Thanks,
Sebastian Toader
Re: Review Request 55325: Ambari agents remain in heartbeat lost
state after ambari server restart
Posted by Sandor Magyari <sm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/#review160837
-----------------------------------------------------------
Ship It!
- Sandor Magyari
Re: Review Request 55325: Ambari agents remain in heartbeat lost
state after ambari server restart
Posted by Attila Doroszlai <ad...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/#review160870
-----------------------------------------------------------
ambari-agent/src/main/python/ambari_agent/Controller.py (line 472)
<https://reviews.apache.org/r/55325/#comment232121>
I think the queue should be (re-)created before spawning the executor, not after killing it. The process may also be killed from elsewhere (e.g. by the OS); in that case the old queue is kept and the agent may get stuck:
```
INFO 2017-01-09 08:10:15,172 main.py:316 - Respawning statusCommandsExecutor
INFO 2017-01-09 08:10:35,333 main.py:316 - Respawning statusCommandsExecutor
INFO 2017-01-09 08:10:40,373 main.py:316 - Respawning statusCommandsExecutor
...
09 Jan 2017 08:12:37,364 WARN [ambari-hearbeat-monitor] HeartbeatMonitor:159 - Heartbeat lost from host c6401.ambari.apache.org
```
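A minimal sketch of the ordering suggested above, not the actual Ambari patch: the `Supervisor` class, `status_executor` function and `ensure_running` method are hypothetical names used only for illustration. It assumes the executor is a `multiprocessing.Process` fed through a `multiprocessing.Queue`; the Python documentation warns that a queue is likely to become corrupted if a process is killed while using it, which is why the queue is replaced on every spawn rather than only after a deliberate kill.

```python
import multiprocessing


def status_executor(queue):
    """Child process: consume status commands until a None sentinel arrives."""
    while True:
        command = queue.get()
        if command is None:
            break


class Supervisor(object):
    """Hypothetical stand-in for the agent code that owns the executor."""

    def __init__(self):
        self.queue = None
        self.process = None

    def spawn(self):
        # Create a fresh queue *before* every spawn, so a child that died
        # for any reason never leaves its old queue behind for the new one.
        self.queue = multiprocessing.Queue()
        self.process = multiprocessing.Process(
            target=status_executor, args=(self.queue,))
        self.process.daemon = True
        self.process.start()

    def ensure_running(self):
        # Respawn whenever the child is gone, no matter who killed it.
        # Returns True if a respawn happened.
        if self.process is None or not self.process.is_alive():
            self.spawn()
            return True
        return False


if __name__ == "__main__":
    supervisor = Supervisor()
    supervisor.spawn()
    old_queue = supervisor.queue

    # Simulate the executor being killed from outside the agent logic.
    supervisor.process.terminate()
    supervisor.process.join()

    supervisor.ensure_running()
    assert supervisor.queue is not old_queue  # the new child got a new queue

    supervisor.queue.put(None)  # ask the new child to exit cleanly
    supervisor.process.join(10)
```

Because the queue is created inside `spawn()`, there is no code path that starts a new child on a queue inherited from a dead one.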
- Attila Doroszlai
Re: Review Request 55325: Ambari agents remain in heartbeat lost
state after ambari server restart
Posted by Attila Doroszlai <ad...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/#review160882
-----------------------------------------------------------
Ship It!
- Attila Doroszlai
Re: Review Request 55325: Ambari agents remain in heartbeat lost
state after ambari server restart
Posted by Sebastian Toader <st...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/
-----------------------------------------------------------
(Updated Jan. 9, 2017, 11:02 a.m.)
Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Myroslav Papirkovskyy, Sandor Magyari, and Sid Wagle.
Changes
-------
The status command queue is now created before the status command executor child process is spawned.
Bugs: AMBARI-19416
https://issues.apache.org/jira/browse/AMBARI-19416
Repository: ambari
Description
-------
Re-create ```self.actionQueue.statusCommandQueue``` when the status command executor child process is re-spawned.
Diffs (updated)
-----
ambari-agent/src/main/python/ambari_agent/ActionQueue.py 3726286
ambari-agent/src/main/python/ambari_agent/Controller.py f6bda1e
ambari-agent/src/main/python/ambari_agent/main.py f812226
ambari-agent/src/test/python/ambari_agent/TestHeartbeat.py 19fad56
ambari-agent/src/test/python/ambari_agent/TestMain.py 6f38410
Diff: https://reviews.apache.org/r/55325/diff/
Testing
-------
Manual testing:
1. Restarting the Ambari server multiple times and checking that all agents reconnect fine and continue executing status commands
2. Restarting agents
Unit tests:
-----------------------------------------------------------------------
Ran 452 tests in 107.184s
Thanks,
Sebastian Toader