Posted to reviews@ambari.apache.org by Sebastian Toader <st...@hortonworks.com> on 2017/01/08 09:39:29 UTC

Review Request 55325: Ambari agents remain in heartbeat lost state after ambari server restart

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/
-----------------------------------------------------------

Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Myroslav Papirkovskyy, Sandor Magyari, and Sid Wagle.


Bugs: AMBARI-19416
    https://issues.apache.org/jira/browse/AMBARI-19416


Repository: ambari


Description
-------

Re-create ```self.actionQueue.statusCommandQueue``` whenever the status command executor child process is re-spawned.


Diffs
-----

  ambari-agent/src/main/python/ambari_agent/Controller.py f6bda1e 
  ambari-agent/src/main/python/ambari_agent/main.py f812226 
  ambari-agent/src/test/python/ambari_agent/TestMain.py 6f38410 

Diff: https://reviews.apache.org/r/55325/diff/


Testing
-------

Manual testing:
1. Restarting the Ambari server multiple times and checking that all agents reconnect and continue executing status commands
2. Restarting agents

Unit tests:

-----------------------------------------------------------------------
Ran 452 tests in 107.184s


Thanks,

Sebastian Toader


Re: Review Request 55325: Ambari agents remain in heartbeat lost state after ambari server restart

Posted by Sandor Magyari <sm...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/#review160837
-----------------------------------------------------------


Ship it!

- Sandor Magyari




Re: Review Request 55325: Ambari agents remain in heartbeat lost state after ambari server restart

Posted by Attila Doroszlai <ad...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/#review160870
-----------------------------------------------------------




ambari-agent/src/main/python/ambari_agent/Controller.py (line 472)
<https://reviews.apache.org/r/55325/#comment232121>

    I think the queue should be (re-)created before spawning the executor, not after killing it. The process may also be killed from elsewhere (e.g. by the OS); in that case the old queue is kept and the agent may get stuck:
    
    ```
    INFO 2017-01-09 08:10:15,172 main.py:316 - Respawning statusCommandsExecutor
    INFO 2017-01-09 08:10:35,333 main.py:316 - Respawning statusCommandsExecutor
    INFO 2017-01-09 08:10:40,373 main.py:316 - Respawning statusCommandsExecutor
    ...
    09 Jan 2017 08:12:37,364  WARN [ambari-hearbeat-monitor] HeartbeatMonitor:159 - Heartbeat lost from host c6401.ambari.apache.org
    ```


- Attila Doroszlai




Re: Review Request 55325: Ambari agents remain in heartbeat lost state after ambari server restart

Posted by Attila Doroszlai <ad...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/#review160882
-----------------------------------------------------------


Ship it!

- Attila Doroszlai




Re: Review Request 55325: Ambari agents remain in heartbeat lost state after ambari server restart

Posted by Sebastian Toader <st...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55325/
-----------------------------------------------------------

(Updated Jan. 9, 2017, 11:02 a.m.)


Review request for Ambari, Attila Doroszlai, Andrew Onischuk, Myroslav Papirkovskyy, Sandor Magyari, and Sid Wagle.


Changes
-------

The status command queue is now created before the status command executor child process is spawned.
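
A minimal sketch of this ordering, not the actual patch. It assumes Python 3 on a POSIX host (the `fork` start method), and `ExecutorHolder`, `status_worker`, and `respawn` are hypothetical names standing in for the agent's real objects. The point it illustrates is that the queues are rebuilt immediately before every spawn, so a child killed from elsewhere (for example by the OS) cannot leave the new child reading from a stale queue.

```python
import multiprocessing

# Assumption: POSIX host, so the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


def status_worker(command_queue, results):
    """Hypothetical stand-in for the status command executor child."""
    while True:
        command = command_queue.get()
        if command is None:  # sentinel: shut down cleanly
            break
        results.put("done:" + command)


class ExecutorHolder(object):
    """Hypothetical owner of the child process and its queues."""

    def __init__(self):
        self.status_command_queue = None
        self.results = None
        self.process = None

    def respawn(self):
        # Stop the old child only if it is still running; it may already
        # have been killed by something outside this code.
        if self.process is not None and self.process.is_alive():
            self.process.terminate()
            self.process.join()

        # Create fresh queues right before spawning, regardless of how the
        # previous child went away. A killed child may have died while
        # holding a lock inside the old queue.
        self.status_command_queue = _ctx.Queue()
        self.results = _ctx.Queue()

        self.process = _ctx.Process(
            target=status_worker,
            args=(self.status_command_queue, self.results),
        )
        self.process.daemon = True
        self.process.start()
```

Calling `respawn()` again after the child has died yields a new `status_command_queue`, and commands put on it reach the new child regardless of how the previous one exited.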


Bugs: AMBARI-19416
    https://issues.apache.org/jira/browse/AMBARI-19416


Repository: ambari


Description
-------

Re-create ```self.actionQueue.statusCommandQueue``` whenever the status command executor child process is re-spawned.


Diffs (updated)
-----

  ambari-agent/src/main/python/ambari_agent/ActionQueue.py 3726286 
  ambari-agent/src/main/python/ambari_agent/Controller.py f6bda1e 
  ambari-agent/src/main/python/ambari_agent/main.py f812226 
  ambari-agent/src/test/python/ambari_agent/TestHeartbeat.py 19fad56 
  ambari-agent/src/test/python/ambari_agent/TestMain.py 6f38410 

Diff: https://reviews.apache.org/r/55325/diff/


Testing
-------

Manual testing:
1. Restarting the Ambari server multiple times and checking that all agents reconnect and continue executing status commands
2. Restarting agents

Unit tests:

-----------------------------------------------------------------------
Ran 452 tests in 107.184s


Thanks,

Sebastian Toader