You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Zameer Manji <zm...@twopensource.com> on 2014/09/25 23:39:47 UTC

Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Sept. 25, 2014, 2:39 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Brian's Feedback


Summary (updated)
-----------------

Prevent initial ZK timeouts from killing the executor.


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description (updated)
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Bill Farner <wf...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54885
-----------------------------------------------------------

Ship it!



src/main/python/apache/aurora/executor/aurora_executor.py
<https://reviews.apache.org/r/25974/#comment95208>

    TASK_FAILED in the message is redundant.


- Bill Farner


On Sept. 26, 2014, 9:24 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 26, 2014, 9:24 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54895
-----------------------------------------------------------



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment95228>

    Do you need input sanitization here? Could this logic be hidden behind an abstraction? It's unclear to me why the serverset connection code needs live with the executor data parsing code.


- Kevin Sweeney


On Sept. 29, 2014, 2:46 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2014, 2:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54890
-----------------------------------------------------------


Pinging Brian and Kevin to this review.

- Zameer Manji


On Sept. 29, 2014, 2:46 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2014, 2:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.

> On Sept. 29, 2014, 4:43 p.m., Brian Wickman wrote:
> > src/main/python/apache/aurora/executor/common/announcer.py, lines 112-116
> > <https://reviews.apache.org/r/25974/diff/4/?file=708538#file708538line112>
> >
> >     this data is available within the mesos_task struct created in from_assigned_task.  you should be getting the information from there as it may contain default values not required to be populated in the executorConfig json (in which case you'd be getting KeyErrors from make_serverset.)

done.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54916
-----------------------------------------------------------


On Sept. 29, 2014, 5:54 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2014, 5:54 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54916
-----------------------------------------------------------



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment95251>

    this data is available within the mesos_task struct created in from_assigned_task.  you should be getting the information from there as it may contain default values not required to be populated in the executorConfig json (in which case you'd be getting KeyErrors from make_serverset.)



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment95257>

    i feel like we should be calling start_async instead, though this complicates implementation slightly.
    
    we want the creation of all the status managers to be non-blocking since the executor will be in a transient (TASK_STARTING) state, which can possibly timeout and go to TASK_LOST, which is less preferable than TASK_FAILED.
    
    client.create_async returns a threading.Event.  we'd want (in a separate thread) to event.wait(timeout=timeout) and if not event.is_set(), set the .healthy property on the Announcer to StatusResult(..., mesos_pb2.TASK_FAILED)
    
    alternately, you could ignore the threading.Event from create_async and bake that logic into the contract between Announcer<->ServerSetJoinThread.  i'm not sure which is simpler or cleaner.


- Brian Wickman


On Sept. 29, 2014, 9:46 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2014, 9:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54892
-----------------------------------------------------------



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment95226>

    Some context for the error would be helpful here - e.g. log.error("Initial connection to ZooKeeper timed out: %s" % e).


- Kevin Sweeney


On Sept. 29, 2014, 2:46 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 29, 2014, 2:46 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Kevin Sweeney <ke...@apache.org>.

> On Oct. 1, 2014, 4:34 p.m., Kevin Sweeney wrote:
> > src/main/python/apache/aurora/executor/common/announcer.py, line 228
> > <https://reviews.apache.org/r/25974/diff/6/?file=709964#file709964line228>
> >
> >     push this into `__start`, out of the constructor?
> >     
> >     At least on the Java side we try to avoid doing any I/O in constructors (as they're more for wiring) but instead delegate to explicit lifecycle methods like `start`.
> 
> Zameer Manji wrote:
>     I'd rather not. Keeping it in the constructor means it stays in the from_assigned_task method code path. Moving it to the start method moves it out of that code path and really changes when we do this sort of I/O.

is it common practice to do I/O in constructors elsewhere in this codebase? I'll defer to wickman here


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55161
-----------------------------------------------------------


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.

> On Oct. 1, 2014, 4:34 p.m., Kevin Sweeney wrote:
> > src/main/python/apache/aurora/executor/common/announcer.py, line 228
> > <https://reviews.apache.org/r/25974/diff/6/?file=709964#file709964line228>
> >
> >     push this into `__start`, out of the constructor?
> >     
> >     At least on the Java side we try to avoid doing any I/O in constructors (as they're more for wiring) but instead delegate to explicit lifecycle methods like `start`.

I'd rather not. Keeping it in the constructor means it stays in the from_assigned_task method code path. Moving it to the start method moves it out of that code path and really changes when we do this sort of I/O.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55161
-----------------------------------------------------------


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55161
-----------------------------------------------------------



src/main/python/apache/aurora/executor/aurora_executor.py
<https://reviews.apache.org/r/25974/#comment95545>

    superfluous return



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment95546>

    push this into `__start`, out of the constructor?
    
    At least on the Java side we try to avoid doing any I/O in constructors (as they're more for wiring) but instead delegate to explicit lifecycle methods like `start`.


- Kevin Sweeney


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55309
-----------------------------------------------------------



src/test/python/apache/aurora/executor/common/test_announcer.py
<https://reviews.apache.org/r/25974/#comment95688>

    I'm confused -- nobody calls client.connected anymore, right?  In theory, this test will pass, except it will take 30-60 seconds to run since client_connect_event.wait(timeout=...) will be called with the health check timeout.


- Brian Wickman


On Oct. 2, 2014, 11:20 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2014, 11:20 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Oct. 6, 2014, 11:53 a.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Fix import order and checkstyle issues.


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55402
-----------------------------------------------------------


Pinging Kevin for final ship it and commit.

- Zameer Manji


On Oct. 2, 2014, 6:44 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2014, 6:44 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55547
-----------------------------------------------------------

Ship it!


Ship It!

- Kevin Sweeney


On Oct. 2, 2014, 6:44 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2014, 6:44 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55313
-----------------------------------------------------------

Ship it!


Ship It!

- Brian Wickman


On Oct. 3, 2014, 1:44 a.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2014, 1:44 a.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Oct. 2, 2014, 6:44 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Brian's feedback.


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Oct. 2, 2014, 4:20 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Brian's feedback.


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Brian Wickman <wi...@apache.org>.

> On Oct. 1, 2014, 11:14 p.m., Brian Wickman wrote:
> >

btw overall this review looks great and is much cleaner than what i was envisioning.


- Brian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55158
-----------------------------------------------------------


On Oct. 1, 2014, 12:17 a.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Oct. 1, 2014, 12:17 a.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.

> On Oct. 1, 2014, 4:14 p.m., Brian Wickman wrote:
> > src/test/python/apache/aurora/executor/common/test_announcer.py, line 239
> > <https://reviews.apache.org/r/25974/diff/6/?file=709965#file709965line239>
> >
> >     out of scope for this review, but there's a new-ish pypi project called 'zake' that allows the kazoo client to be stubbed out with a mock zk tree (while preserving certain operations like create/delete.)  i'd love to explore using it here in the future.

It looks really nice for this sort of testing. It is definately out of the scope of this review. If we ever need to do more work on the Announcer/Service Discovery we should first migrate to zake.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55158
-----------------------------------------------------------


On Sept. 30, 2014, 5:17 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 30, 2014, 5:17 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review55158
-----------------------------------------------------------



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment95541>

    i think "if self.__connect_event.is_set()" is more accurate -- it's possible for it to be connected and then race with disconnection.  we just want to validate that it managed to connect initially.
    
    this complicates your test a little bit since you do connected.set() right off the bat.  but you might be able to get around this by making a HealthCheckConfig that produces a tiny timeout.



src/test/python/apache/aurora/executor/common/test_announcer.py
<https://reviews.apache.org/r/25974/#comment95542>

    out of scope for this review, but there's a new-ish pypi project called 'zake' that allows the kazoo client to be stubbed out with a mock zk tree (while preserving certain operations like create/delete.)  i'd love to explore using it here in the future.


- Brian Wickman


On Oct. 1, 2014, 12:17 a.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Oct. 1, 2014, 12:17 a.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Sept. 30, 2014, 5:17 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Brian's Feedback.


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing (updated)
-------

./pants src/test/python/apache/aurora/executor:executor-small
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Sept. 29, 2014, 5:54 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Brian's Feedback except for calling start_async()


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Sept. 29, 2014, 2:46 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Bill's Feedback


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/
-----------------------------------------------------------

(Updated Sept. 26, 2014, 2:24 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
-------

Kevin's Feedback.


Bugs: AURORA-728
    https://issues.apache.org/jira/browse/AURORA-728


Repository: aurora


Description
-------

Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
  src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
  src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 

Diff: https://reviews.apache.org/r/25974/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:executor-small


Thanks,

Zameer Manji


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Zameer Manji <zm...@twopensource.com>.

> On Sept. 25, 2014, 4:29 p.m., Kevin Sweeney wrote:
> > src/test/python/apache/aurora/executor/common/test_announcer.py, line 264
> > <https://reviews.apache.org/r/25974/diff/2/?file=705691#file705691line264>
> >
> >     where is this magic number coming from? extract a constant and import that here?

This is the default initial health check + (max failures * health check). I'm unsure on the best way to fix this without duplicating the code it is trying to test.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54623
-----------------------------------------------------------


On Sept. 25, 2014, 2:39 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 25, 2014, 2:39 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Re: Review Request 25974: Prevent initial ZK timeouts from killing the executor.

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25974/#review54623
-----------------------------------------------------------



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment94848>

    add a trailing comma here for easy import-sorting



src/main/python/apache/aurora/executor/common/announcer.py
<https://reviews.apache.org/r/25974/#comment94854>

    log this?



src/test/python/apache/aurora/executor/common/test_announcer.py
<https://reviews.apache.org/r/25974/#comment94851>

    where is this magic number coming from? extract a constant and import that here?


- Kevin Sweeney


On Sept. 25, 2014, 2:39 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25974/
> -----------------------------------------------------------
> 
> (Updated Sept. 25, 2014, 2:39 p.m.)
> 
> 
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
> 
> 
> Bugs: AURORA-728
>     https://issues.apache.org/jira/browse/AURORA-728
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Prevent initial ZK timeouts from killing the executor. In addition, prevent uncaught exceptions from killing the executor.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 79a24855b2a68271b7478395dfdadab8755c3af2 
>   src/main/python/apache/aurora/executor/common/announcer.py c466da8d48bbc2aa227c2db157cab84665ad6602 
>   src/test/python/apache/aurora/executor/common/test_announcer.py 4f6e200ecb1a4ea7cb45acd466a57f19d5815326 
> 
> Diff: https://reviews.apache.org/r/25974/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:executor-small
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>