You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Kapil Arya <ka...@mesosphere.io> on 2015/06/23 02:53:32 UTC

Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/
-----------------------------------------------------------

Review request for mesos and Niklas Nielsen.


Bugs: MESOS-2226
    https://issues.apache.org/jira/browse/MESOS-2226


Repository: mesos


Description
-------

Previously, the code was not checking for TASK_RUNNING status message
from the MockExecutor before stopping the scheduler driver. This caused
the Executor to be terminated prematurely (before the tasks were
launched) and thus the remove-executor hook was never called. The fix
was to wait for the TASK_RUNNING status update and then wait for the
shutdown() within MockExecutor. Only then we wait for the future from
remove-executor hook.


Diffs
-----

  src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 

Diff: https://reviews.apache.org/r/35756/diff/


Testing
-------

make check.


Thanks,

Kapil Arya


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/#review88900
-----------------------------------------------------------


Patch looks great!

Reviews applied: [35756]

All tests passed.

- Mesos ReviewBot


On June 23, 2015, 12:53 a.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35756/
> -----------------------------------------------------------
> 
> (Updated June 23, 2015, 12:53 a.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2226
>     https://issues.apache.org/jira/browse/MESOS-2226
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the code was not checking for TASK_RUNNING status message
> from the MockExecutor before stopping the scheduler driver. This caused
> the Executor to be terminated prematurely (before the tasks were
> launched) and thus the remove-executor hook was never called. The fix
> was to wait for the TASK_RUNNING status update and then wait for the
> shutdown() within MockExecutor. Only then we wait for the future from
> remove-executor hook.
> 
> 
> Diffs
> -----
> 
>   src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 
> 
> Diff: https://reviews.apache.org/r/35756/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Kapil Arya <ka...@mesosphere.io>.

> On June 23, 2015, 4:31 p.m., Niklas Nielsen wrote:
> > Right now, the executor removed hook will be called implicitly during shutdown, which (most likely) caused the race in the first place.
> > Can we split the two concerns by issueing a sched.killTask().

The sched.killTask() doesn't invoke executor shutdown, not for the MockExecutor anyways. The reason being that it's upto the executor to decide whether it wants to hang around after the last task has exited or if it wants to quit.

The current mechanism guarantees that the executor shutdown will be called for the MockExecutor and thus the remove-executor hook will be executed by the Slave.


- Kapil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/#review89046
-----------------------------------------------------------


On June 23, 2015, 5:47 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35756/
> -----------------------------------------------------------
> 
> (Updated June 23, 2015, 5:47 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2226
>     https://issues.apache.org/jira/browse/MESOS-2226
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the code was not checking for TASK_RUNNING status message
> from the MockExecutor before stopping the scheduler driver. This caused
> the Executor to be terminated prematurely (before the tasks were
> launched) and thus the remove-executor hook was never called. The fix
> was to wait for the TASK_RUNNING status update and then wait for the
> shutdown() within MockExecutor. Only then we wait for the future from
> remove-executor hook.
> 
> 
> Diffs
> -----
> 
>   src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 
> 
> Diff: https://reviews.apache.org/r/35756/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Kapil Arya <ka...@mesosphere.io>.

> On June 23, 2015, 4:31 p.m., Niklas Nielsen wrote:
> > src/tests/hook_tests.cpp, lines 310-314
> > <https://reviews.apache.org/r/35756/diff/1/?file=990024#file990024line310>
> >
> >     Can you help me understand why you need to synchronize over this event? _Not_ shutting down the slaves before awaiting the executor removed hook seemed to be the race?

Fixed. Also removed the synchronized event for executor registration.


- Kapil


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/#review89046
-----------------------------------------------------------


On June 23, 2015, 5:47 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35756/
> -----------------------------------------------------------
> 
> (Updated June 23, 2015, 5:47 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2226
>     https://issues.apache.org/jira/browse/MESOS-2226
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the code was not checking for TASK_RUNNING status message
> from the MockExecutor before stopping the scheduler driver. This caused
> the Executor to be terminated prematurely (before the tasks were
> launched) and thus the remove-executor hook was never called. The fix
> was to wait for the TASK_RUNNING status update and then wait for the
> shutdown() within MockExecutor. Only then we wait for the future from
> remove-executor hook.
> 
> 
> Diffs
> -----
> 
>   src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 
> 
> Diff: https://reviews.apache.org/r/35756/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/#review89046
-----------------------------------------------------------


Right now, the executor removed hook will be called implicitly during shutdown, which (most likely) caused the race in the first place.
Can we split the two concerns by issueing a sched.killTask().


src/tests/hook_tests.cpp (lines 308 - 312)
<https://reviews.apache.org/r/35756/#comment141658>

    Can you help me understand why you need to synchronize over this event? _Not_ shutting down the slaves before awaiting the executor removed hook seemed to be the race?


- Niklas Nielsen


On June 22, 2015, 5:53 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35756/
> -----------------------------------------------------------
> 
> (Updated June 22, 2015, 5:53 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2226
>     https://issues.apache.org/jira/browse/MESOS-2226
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the code was not checking for TASK_RUNNING status message
> from the MockExecutor before stopping the scheduler driver. This caused
> the Executor to be terminated prematurely (before the tasks were
> launched) and thus the remove-executor hook was never called. The fix
> was to wait for the TASK_RUNNING status update and then wait for the
> shutdown() within MockExecutor. Only then we wait for the future from
> remove-executor hook.
> 
> 
> Diffs
> -----
> 
>   src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 
> 
> Diff: https://reviews.apache.org/r/35756/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Adam B <ad...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/#review89135
-----------------------------------------------------------

Ship it!


Ship It!

- Adam B


On June 23, 2015, 2:47 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35756/
> -----------------------------------------------------------
> 
> (Updated June 23, 2015, 2:47 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2226
>     https://issues.apache.org/jira/browse/MESOS-2226
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the code was not checking for TASK_RUNNING status message
> from the MockExecutor before stopping the scheduler driver. This caused
> the Executor to be terminated prematurely (before the tasks were
> launched) and thus the remove-executor hook was never called. The fix
> was to wait for the TASK_RUNNING status update and then wait for the
> shutdown() within MockExecutor. Only then we wait for the future from
> remove-executor hook.
> 
> 
> Diffs
> -----
> 
>   src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 
> 
> Diff: https://reviews.apache.org/r/35756/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Niklas Nielsen <ni...@qni.dk>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/#review89071
-----------------------------------------------------------

Ship it!


Ship It!

- Niklas Nielsen


On June 23, 2015, 2:47 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35756/
> -----------------------------------------------------------
> 
> (Updated June 23, 2015, 2:47 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2226
>     https://issues.apache.org/jira/browse/MESOS-2226
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the code was not checking for TASK_RUNNING status message
> from the MockExecutor before stopping the scheduler driver. This caused
> the Executor to be terminated prematurely (before the tasks were
> launched) and thus the remove-executor hook was never called. The fix
> was to wait for the TASK_RUNNING status update and then wait for the
> shutdown() within MockExecutor. Only then we wait for the future from
> remove-executor hook.
> 
> 
> Diffs
> -----
> 
>   src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 
> 
> Diff: https://reviews.apache.org/r/35756/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/#review89111
-----------------------------------------------------------


Patch looks great!

Reviews applied: [35756]

All tests passed.

- Mesos ReviewBot


On June 23, 2015, 9:47 p.m., Kapil Arya wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35756/
> -----------------------------------------------------------
> 
> (Updated June 23, 2015, 9:47 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-2226
>     https://issues.apache.org/jira/browse/MESOS-2226
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the code was not checking for TASK_RUNNING status message
> from the MockExecutor before stopping the scheduler driver. This caused
> the Executor to be terminated prematurely (before the tasks were
> launched) and thus the remove-executor hook was never called. The fix
> was to wait for the TASK_RUNNING status update and then wait for the
> shutdown() within MockExecutor. Only then we wait for the future from
> remove-executor hook.
> 
> 
> Diffs
> -----
> 
>   src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 
> 
> Diff: https://reviews.apache.org/r/35756/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Kapil Arya
> 
>


Re: Review Request 35756: Fixed a race condition in hook tests for remove-executor hook.

Posted by Kapil Arya <ka...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35756/
-----------------------------------------------------------

(Updated June 23, 2015, 5:47 p.m.)


Review request for mesos and Niklas Nielsen.


Changes
-------

Added some comments as per Niklas's comments.


Bugs: MESOS-2226
    https://issues.apache.org/jira/browse/MESOS-2226


Repository: mesos


Description
-------

Previously, the code was not checking for TASK_RUNNING status message
from the MockExecutor before stopping the scheduler driver. This caused
the Executor to be terminated prematurely (before the tasks were
launched) and thus the remove-executor hook was never called. The fix
was to wait for the TASK_RUNNING status update and then wait for the
shutdown() within MockExecutor. Only then we wait for the future from
remove-executor hook.


Diffs (updated)
-----

  src/tests/hook_tests.cpp 3ffde6d6b2faeb5a8a40eb27c3b0a2b7f9ecd2b1 

Diff: https://reviews.apache.org/r/35756/diff/


Testing
-------

make check.


Thanks,

Kapil Arya