Posted to reviews@mesos.apache.org by Megha Sharma <ms...@apple.com> on 2017/08/06 04:35:02 UTC

Re: Review Request 58898: Do not kill non partition aware tasks.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/
-----------------------------------------------------------

(Updated Aug. 6, 2017, 4:35 a.m.)


Review request for mesos, Neil Conway and Jiang Yan Xu.


Summary (updated)
-----------------

Do not kill non partition aware tasks.


Bugs: MESOS-7215
    https://issues.apache.org/jira/browse/MESOS-7215


Repository: mesos


Description (updated)
-------

The master no longer kills the tasks of non-partition-aware frameworks
when an unreachable agent re-registers with the master.
Previously, the master sent a ShutdownFrameworkMessage to the agent
to kill the tasks of non-partition-aware frameworks, including
frameworks that were still registered. This was problematic because
offers from this agent could still go to the same framework, which could
then launch new tasks. The agent would receive tasks of that
framework and ignore them because it believed the framework was shutting
down. The framework was not shutting down, so from the master
and the scheduler’s perspective the task stayed in STAGING
until the next agent re-registration, which could happen much later.
This commit fixes the problem by not shutting down non-partition-aware
frameworks on such an agent.
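
The failure mode described above can be illustrated with a small toy model.
This is only a sketch under stated assumptions, not the Mesos implementation
(which is C++ in src/master/master.cpp): the Framework and Agent classes, the
reregister() function, and the kill_non_partition_aware flag are all
hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    # Hypothetical stand-in for a registered framework.
    framework_id: str
    partition_aware: bool


@dataclass
class Agent:
    # Framework IDs this toy agent believes are shutting down.
    shutting_down: set = field(default_factory=set)

    def shutdown_framework(self, framework_id):
        # Models receipt of a shutdown message for the framework.
        self.shutting_down.add(framework_id)

    def launch_task(self, framework_id):
        # The agent ignores tasks of frameworks it thinks are shutting down.
        # Returns True if the task would be accepted.
        return framework_id not in self.shutting_down


def reregister(agent, frameworks, kill_non_partition_aware):
    # Toy model of the master handling an unreachable agent that re-registers.
    # kill_non_partition_aware=True models the old behavior, False the new one.
    for fw in frameworks:
        if kill_non_partition_aware and not fw.partition_aware:
            agent.shutdown_framework(fw.framework_id)


fw = Framework("fw-1", partition_aware=False)

# Old behavior: a later task launch from the same framework is dropped.
old_agent = Agent()
reregister(old_agent, [fw], kill_non_partition_aware=True)
old_launch_accepted = old_agent.launch_task(fw.framework_id)

# New behavior: the framework is not shut down, so the launch is accepted.
new_agent = Agent()
reregister(new_agent, [fw], kill_non_partition_aware=False)
new_launch_accepted = new_agent.launch_task(fw.framework_id)
```

In this model the old path leaves the agent rejecting every later launch from
the still-registered framework, which corresponds to the task that never
leaves STAGING.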


Diffs (updated)
-----

  src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
  src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
  src/master/master.cpp 7f38a5e21884546d4b4c866ca5918db779af8f99 
  src/tests/partition_tests.cpp 62a84f797201ccd18b71490949e3130d2b9c3668 


Diff: https://reviews.apache.org/r/58898/diff/3/

Changes: https://reviews.apache.org/r/58898/diff/2-3/


Testing
-------

make check


Thanks,

Megha Sharma


Re: Review Request 58898: Do not kill non partition aware tasks.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58898/#review182300
-----------------------------------------------------------



Bad patch!

Reviews applied: [58898]

Failed command: python support/apply-reviews.py -n -r 58898

Error:
Traceback (most recent call last):
  File "support/apply-reviews.py", line 417, in <module>
    main()
  File "support/apply-reviews.py", line 412, in main
    reviewboard(options)
  File "support/apply-reviews.py", line 402, in reviewboard
    apply_review(options)
  File "support/apply-reviews.py", line 160, in apply_review
    commit_patch(options)
  File "support/apply-reviews.py", line 261, in commit_patch
    message.write(data['message'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 655: ordinal not in range(128)

Full log: http://mesos-winbot.westus.cloudapp.azure.com/logs/212/console
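
The traceback above points at the U+2019 apostrophe in the review description
("scheduler’s"), which the ASCII codec cannot encode. The sketch below
reproduces that class of error and shows one common way around it, writing
the text with an explicit UTF-8 encoding. It is not the actual code or fix in
support/apply-reviews.py; the write_message() helper, the sample string, and
the file name are assumptions made for illustration.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the commit message taken from the review.
message = u"the scheduler\u2019s perspective"


def write_message(path, text):
    # Write text as UTF-8 regardless of the platform's default codec.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Encoding to ASCII fails on U+2019, the same kind of error the bot reported.
try:
    message.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Writing with an explicit UTF-8 encoding round-trips the text unchanged.
path = os.path.join(tempfile.mkdtemp(), "commit_message.txt")
write_message(path, message)
with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

Replacing the curly apostrophe in the description with a plain ASCII one
would also avoid the error without touching the script.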

- Mesos Reviewbot Windows

