Posted to reviews@mesos.apache.org by Michael Park <mp...@apache.org> on 2017/02/27 22:18:53 UTC

Review Request 57109: Re-checkpointed the `Executor`s and `Task`s during agent recovery.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57109/
-----------------------------------------------------------

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-7061
    https://issues.apache.org/jira/browse/MESOS-7061


Repository: mesos


Description
-------

Re-checkpointed the tasks and executors during agent recovery by calling
the `checkpointX` functions from the `recoverX` functions for tasks and executors.


Diffs
-----

  src/slave/slave.hpp 3b0aea4e3e9a17501077beccbccaab4abbe11af2 
  src/slave/slave.cpp fc480ae23ffa5cdeeb79b3621a08e1f8703bc01a 

Diff: https://reviews.apache.org/r/57109/diff/


Testing
-------


Thanks,

Michael Park


Re: Review Request 57109: Re-checkpointed the `Executor`s and `Task`s during agent recovery.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57109/#review167169
-----------------------------------------------------------




src/slave/slave.cpp (lines 5311 - 5315)
<https://reviews.apache.org/r/57109/#comment239342>

    We could clarify here that, in order to support a scheduler upgrading to MULTI_ROLE and then changing its roles, we need to do <this>.



src/slave/slave.cpp (lines 6948 - 6950)
<https://reviews.apache.org/r/57109/#comment239340>

    It would be nice to avoid checkpointing every time we recover the agent, since we only need to re-checkpoint if any allocation info injection took place.

    I was also going to suggest a comment here, but I think once this is updated to conditional checkpointing it will be a bit clearer that we're doing this in support of the multi-role upgrade case (if not, we probably want to clarify this).
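
    For illustration only, here is a minimal sketch of the conditional re-checkpointing idea described above. It is not the code under review: `TaskState`, `Disk`, `injectAllocationInfo` and `recoverTasks` are hypothetical names, the "disk" is an in-memory map, and the role string stands in for whatever allocation info the agent injects. The point is only that the write happens when, and only when, injection changed the recovered state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for recovered task state; not a Mesos type.
struct TaskState {
  std::string id;
  std::string role;  // Empty means no allocation info was checkpointed.
};

// Hypothetical stand-in for the checkpoint directory: task id -> role.
using Disk = std::map<std::string, std::string>;

// Fills in the role if it is missing.
// Returns true only if the task was actually modified.
bool injectAllocationInfo(TaskState& task, const std::string& frameworkRole) {
  if (!task.role.empty()) {
    return false;  // Already has allocation info; nothing to inject.
  }
  task.role = frameworkRole;
  return true;
}

// Recovers tasks and re-checkpoints only those that needed injection.
// Returns the number of tasks that were written back to `disk`.
int recoverTasks(
    std::vector<TaskState>& tasks,
    const std::string& frameworkRole,
    Disk& disk) {
  int recheckpointed = 0;
  for (TaskState& task : tasks) {
    if (injectAllocationInfo(task, frameworkRole)) {
      disk[task.id] = task.role;  // Conditional re-checkpoint.
      ++recheckpointed;
    }
  }
  return recheckpointed;
}
```

    With this shape, a second recovery over the same state performs no writes, which is the property the comment above asks for.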


- Benjamin Mahler




Re: Review Request 57109: Re-checkpointed the `Executor`s and `Task`s during agent recovery.

Posted by Michael Park <mp...@apache.org>.

> On March 2, 2017, 3:11 p.m., Benjamin Mahler wrote:
> > src/slave/slave.hpp
> > Lines 1062 (patched)
> > <https://reviews.apache.org/r/57109/diff/2/?file=1654025#file1654025line1062>
> >
> >     maybe `recheckpointExecutor`?

Took this recommendation and also updated to `recheckpointTask`.


> On March 2, 2017, 3:11 p.m., Benjamin Mahler wrote:
> > src/slave/slave.cpp
> > Lines 6973 (patched)
> > <https://reviews.apache.org/r/57109/diff/2/?file=1654026#file1654026line6976>
> >
> >     Should we put this after the checkpoint call so that the ownership is a little more clear?

Done.


- Michael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57109/#review167758
-----------------------------------------------------------




Re: Review Request 57109: Re-checkpointed the `Executor`s and `Task`s during agent recovery.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57109/#review167758
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.hpp
Lines 416-417 (patched)
<https://reviews.apache.org/r/57109/#comment239720>

    How about `executorsToRecheckpoint` and `tasksToRecheckpoint`?



src/slave/slave.hpp
Lines 1062 (patched)
<https://reviews.apache.org/r/57109/#comment239721>

    maybe `recheckpointExecutor`?



src/slave/slave.hpp
Lines 1063 (patched)
<https://reviews.apache.org/r/57109/#comment239722>

    Ditto here.



src/slave/slave.cpp
Lines 6973 (patched)
<https://reviews.apache.org/r/57109/#comment239723>

    Should we put this after the checkpoint call so that the ownership is a little more clear?


- Benjamin Mahler




Re: Review Request 57109: Re-checkpointed the `Executor`s and `Task`s during agent recovery.

Posted by Michael Park <mp...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57109/
-----------------------------------------------------------

(Updated March 3, 2017, 1:12 a.m.)


Review request for mesos and Benjamin Mahler.


Changes
-------

Addressed bmahler's comments.


Bugs: MESOS-7061
    https://issues.apache.org/jira/browse/MESOS-7061


Repository: mesos


Description
-------

Re-checkpointed the `Executor`s and `Task`s during agent recovery.


Diffs (updated)
-----

  src/slave/slave.hpp 449971b6b343c7714e1d1167a55bbdfe94d2cf83 
  src/slave/slave.cpp 6ae9458cc81a7623a1837cd636156434a972004b 


Diff: https://reviews.apache.org/r/57109/diff/3/

Changes: https://reviews.apache.org/r/57109/diff/2-3/


Testing
-------


Thanks,

Michael Park


Re: Review Request 57109: Re-checkpointed the `Executor`s and `Task`s during agent recovery.

Posted by Michael Park <mp...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57109/
-----------------------------------------------------------

(Updated March 2, 2017, 1:48 p.m.)


Review request for mesos and Benjamin Mahler.


Changes
-------

Addressed bmahler's comments.


Bugs: MESOS-7061
    https://issues.apache.org/jira/browse/MESOS-7061


Repository: mesos


Description (updated)
-------

Re-checkpointed the `Executor`s and `Task`s during agent recovery.


Diffs (updated)
-----

  src/slave/slave.hpp 449971b6b343c7714e1d1167a55bbdfe94d2cf83 
  src/slave/slave.cpp 6ae9458cc81a7623a1837cd636156434a972004b 


Diff: https://reviews.apache.org/r/57109/diff/2/

Changes: https://reviews.apache.org/r/57109/diff/1-2/


Testing
-------


Thanks,

Michael Park