You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Gastón Kleiman <ga...@mesosphere.io> on 2019/01/18 23:00:46 UTC

Review Request 69795: Made agent recover atomically checkpointed resources and operations.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

Review request for mesos, Benno Evers and Greg Mann.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/1/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/#review212161
-----------------------------------------------------------



PASS: Mesos patch 69795 was successfully built and tested.

Reviews applied: `['69790', '69791', '69792', '69793', '69794', '69795']`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2804/mesos-review-69795

- Mesos Reviewbot Windows


On Jan. 18, 2019, 3 p.m., Gastón Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69795/
> -----------------------------------------------------------
> 
> (Updated Jan. 18, 2019, 3 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Bugs: MESOS-9356
>     https://issues.apache.org/jira/browse/MESOS-9356
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Made agent recover atomically checkpointed resources and operations.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
>   src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
>   src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
>   src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 
> 
> 
> Diff: https://reviews.apache.org/r/69795/diff/1/
> 
> 
> Testing
> -------
> 
> Current tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.

> On Jan. 23, 2019, 4:58 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Lines 7356-7380 (patched)
> > <https://reviews.apache.org/r/69795/diff/1/?file=2120593#file2120593line7356>
> >
> >     If we change the checkpointing code to checkpoint the operation _before_ it is applied, then we need to apply speculative operations here as well, since `updateOperation()` only applies non-speculative operations when they transition to terminal.

Revision two makes the agent apply pending operations and adds some additional checks.


> On Jan. 23, 2019, 4:58 p.m., Greg Mann wrote:
> > src/slave/slave.cpp
> > Lines 7381-7384 (patched)
> > <https://reviews.apache.org/r/69795/diff/1/?file=2120593#file2120593line7381>
> >
> >     Should we add a `CHECK(operation->latest_status().state() == OPERATION_FINISHED)` here?

Dropping this issue per offline discussion.

Even though right now operations can either be pending or finished, future versions of the agent might use other terminal states, so the check might be too strict.


- Gastón


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/#review212257
-----------------------------------------------------------


On Jan. 24, 2019, 2:37 p.m., Gastón Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69795/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2019, 2:37 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Bugs: MESOS-9356
>     https://issues.apache.org/jira/browse/MESOS-9356
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Made agent recover atomically checkpointed resources and operations.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
>   src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
>   src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
>   src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 
> 
> 
> Diff: https://reviews.apache.org/r/69795/diff/4/
> 
> 
> Testing
> -------
> 
> Current tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/#review212257
-----------------------------------------------------------




src/slave/slave.hpp
Lines 558 (patched)
<https://reviews.apache.org/r/69795/#comment297987>

    s/checkpoint/checkpointed/



src/slave/slave.hpp
Lines 563 (patched)
<https://reviews.apache.org/r/69795/#comment297988>

    s/()()/()/



src/slave/slave.cpp
Lines 7323 (patched)
<https://reviews.apache.org/r/69795/#comment297989>

    Use `CHECK_SOME` here instead of an unguarded `.get()`.



src/slave/slave.cpp
Lines 7354 (patched)
<https://reviews.apache.org/r/69795/#comment297992>

    Ditto here regarding the unguarded `.get()`



src/slave/slave.cpp
Lines 7356-7380 (patched)
<https://reviews.apache.org/r/69795/#comment297993>

    If we change the checkpointing code to checkpoint the operation _before_ it is applied, then we need to apply speculative operations here as well, since `updateOperation()` only applies non-speculative operations when they transition to terminal.



src/slave/slave.cpp
Lines 7381-7384 (patched)
<https://reviews.apache.org/r/69795/#comment297996>

    Should we add a `CHECK(operation->latest_status().state() == OPERATION_FINISHED)` here?


- Greg Mann


On Jan. 24, 2019, 12:16 a.m., Gastón Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69795/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2019, 12:16 a.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Bugs: MESOS-9356
>     https://issues.apache.org/jira/browse/MESOS-9356
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Made agent recover atomically checkpointed resources and operations.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
>   src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
>   src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
>   src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 
> 
> 
> Diff: https://reviews.apache.org/r/69795/diff/3/
> 
> 
> Testing
> -------
> 
> Current tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/#review212339
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Lines 7324 (patched)
<https://reviews.apache.org/r/69795/#comment298107>

    Nit: not indented far enough.



src/slave/slave.cpp
Lines 7349-7352 (patched)
<https://reviews.apache.org/r/69795/#comment298108>

    Nit: one too many newlines.


- Greg Mann


On Jan. 24, 2019, 10:37 p.m., Gastón Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69795/
> -----------------------------------------------------------
> 
> (Updated Jan. 24, 2019, 10:37 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Bugs: MESOS-9356
>     https://issues.apache.org/jira/browse/MESOS-9356
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Made agent recover atomically checkpointed resources and operations.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
>   src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
>   src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
>   src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 
> 
> 
> Diff: https://reviews.apache.org/r/69795/diff/4/
> 
> 
> Testing
> -------
> 
> Current tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

(Updated Jan. 29, 2019, 3:33 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
-------

Fixed indentation.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/8/

Changes: https://reviews.apache.org/r/69795/diff/7-8/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/#review212427
-----------------------------------------------------------


Fix it, then Ship it!





src/slave/slave.cpp
Lines 7385-7388 (patched)
<https://reviews.apache.org/r/69795/#comment298194>

    Nit: indented too far.


- Greg Mann


On Jan. 29, 2019, 11:17 p.m., Gastón Kleiman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69795/
> -----------------------------------------------------------
> 
> (Updated Jan. 29, 2019, 11:17 p.m.)
> 
> 
> Review request for mesos, Benno Evers and Greg Mann.
> 
> 
> Bugs: MESOS-9356
>     https://issues.apache.org/jira/browse/MESOS-9356
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Made agent recover atomically checkpointed resources and operations.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
>   src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
>   src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
>   src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 
> 
> 
> Diff: https://reviews.apache.org/r/69795/diff/7/
> 
> 
> Testing
> -------
> 
> Current tests still pass.
> 
> 
> Thanks,
> 
> Gastón Kleiman
> 
>


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

(Updated Jan. 29, 2019, 3:17 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
-------

Fixed bugs.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/7/

Changes: https://reviews.apache.org/r/69795/diff/6-7/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

(Updated Jan. 28, 2019, 3:26 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
-------

Rebased.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/6/

Changes: https://reviews.apache.org/r/69795/diff/5-6/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

(Updated Jan. 25, 2019, 4:57 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
-------

Fixed nits.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/5/

Changes: https://reviews.apache.org/r/69795/diff/4-5/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

(Updated Jan. 24, 2019, 2:37 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
-------

Addressed feedback.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/4/

Changes: https://reviews.apache.org/r/69795/diff/3-4/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

(Updated Jan. 23, 2019, 4:16 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
-------

Updated proto filename.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/3/

Changes: https://reviews.apache.org/r/69795/diff/2-3/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman


Re: Review Request 69795: Made agent recover atomically checkpointed resources and operations.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69795/
-----------------------------------------------------------

(Updated Jan. 23, 2019, 3:24 p.m.)


Review request for mesos, Benno Evers and Greg Mann.


Changes
-------

Re-apply operations in `OPERATION_PENDING`.


Bugs: MESOS-9356
    https://issues.apache.org/jira/browse/MESOS-9356


Repository: mesos


Description
-------

Made agent recover atomically checkpointed resources and operations.


Diffs (updated)
-----

  src/slave/slave.hpp 2bcd7a93a8f25b77c71c7f931bfaac87649f987c 
  src/slave/slave.cpp ed92f672f5155d70a36ba3619bb6f06fa09bc836 
  src/slave/state.hpp e2180ae37a65a57d1edb29d2ad6cc8232029906e 
  src/slave/state.cpp ae16d6f656cff71bcbdd4ffb02888024530b9274 


Diff: https://reviews.apache.org/r/69795/diff/2/

Changes: https://reviews.apache.org/r/69795/diff/1-2/


Testing
-------

Current tests still pass.


Thanks,

Gastón Kleiman