Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2014/10/01 01:31:11 UTC

Review Request 26206: Introduced Master <-> Slave reconciliation.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26206/
-----------------------------------------------------------

Review request for mesos and Vinod Kone.


Bugs: MESOS-1696
    https://issues.apache.org/jira/browse/MESOS-1696


Repository: mesos-git


Description
-------

The master must rely on the slave to reconcile tasks that were missing in the re-registration message. Otherwise, the master may incorrectly send TASK_LOST in the event of a race.

See MESOS-1696 for further details.
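
To illustrate the race described above, here is a minimal sketch of the idea, not the code under review. All names (Slave, Master, reregistrationSnapshot, missingFrom, reconcile) are hypothetical and are not Mesos APIs. It assumes messages from the master reach the slave in order, so a task launched before the reconcile request is already known to the slave when that request arrives.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model; none of these types are Mesos APIs.
using TaskId = std::string;

struct Slave {
  std::set<TaskId> tasks;  // Tasks the slave currently knows about.

  // State captured when the re-registration message is sent.
  std::set<TaskId> reregistrationSnapshot() const { return tasks; }

  // A launch that may arrive after the snapshot was taken.
  void launch(const TaskId& id) { tasks.insert(id); }

  // Handles the master's reconcile request: only tasks that are
  // still unknown to the slave at this point are reported as lost.
  std::vector<TaskId> reconcile(const std::set<TaskId>& missing) const {
    std::vector<TaskId> lost;
    for (const TaskId& id : missing) {
      if (tasks.count(id) == 0) {
        lost.push_back(id);
      }
    }
    return lost;
  }
};

struct Master {
  std::set<TaskId> tasks;  // Tasks the master tracks for this slave.

  // Tasks the master knows about that the snapshot did not include.
  // Declaring these lost directly would be wrong for any launch that
  // was still in flight when the snapshot was taken.
  std::set<TaskId> missingFrom(const std::set<TaskId>& snapshot) const {
    std::set<TaskId> missing;
    for (const TaskId& id : tasks) {
      if (snapshot.count(id) == 0) {
        missing.insert(id);
      }
    }
    return missing;
  }
};
```

In this sketch, if the master tracks t1, t2 and t3, the slave takes its snapshot knowing only t1, and the launch of t2 arrives afterwards, then missingFrom() yields t2 and t3 but reconcile() reports only t3. A master that decided on its own would have marked t2 as lost as well.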


Diffs
-----

  src/master/master.hpp d6380199421840aa17d4ce2725dcbcf4a11ce85f 
  src/master/master.cpp a60308f912a1ed81ecd51c677461a8f591d9eb8e 
  src/messages/messages.proto 9ff06b38086010df362036c695a5222371f70f4d 
  src/slave/slave.hpp 28697102047b972ecb3b6b627ee089b430549fc0 
  src/slave/slave.cpp c82d99f08cec8959ff9b21e7358401622427f2ed 
  src/tests/fault_tolerance_tests.cpp e8f532232c091849489971d7fc96ae615ffb6de0 

Diff: https://reviews.apache.org/r/26206/diff/


Testing
-------

Ran make check and modified the test that captured the TASK_LOST case. Another test was added in a subsequent review.


Thanks,

Ben Mahler


Re: Review Request 26206: Introduced Master <-> Slave reconciliation.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26206/
-----------------------------------------------------------

(Updated Oct. 9, 2014, 1:12 a.m.)


Review request for mesos and Vinod Kone.


Changes
-------

This also fixes MESOS-1869.


Bugs: MESOS-1696 and MESOS-1869
    https://issues.apache.org/jira/browse/MESOS-1696
    https://issues.apache.org/jira/browse/MESOS-1869


Repository: mesos-git


Description
-------

The master must rely on the slave to reconcile tasks that were missing in the re-registration message. Otherwise, the master may incorrectly send TASK_LOST in the event of a race.

See MESOS-1696 for further details.


Diffs
-----

  src/master/master.hpp 37ce31abb45b6d1c4a9c88b0f1e81d1265d382b9 
  src/master/master.cpp 0286353babdb1ef44ed954e19f02998bc272a6c7 
  src/messages/messages.proto b8039efa1638995c2846f5cb515919d5e51cde5c 
  src/slave/slave.hpp 28697102047b972ecb3b6b627ee089b430549fc0 
  src/slave/slave.cpp 809b008b1502b80cce4d8b4be0a233117c92ed56 
  src/tests/fault_tolerance_tests.cpp e8f532232c091849489971d7fc96ae615ffb6de0 

Diff: https://reviews.apache.org/r/26206/diff/


Testing
-------

Ran make check and modified the test that captured the TASK_LOST case. Another test was added in a subsequent review.


Thanks,

Ben Mahler


Re: Review Request 26206: Introduced Master <-> Slave reconciliation.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26206/#review55922
-----------------------------------------------------------

Ship it!



src/slave/slave.cpp
<https://reviews.apache.org/r/26206/#comment96291>

    Kill this. I think we should always use the SUM (StatusUpdateManager) in the slave when sending updates.


- Vinod Kone




Re: Review Request 26206: Introduced Master <-> Slave reconciliation.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26206/
-----------------------------------------------------------

(Updated Oct. 9, 2014, 12:14 a.m.)


Review request for mesos and Vinod Kone.


Changes
-------

Removed submitted dependencies for reviewbot.


Bugs: MESOS-1696
    https://issues.apache.org/jira/browse/MESOS-1696


Repository: mesos-git


Description
-------

The master must rely on the slave to reconcile tasks that were missing in the re-registration message. Otherwise, the master may incorrectly send TASK_LOST in the event of a race.

See MESOS-1696 for further details.


Diffs
-----

  src/master/master.hpp 37ce31abb45b6d1c4a9c88b0f1e81d1265d382b9 
  src/master/master.cpp 0286353babdb1ef44ed954e19f02998bc272a6c7 
  src/messages/messages.proto b8039efa1638995c2846f5cb515919d5e51cde5c 
  src/slave/slave.hpp 28697102047b972ecb3b6b627ee089b430549fc0 
  src/slave/slave.cpp 809b008b1502b80cce4d8b4be0a233117c92ed56 
  src/tests/fault_tolerance_tests.cpp e8f532232c091849489971d7fc96ae615ffb6de0 

Diff: https://reviews.apache.org/r/26206/diff/


Testing
-------

Ran make check and modified the test that captured the TASK_LOST case. Another test was added in a subsequent review.


Thanks,

Ben Mahler


Re: Review Request 26206: Introduced Master <-> Slave reconciliation.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26206/
-----------------------------------------------------------

(Updated Oct. 8, 2014, 11:39 p.m.)


Review request for mesos and Vinod Kone.


Changes
-------

The TASK_LOST during master <-> slave reconciliation is now sent through the StatusUpdateManager, due to MESOS-1879.


Bugs: MESOS-1696
    https://issues.apache.org/jira/browse/MESOS-1696


Repository: mesos-git


Description
-------

The master must rely on the slave to reconcile tasks that were missing in the re-registration message. Otherwise, the master may incorrectly send TASK_LOST in the event of a race.

See MESOS-1696 for further details.


Diffs (updated)
-----

  src/master/master.hpp 37ce31abb45b6d1c4a9c88b0f1e81d1265d382b9 
  src/master/master.cpp 0286353babdb1ef44ed954e19f02998bc272a6c7 
  src/messages/messages.proto b8039efa1638995c2846f5cb515919d5e51cde5c 
  src/slave/slave.hpp 28697102047b972ecb3b6b627ee089b430549fc0 
  src/slave/slave.cpp 809b008b1502b80cce4d8b4be0a233117c92ed56 
  src/tests/fault_tolerance_tests.cpp e8f532232c091849489971d7fc96ae615ffb6de0 

Diff: https://reviews.apache.org/r/26206/diff/


Testing
-------

Ran make check and modified the test that captured the TASK_LOST case. Another test was added in a subsequent review.


Thanks,

Ben Mahler


Re: Review Request 26206: Introduced Master <-> Slave reconciliation.

Posted by Ben Mahler <be...@gmail.com>.

> On Oct. 1, 2014, 10:30 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 912
> > <https://reviews.apache.org/r/26206/diff/1/?file=709920#file709920line912>
> >
> >     s/no/no need for/
> >     
> >     Can you also add a comment about why this doesn't need to go through the status update manager?

It turns out this does need to go through the status update manager for now, because of MESOS-1879. I updated the comment to explain why, and why we can send it directly once MESOS-1879 is resolved.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26206/#review55152
-----------------------------------------------------------




Re: Review Request 26206: Introduced Master <-> Slave reconciliation.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26206/#review55152
-----------------------------------------------------------

Ship it!



src/slave/slave.cpp
<https://reviews.apache.org/r/26206/#comment95533>

    s/no/no need for/
    
    Can you also add a comment about why this doesn't need to go through the status update manager?


- Vinod Kone

