Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/04/16 02:58:17 UTC

Re: Review Request: Send NoMasterDetectedMessage to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.


> On March 29, 2013, 9:34 p.m., Vinod Kone wrote:
> > src/master/http.cpp, line 268
> > <https://reviews.apache.org/r/10172/diff/1/?file=275912#file275912line268>
> >
> >     what is the difference between activated and connected slaves?

Sent out another review that fixes this.


> On March 29, 2013, 9:34 p.m., Vinod Kone wrote:
> > src/master/master.hpp, lines 232-233
> > <https://reviews.apache.org/r/10172/diff/1/?file=275913#file275913line232>
> >
> >     kill slavePIDs. just use slaves.
> >     
> >     use hashset<SlaveID> deactivated.
> >     
> >     kill active or connected. just maintain one variable.

Fixed in the separate review I sent out.
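
For illustration only, here is a minimal sketch of the bookkeeping suggested above: a single map of slaves plus one set of deactivated IDs, with re-registration refused for any deactivated ID. This is not the code from either review. SlaveRegistry, its methods, and the simplified Slave struct are hypothetical, and std::unordered_set stands in for the hashset<SlaveID> mentioned in the comment.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins; the real Mesos SlaveID and Slave types differ.
using SlaveID = std::string;

struct Slave {
  SlaveID id;
  std::string hostname;
};

// Assumed helper class, not part of Mesos: tracks active slaves in one map
// and deactivated slave IDs in one set, instead of separate PID/flag state.
class SlaveRegistry {
public:
  // Registers a new slave. Returns false if the ID is already registered
  // or has been deactivated.
  bool add(const Slave& slave) {
    assert(!slave.id.empty());
    if (deactivated.count(slave.id) > 0) {
      return false;
    }
    return slaves.emplace(slave.id, slave).second;
  }

  // Moves a known slave from the active map to the deactivated set,
  // e.g. after the master decides it is partitioned.
  void deactivate(const SlaveID& id) {
    if (slaves.erase(id) > 0) {
      deactivated.insert(id);
    }
  }

  // Re-registration succeeds only for IDs that were never deactivated.
  bool reregister(const Slave& slave) {
    if (deactivated.count(slave.id) > 0) {
      return false;
    }
    slaves[slave.id] = slave;
    return true;
  }

  bool isActive(const SlaveID& id) const { return slaves.count(id) > 0; }

  bool isDeactivated(const SlaveID& id) const {
    return deactivated.count(id) > 0;
  }

private:
  std::unordered_map<SlaveID, Slave> slaves;
  std::unordered_set<SlaveID> deactivated;
};
```

With a single set as the source of truth, "deactivated" is the only state to check when a slave tries to come back after a partition.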


> On March 29, 2013, 9:34 p.m., Vinod Kone wrote:
> > src/tests/master_detector_tests.cpp, line 92
> > <https://reviews.apache.org/r/10172/diff/1/?file=275916#file275916line92>
> >
> >     why write stuff to the work directory?
> >     
> >     i thought you added a "sandbox" in MesosTest for this stuff?

After an earlier discussion, we removed the sandbox, as the slave work directory is effectively a sandbox already.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review18532
-----------------------------------------------------------


On March 29, 2013, 1:38 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10172/
> -----------------------------------------------------------
> 
> (Updated March 29, 2013, 1:38 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See above. This is a fix of MESOS-305.
> 
> This also fixes MESOS-362.
> 
> 
> This addresses bugs MESOS-305 and MESOS-362.
>     https://issues.apache.org/jira/browse/MESOS-305
>     https://issues.apache.org/jira/browse/MESOS-362
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
>   src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
>   src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
>   src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c 
>   src/tests/fault_tolerance_tests.cpp 9d3f8b1bfb58d459b1719d2ba1dbb2e93858fc92 
>   src/tests/master_detector_tests.cpp fe3b91fb375e0b09f8f2de3e69e736cd5f5b94ba 
> 
> Diff: https://reviews.apache.org/r/10172/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests for the partitioned slave re-registration.
> ./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
> 
> Ran into MESOS-406, but otherwise no issues.
> 
> Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>