You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/04/16 02:58:23 UTC

Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------

(Updated April 16, 2013, 12:58 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Vinod's review. Also, updated the test to use the new abstractions!


Summary (updated)
-----------------

Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.


Description
-------

See above. This is a fix of MESOS-305.

This also fixes MESOS-362.


This addresses bugs MESOS-305 and MESOS-362.
    https://issues.apache.org/jira/browse/MESOS-305
    https://issues.apache.org/jira/browse/MESOS-362


Diffs (updated)
-----

  src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
  src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
  src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
  src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c 
  src/tests/fault_tolerance_tests.cpp bfb30344ca02cd42c442a373d44d6a3fa287c1e3 
  src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f 

Diff: https://reviews.apache.org/r/10172/diff/


Testing
-------

make check

Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000

Ran into MESOS-406, but otherwise no issues.

Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.


Thanks,

Ben Mahler


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Ben Mahler <be...@gmail.com>.

> On April 16, 2013, 7:02 p.m., Vinod Kone wrote:
> > src/master/http.cpp, line 270
> > <https://reviews.apache.org/r/10172/diff/2/?file=281465#file281465line270>
> >
> >     did you forget to kill this?

I will be renaming 'connected' to 'activated' in the next review: https://reviews.apache.org/r/10534/


> On April 16, 2013, 7:02 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 231
> > <https://reviews.apache.org/r/10172/diff/2/?file=281468#file281468line231>
> >
> >     s/Process/process/ ?

Well.. I want to be clear that I'm referring to a libprocess "Process".


> On April 16, 2013, 7:02 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 305
> > <https://reviews.apache.org/r/10172/diff/2/?file=281468#file281468line305>
> >
> >     You could simplify this by while(lostStatus.isPending()).
> >     
> >     But I will leave it up to you.

Punting for now, I want to be explicit about the pings, I also borrowed this from other tests (I know, no excuse ;)).


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19267
-----------------------------------------------------------


On April 16, 2013, 12:58 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10172/
> -----------------------------------------------------------
> 
> (Updated April 16, 2013, 12:58 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See above. This is a fix of MESOS-305.
> 
> This also fixes MESOS-362.
> 
> 
> This addresses bugs MESOS-305 and MESOS-362.
>     https://issues.apache.org/jira/browse/MESOS-305
>     https://issues.apache.org/jira/browse/MESOS-362
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
>   src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
>   src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
>   src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c 
>   src/tests/fault_tolerance_tests.cpp bfb30344ca02cd42c442a373d44d6a3fa287c1e3 
>   src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f 
> 
> Diff: https://reviews.apache.org/r/10172/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests for the partitioned slave re-registration.
> ./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
> 
> Ran into MESOS-406, but otherwise no issues.
> 
> Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19267
-----------------------------------------------------------



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39880>

    thank you



src/master/http.cpp
<https://reviews.apache.org/r/10172/#comment39882>

    did you forget to kill this?



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39893>

    Great comment!



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39894>

    s/Process/process/ ?



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39895>

    Pull this down after driver.start() but before driver.launchTasks()



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39896>

    The lost status happens much later in the test. Pull this down to where it is expected.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39897>

    s/received/handled/



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39898>

    You could use Future<Protobuf> and DROP_PROTOBUF().
    
    We typically use 'Message', when we are interested in the pids (message.from and message.to).



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39899>

    You could simplify this by while(lostStatus.isPending()).
    
    But I will leave it up to you.



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39900>

    again, you could use FUTURE_PROTOBUF here



src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39901>

    Excellent test!


- Vinod Kone


On April 16, 2013, 12:58 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10172/
> -----------------------------------------------------------
> 
> (Updated April 16, 2013, 12:58 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See above. This is a fix of MESOS-305.
> 
> This also fixes MESOS-362.
> 
> 
> This addresses bugs MESOS-305 and MESOS-362.
>     https://issues.apache.org/jira/browse/MESOS-305
>     https://issues.apache.org/jira/browse/MESOS-362
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
>   src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
>   src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
>   src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c 
>   src/tests/fault_tolerance_tests.cpp bfb30344ca02cd42c442a373d44d6a3fa287c1e3 
>   src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f 
> 
> Diff: https://reviews.apache.org/r/10172/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests for the partitioned slave re-registration.
> ./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
> 
> Ran into MESOS-406, but otherwise no issues.
> 
> Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Ben Mahler <be...@gmail.com>.

> On April 29, 2013, 10:53 p.m., Benjamin Hindman wrote:
> > src/master/master.hpp, lines 232-233
> > <https://reviews.apache.org/r/10172/diff/5/?file=284292#file284292line232>
> >
> >     Why are you calling these *PIDs?
> 
> Ben Mahler wrote:
>     Whoops, leftover from before, s/PIDs/Hosts/?

I removed these in my next review, sorry about that: https://reviews.apache.org/r/10534


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19906
-----------------------------------------------------------


On April 24, 2013, 11:55 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10172/
> -----------------------------------------------------------
> 
> (Updated April 24, 2013, 11:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See above. This is a fix of MESOS-305.
> 
> This also fixes MESOS-362.
> 
> 
> This addresses bugs MESOS-305 and MESOS-362.
>     https://issues.apache.org/jira/browse/MESOS-305
>     https://issues.apache.org/jira/browse/MESOS-362
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
>   src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
>   src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
>   src/master/master.cpp c3b26b136a529eee34e9cdf9700176c232f6e436 
>   src/tests/fault_tolerance_tests.cpp d3476f7f5d1b7e2ed45385f5145eaccd6c114d21 
>   src/tests/master_detector_tests.cpp b042d6ffb0c2e58c6c338de2b2534fc6b63f5f08 
>   src/tests/zookeeper_tests.cpp 125b16566d5cd59732fef67d80617724ff71433b 
> 
> Diff: https://reviews.apache.org/r/10172/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests for the partitioned slave re-registration.
> ./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
> 
> Ran into MESOS-406, but otherwise no issues.
> 
> Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Ben Mahler <be...@gmail.com>.

> On April 29, 2013, 10:53 p.m., Benjamin Hindman wrote:
> > src/master/master.hpp, lines 232-233
> > <https://reviews.apache.org/r/10172/diff/5/?file=284292#file284292line232>
> >
> >     Why are you calling these *PIDs?

Whoops, leftover from before, s/PIDs/Hosts/?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19906
-----------------------------------------------------------


On April 24, 2013, 11:55 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10172/
> -----------------------------------------------------------
> 
> (Updated April 24, 2013, 11:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See above. This is a fix of MESOS-305.
> 
> This also fixes MESOS-362.
> 
> 
> This addresses bugs MESOS-305 and MESOS-362.
>     https://issues.apache.org/jira/browse/MESOS-305
>     https://issues.apache.org/jira/browse/MESOS-362
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
>   src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
>   src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
>   src/master/master.cpp c3b26b136a529eee34e9cdf9700176c232f6e436 
>   src/tests/fault_tolerance_tests.cpp d3476f7f5d1b7e2ed45385f5145eaccd6c114d21 
>   src/tests/master_detector_tests.cpp b042d6ffb0c2e58c6c338de2b2534fc6b63f5f08 
>   src/tests/zookeeper_tests.cpp 125b16566d5cd59732fef67d80617724ff71433b 
> 
> Diff: https://reviews.apache.org/r/10172/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests for the partitioned slave re-registration.
> ./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
> 
> Ran into MESOS-406, but otherwise no issues.
> 
> Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19906
-----------------------------------------------------------

Ship it!



src/master/master.hpp
<https://reviews.apache.org/r/10172/#comment41066>

    Why are you calling these *PIDs?


- Benjamin Hindman


On April 24, 2013, 11:55 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10172/
> -----------------------------------------------------------
> 
> (Updated April 24, 2013, 11:55 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See above. This is a fix of MESOS-305.
> 
> This also fixes MESOS-362.
> 
> 
> This addresses bugs MESOS-305 and MESOS-362.
>     https://issues.apache.org/jira/browse/MESOS-305
>     https://issues.apache.org/jira/browse/MESOS-362
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
>   src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
>   src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
>   src/master/master.cpp c3b26b136a529eee34e9cdf9700176c232f6e436 
>   src/tests/fault_tolerance_tests.cpp d3476f7f5d1b7e2ed45385f5145eaccd6c114d21 
>   src/tests/master_detector_tests.cpp b042d6ffb0c2e58c6c338de2b2534fc6b63f5f08 
>   src/tests/zookeeper_tests.cpp 125b16566d5cd59732fef67d80617724ff71433b 
> 
> Diff: https://reviews.apache.org/r/10172/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests for the partitioned slave re-registration.
> ./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
> 
> Ran into MESOS-406, but otherwise no issues.
> 
> Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------

(Updated April 30, 2013, 1:30 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Rebased off trunk.


Description
-------

See above. This is a fix of MESOS-305.

This also fixes MESOS-362.


This addresses bugs MESOS-305 and MESOS-362.
    https://issues.apache.org/jira/browse/MESOS-305
    https://issues.apache.org/jira/browse/MESOS-362


Diffs (updated)
-----

  src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
  src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
  src/master/master.hpp 4a8aaee5a9970c0dd5cb022f04e48fb308241e20 
  src/master/master.cpp ff2f9546b3e5c885da0a5986606beaca57ba4d5c 
  src/tests/fault_tolerance_tests.cpp 70e2d558af72cc267240042577cf9f0fbfebe6d6 
  src/tests/master_detector_tests.cpp b042d6ffb0c2e58c6c338de2b2534fc6b63f5f08 
  src/tests/zookeeper_tests.cpp 125b16566d5cd59732fef67d80617724ff71433b 

Diff: https://reviews.apache.org/r/10172/diff/


Testing
-------

make check

Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000

Ran into MESOS-406, but otherwise no issues.

Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.


Thanks,

Ben Mahler


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------

(Updated April 24, 2013, 11:55 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Rebased.


Description
-------

See above. This is a fix of MESOS-305.

This also fixes MESOS-362.


This addresses bugs MESOS-305 and MESOS-362.
    https://issues.apache.org/jira/browse/MESOS-305
    https://issues.apache.org/jira/browse/MESOS-362


Diffs (updated)
-----

  src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
  src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
  src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
  src/master/master.cpp c3b26b136a529eee34e9cdf9700176c232f6e436 
  src/tests/fault_tolerance_tests.cpp d3476f7f5d1b7e2ed45385f5145eaccd6c114d21 
  src/tests/master_detector_tests.cpp b042d6ffb0c2e58c6c338de2b2534fc6b63f5f08 
  src/tests/zookeeper_tests.cpp 125b16566d5cd59732fef67d80617724ff71433b 

Diff: https://reviews.apache.org/r/10172/diff/


Testing
-------

make check

Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000

Ran into MESOS-406, but otherwise no issues.

Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.


Thanks,

Ben Mahler


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------

(Updated April 19, 2013, 8:55 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Vinod's review.


Description
-------

See above. This is a fix of MESOS-305.

This also fixes MESOS-362.


This addresses bugs MESOS-305 and MESOS-362.
    https://issues.apache.org/jira/browse/MESOS-305
    https://issues.apache.org/jira/browse/MESOS-362


Diffs (updated)
-----

  src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
  src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
  src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
  src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c 
  src/tests/fault_tolerance_tests.cpp 0348f20a8f4333f7d2f3786c33e55713cbcbcbe0 
  src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f 
  src/tests/zookeeper_tests.cpp 0855f5aee0baef22c4ecbed1b88f14f16bfff532 

Diff: https://reviews.apache.org/r/10172/diff/


Testing
-------

make check

Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000

Ran into MESOS-406, but otherwise no issues.

Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.


Thanks,

Ben Mahler


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19468
-----------------------------------------------------------

Ship it!



src/master/master.cpp
<https://reviews.apache.org/r/10172/#comment40214>

    not yours, but could you add slave id, pid and hostname to the log message?



src/master/master.cpp
<https://reviews.apache.org/r/10172/#comment40213>

    ditto


- Vinod Kone


On April 19, 2013, 6:54 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10172/
> -----------------------------------------------------------
> 
> (Updated April 19, 2013, 6:54 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See above. This is a fix of MESOS-305.
> 
> This also fixes MESOS-362.
> 
> 
> This addresses bugs MESOS-305 and MESOS-362.
>     https://issues.apache.org/jira/browse/MESOS-305
>     https://issues.apache.org/jira/browse/MESOS-362
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
>   src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
>   src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
>   src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c 
>   src/tests/fault_tolerance_tests.cpp 0348f20a8f4333f7d2f3786c33e55713cbcbcbe0 
>   src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f 
>   src/tests/zookeeper_tests.cpp 0855f5aee0baef22c4ecbed1b88f14f16bfff532 
> 
> Diff: https://reviews.apache.org/r/10172/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> Added tests for the partitioned slave re-registration.
> ./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
> 
> Ran into MESOS-406, but otherwise no issues.
> 
> Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------

(Updated April 19, 2013, 6:54 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Vinod + rebase.


Description
-------

See above. This is a fix of MESOS-305.

This also fixes MESOS-362.


This addresses bugs MESOS-305 and MESOS-362.
    https://issues.apache.org/jira/browse/MESOS-305
    https://issues.apache.org/jira/browse/MESOS-362


Diffs (updated)
-----

  src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623 
  src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed 
  src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08 
  src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c 
  src/tests/fault_tolerance_tests.cpp 0348f20a8f4333f7d2f3786c33e55713cbcbcbe0 
  src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f 
  src/tests/zookeeper_tests.cpp 0855f5aee0baef22c4ecbed1b88f14f16bfff532 

Diff: https://reviews.apache.org/r/10172/diff/


Testing
-------

make check

Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000

Ran into MESOS-406, but otherwise no issues.

Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.


Thanks,

Ben Mahler