Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/04/16 02:58:23 UTC
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------
(Updated April 16, 2013, 12:58 a.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Addressed Vinod's review comments. Also updated the test to use the new abstractions!
Summary (updated)
-----------------
Send NoMasterDetectedMessage on session timeout to non-contending detectors. Added a disconnected slave map to the master to track disconnected slaves, in order to disallow slave re-registration after a network partition.
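As a rough illustration of the second half of the summary, the sketch below shows one way a master could remember disconnected slaves and refuse their re-registration. It is not the code under review: `MasterSketch`, its method names, and the use of plain string IDs are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the master's slave bookkeeping; none of these
// names come from src/master/master.hpp.
class MasterSketch {
public:
  // First-time registration of a slave.
  void registerSlave(const std::string& id, const std::string& host) {
    assert(!id.empty());
    active_[id] = host;
  }

  // Called when the master gives up on a slave, e.g. after missed pings
  // during a network partition.
  void markDisconnected(const std::string& id) {
    auto it = active_.find(id);
    if (it == active_.end()) {
      return;  // Unknown or already disconnected.
    }
    disconnected_[id] = it->second;
    active_.erase(it);
  }

  // Returns true if the re-registration is accepted, false if it is refused
  // because the slave was previously marked disconnected.
  bool reregisterSlave(const std::string& id, const std::string& host) {
    if (disconnected_.count(id) > 0) {
      return false;
    }
    active_[id] = host;
    return true;
  }

  bool isActive(const std::string& id) const { return active_.count(id) > 0; }

private:
  std::unordered_map<std::string, std::string> active_;        // id -> host
  std::unordered_map<std::string, std::string> disconnected_;  // id -> host
};
```

In this sketch, a slave that was partitioned away stays refused once the partition heals, while a slave the master has never marked as disconnected can still re-register.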
Description
-------
See above. This is a fix of MESOS-305.
This also fixes MESOS-362.
This addresses bugs MESOS-305 and MESOS-362.
https://issues.apache.org/jira/browse/MESOS-305
https://issues.apache.org/jira/browse/MESOS-362
Diffs (updated)
-----
src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623
src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed
src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08
src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c
src/tests/fault_tolerance_tests.cpp bfb30344ca02cd42c442a373d44d6a3fa287c1e3
src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f
Diff: https://reviews.apache.org/r/10172/diff/
Testing
-------
make check
Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
Ran into MESOS-406, but otherwise no issues.
Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
Thanks,
Ben Mahler
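The detector half of the change can be pictured with the small sketch below. It only models the behaviour named in the summary (a non-contending detector reports that no master is detected when its session times out). `DetectorSketch`, `Event`, and the callback are hypothetical names for this example, and the contending path is deliberately left unmodelled because this thread does not describe it.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical event type; the real change sends a NoMasterDetectedMessage.
enum class Event { NoMasterDetected };

// Hypothetical detector: `contending` is true for a process that competes to
// become master, false for one that only watches for the current master.
class DetectorSketch {
public:
  DetectorSketch(bool contending, std::function<void(Event)> notify)
    : contending_(contending), notify_(std::move(notify)) {
    assert(notify_ != nullptr);
  }

  // Invoked when the coordination-service session times out.
  void sessionTimedOut() {
    if (contending_) {
      return;  // Not modelled here; the thread does not describe this path.
    }
    // A non-contending detector tells its owner that no master is known.
    notify_(Event::NoMasterDetected);
  }

private:
  bool contending_;
  std::function<void(Event)> notify_;
};
```

A test along the lines mentioned above would construct a non-contending detector, force a session timeout, and check that exactly one such notification arrives.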
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Ben Mahler <be...@gmail.com>.
> On April 16, 2013, 7:02 p.m., Vinod Kone wrote:
> > src/master/http.cpp, line 270
> > <https://reviews.apache.org/r/10172/diff/2/?file=281465#file281465line270>
> >
> > did you forget to kill this?
I will be renaming 'connected' to 'activated' in the next review: https://reviews.apache.org/r/10534/
> On April 16, 2013, 7:02 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 231
> > <https://reviews.apache.org/r/10172/diff/2/?file=281468#file281468line231>
> >
> > s/Process/process/ ?
Well, I want to be clear that I'm referring to a libprocess "Process".
> On April 16, 2013, 7:02 p.m., Vinod Kone wrote:
> > src/tests/fault_tolerance_tests.cpp, line 305
> > <https://reviews.apache.org/r/10172/diff/2/?file=281468#file281468line305>
> >
> > You could simplify this by while(lostStatus.isPending()).
> >
> > But I will leave it up to you.
Punting for now. I want to be explicit about the pings, and I also borrowed this from other tests (I know, no excuse ;)).
- Ben
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19267
-----------------------------------------------------------
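The trade-off discussed in the last comment above (counting ping timeouts explicitly versus looping on `lostStatus.isPending()`) can be sketched as follows. This is a self-contained model, not the test code: `StatusFutureSketch`, `ObserverSketch`, and both driver functions are hypothetical, and the real test advances a clock rather than calling a method directly.

```cpp
#include <cassert>

// Hypothetical stand-in for a pending/ready future such as lostStatus.
struct StatusFutureSketch {
  bool ready = false;
  bool isPending() const { return !ready; }
};

// Hypothetical observer: marks the slave lost after `limit` missed pings.
class ObserverSketch {
public:
  ObserverSketch(int limit, StatusFutureSketch* lost)
    : limit_(limit), missed_(0), lost_(lost) {
    assert(limit > 0 && lost != nullptr);
  }

  // Simulates dropping one ping and advancing past its timeout.
  void pingTimedOut() {
    if (++missed_ >= limit_) {
      lost_->ready = true;
    }
  }

private:
  int limit_;
  int missed_;
  StatusFutureSketch* lost_;
};

// Explicit variant: the test states how many pings must time out.
int driveExplicit(ObserverSketch& observer, int pings) {
  for (int i = 0; i < pings; ++i) {
    observer.pingTimedOut();
  }
  return pings;
}

// Simplified variant: keep timing out pings until the status is no longer
// pending. Shorter, but it spins forever if the status never arrives.
int driveUntilLost(ObserverSketch& observer, const StatusFutureSketch& lost) {
  int pings = 0;
  while (lost.isPending()) {
    observer.pingTimedOut();
    ++pings;
  }
  return pings;
}
```

Both variants reach the same state in this model. The explicit loop documents the expected number of pings, while the `isPending()` loop is shorter but hides that number.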
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19267
-----------------------------------------------------------
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39880>
thank you
src/master/http.cpp
<https://reviews.apache.org/r/10172/#comment39882>
did you forget to kill this?
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39893>
Great comment!
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39894>
s/Process/process/ ?
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39895>
Pull this down after driver.start() but before driver.launchTasks()
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39896>
The lost status happens much later in the test. Pull this down to where it is expected.
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39897>
s/received/handled/
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39898>
You could use Future<Protobuf> and DROP_PROTOBUF().
We typically use 'Message', when we are interested in the pids (message.from and message.to).
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39899>
You could simplify this by while(lostStatus.isPending()).
But I will leave it up to you.
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39900>
Again, you could use FUTURE_PROTOBUF here.
src/tests/fault_tolerance_tests.cpp
<https://reviews.apache.org/r/10172/#comment39901>
Excellent test!
- Vinod Kone
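The distinction drawn in the comments above, between capturing only a protobuf payload and capturing a full 'Message' with its pids, can be modelled with the sketch below. It does not reproduce the real DROP_PROTOBUF or FUTURE_PROTOBUF macros; `MessageSketch` and `DropOnceSketch` are hypothetical types that only show the idea of dropping one matching message and keeping a copy for later inspection.

```cpp
#include <cassert>
#include <string>

// Hypothetical message envelope: a payload plus sender and receiver pids.
struct MessageSketch {
  std::string name;  // Message type name.
  std::string from;  // Sender pid.
  std::string to;    // Receiver pid.
  std::string body;  // Serialized payload.
};

// Hypothetical one-shot filter: drops the first message whose name matches
// and keeps a copy of it.
class DropOnceSketch {
public:
  explicit DropOnceSketch(const std::string& name)
    : name_(name), captured_(false) {
    assert(!name.empty());
  }

  // Returns true if the message was dropped (and captured).
  bool filter(const MessageSketch& m) {
    if (captured_ || m.name != name_) {
      return false;
    }
    message_ = m;
    captured_ = true;
    return true;
  }

  bool isPending() const { return !captured_; }

  // Payload-only view: enough when the test only cares about the contents.
  const std::string& body() const {
    assert(captured_);
    return message_.body;
  }

  // Full envelope: needed when the test inspects message.from / message.to.
  const MessageSketch& message() const {
    assert(captured_);
    return message_;
  }

private:
  std::string name_;
  bool captured_;
  MessageSketch message_;
};
```

When a test never looks at the pids, the payload-only view keeps the assertions shorter, which is the simplification being suggested.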
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Ben Mahler <be...@gmail.com>.
> On April 29, 2013, 10:53 p.m., Benjamin Hindman wrote:
> > src/master/master.hpp, lines 232-233
> > <https://reviews.apache.org/r/10172/diff/5/?file=284292#file284292line232>
> >
> > Why are you calling these *PIDs?
>
> Ben Mahler wrote:
> Whoops, leftover from before, s/PIDs/Hosts/?
I removed these in my next review, sorry about that: https://reviews.apache.org/r/10534
- Ben
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19906
-----------------------------------------------------------
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Ben Mahler <be...@gmail.com>.
> On April 29, 2013, 10:53 p.m., Benjamin Hindman wrote:
> > src/master/master.hpp, lines 232-233
> > <https://reviews.apache.org/r/10172/diff/5/?file=284292#file284292line232>
> >
> > Why are you calling these *PIDs?
Whoops, leftover from before, s/PIDs/Hosts/?
- Ben
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19906
-----------------------------------------------------------
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19906
-----------------------------------------------------------
Ship it!
src/master/master.hpp
<https://reviews.apache.org/r/10172/#comment41066>
Why are you calling these *PIDs?
- Benjamin Hindman
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------
(Updated April 30, 2013, 1:30 a.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Rebased off trunk.
Description
-------
See above. This is a fix of MESOS-305.
This also fixes MESOS-362.
This addresses bugs MESOS-305 and MESOS-362.
https://issues.apache.org/jira/browse/MESOS-305
https://issues.apache.org/jira/browse/MESOS-362
Diffs (updated)
-----
src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623
src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed
src/master/master.hpp 4a8aaee5a9970c0dd5cb022f04e48fb308241e20
src/master/master.cpp ff2f9546b3e5c885da0a5986606beaca57ba4d5c
src/tests/fault_tolerance_tests.cpp 70e2d558af72cc267240042577cf9f0fbfebe6d6
src/tests/master_detector_tests.cpp b042d6ffb0c2e58c6c338de2b2534fc6b63f5f08
src/tests/zookeeper_tests.cpp 125b16566d5cd59732fef67d80617724ff71433b
Diff: https://reviews.apache.org/r/10172/diff/
Testing
-------
make check
Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
Ran into MESOS-406, but otherwise no issues.
Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
Thanks,
Ben Mahler
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------
(Updated April 24, 2013, 11:55 p.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Rebased.
Description
-------
See above. This is a fix of MESOS-305.
This also fixes MESOS-362.
This addresses bugs MESOS-305 and MESOS-362.
https://issues.apache.org/jira/browse/MESOS-305
https://issues.apache.org/jira/browse/MESOS-362
Diffs (updated)
-----
src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623
src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed
src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08
src/master/master.cpp c3b26b136a529eee34e9cdf9700176c232f6e436
src/tests/fault_tolerance_tests.cpp d3476f7f5d1b7e2ed45385f5145eaccd6c114d21
src/tests/master_detector_tests.cpp b042d6ffb0c2e58c6c338de2b2534fc6b63f5f08
src/tests/zookeeper_tests.cpp 125b16566d5cd59732fef67d80617724ff71433b
Diff: https://reviews.apache.org/r/10172/diff/
Testing
-------
make check
Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
Ran into MESOS-406, but otherwise no issues.
Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
Thanks,
Ben Mahler
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------
(Updated April 19, 2013, 8:55 p.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Vinod's review.
Description
-------
See above. This is a fix of MESOS-305.
This also fixes MESOS-362.
This addresses bugs MESOS-305 and MESOS-362.
https://issues.apache.org/jira/browse/MESOS-305
https://issues.apache.org/jira/browse/MESOS-362
Diffs (updated)
-----
src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623
src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed
src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08
src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c
src/tests/fault_tolerance_tests.cpp 0348f20a8f4333f7d2f3786c33e55713cbcbcbe0
src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f
src/tests/zookeeper_tests.cpp 0855f5aee0baef22c4ecbed1b88f14f16bfff532
Diff: https://reviews.apache.org/r/10172/diff/
Testing
-------
make check
Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
Ran into MESOS-406, but otherwise no issues.
Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
Thanks,
Ben Mahler
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/#review19468
-----------------------------------------------------------
Ship it!
src/master/master.cpp
<https://reviews.apache.org/r/10172/#comment40214>
Not your change, but could you add the slave ID, PID, and hostname to the log message?
src/master/master.cpp
<https://reviews.apache.org/r/10172/#comment40213>
ditto
- Vinod Kone
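For the logging request above, a minimal sketch of a log fragment that carries all three identifiers might look like the following. `SlaveInfoSketch` and `describe` are hypothetical, and the exact wording of the real log message in src/master/master.cpp is not shown in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical bundle of the three identifiers asked for in the review.
struct SlaveInfoSketch {
  std::string id;        // Slave ID.
  std::string pid;       // Process ID string of the slave.
  std::string hostname;  // Host the slave runs on.
};

// Builds a fragment that can be appended to any slave-related log line.
std::string describe(const SlaveInfoSketch& slave) {
  assert(!slave.id.empty());
  std::ostringstream out;
  out << "slave " << slave.id << " at " << slave.pid
      << " (" << slave.hostname << ")";
  return out.str();
}
```

Routing every slave-related log line through one helper like this keeps the format consistent across messages.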
Re: Review Request: Send NoMasterDetectedMessage on session timeout to
non-contending detectors. Added a disconnected slave map to the master to
track disconnected slaves,
in order to disallow slave re-registration after a network partition.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10172/
-----------------------------------------------------------
(Updated April 19, 2013, 6:54 p.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Addressed Vinod's comments and rebased.
Description
-------
See above. This is a fix of MESOS-305.
This also fixes MESOS-362.
This addresses bugs MESOS-305 and MESOS-362.
https://issues.apache.org/jira/browse/MESOS-305
https://issues.apache.org/jira/browse/MESOS-362
Diffs (updated)
-----
src/detector/detector.cpp 7a8355162d543e017505dd58efd2d7bf96f99623
src/master/http.cpp 71b04f01f45ee73d9c246f469e1368223903abed
src/master/master.hpp 9776a7cb8448e41e5d52288e3c637737cee15a08
src/master/master.cpp 5b0e8c03c516f9fc8bb729c21e876bdde89baf9c
src/tests/fault_tolerance_tests.cpp 0348f20a8f4333f7d2f3786c33e55713cbcbcbe0
src/tests/master_detector_tests.cpp 980f3c720301b83af668e10f479adb9cce4f0c9f
src/tests/zookeeper_tests.cpp 0855f5aee0baef22c4ecbed1b88f14f16bfff532
Diff: https://reviews.apache.org/r/10172/diff/
Testing
-------
make check
Added tests for the partitioned slave re-registration.
./bin/mesos-tests.sh --gtest_filter="FaultToleranceTest.PartitionedSlaveReregistration" --verbose --gtest_break_on_failure --gtest_repeat=3000
Ran into MESOS-406, but otherwise no issues.
Will be adding ZK master detector tests shortly to test that the NoMasterDetectedMessages are being sent.
Thanks,
Ben Mahler