Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/10/26 05:10:50 UTC

Review Request: Fix for zookeeper master detector

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Description
-------

This includes a minor refactor of ZooKeeperMasterDetector and a test to reproduce the bug seen in https://issues.apache.org/jira/browse/MESOS-299

The fix will be coming in the subsequent diff.


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299


Diffs
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 5af7464aae17c00a0e707421982d7cb055aabc6c 
  src/tests/zookeeper_tests.cpp 4415a33b94dd6ca360a7dd3ca49f4c29ee25f5e8 

Diff: https://reviews.apache.org/r/7746/diff/


Testing
-------

As expected, the test fails:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1025 20:04:47.162278 1898458304 zookeeper_server.cpp:198] Started ZooKeeperServer on port 49773
I1025 20:04:47.213875 27279360 detector.cpp:285] Master detector connected to ZooKeeper ...
I1025 20:04:47.214005 27279360 detector.cpp:302] Trying to create path '/mesos' in ZooKeeper
I1025 20:04:47.214823 28352512 detector.cpp:285] Master detector connected to ZooKeeper ...
I1025 20:04:47.214865 28352512 detector.cpp:302] Trying to create path '/mesos' in ZooKeeper
I1025 20:04:47.240025 27279360 detector.cpp:456] Master detector found 0 registered masters

GMOCK WARNING:
Uninteresting mock function call - returning directly.
    Function call: noMasterDetected()
Stack trace:
I1025 20:04:47.257087 28352512 detector.cpp:332] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 20:04:47.257599 26742784 detector.cpp:456] Master detector found 1 registered masters
I1025 20:04:47.257779 28352512 detector.cpp:456] Master detector found 1 registered masters
I1025 20:04:47.260438 26742784 detector.cpp:491] Master detector got new master pid: (1)@169.254.12.175:49770
I1025 20:04:47.260521 28352512 detector.cpp:491] Master detector got new master pid: (1)@169.254.12.175:49770

GMOCK WARNING:
Uninteresting mock function call - returning directly.
    Function call: newMasterDetected(@0x101b09cf0 (1)@169.254.12.175:49770)
Stack trace:
2012-10-25 20:04:47,262:2031(0x10fa22000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:49773] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 20:04:47.262936 28352512 detector.cpp:376] Master detector lost connection to ZooKeeper, attempting to reconnect ...
../../src/tests/zookeeper_tests.cpp:327: Failure
Failed
Waited too long for 'noMasterDetectedCall'
../../src/tests/zookeeper_tests.cpp:315: Failure
Actual function call count doesn't match EXPECT_CALL(slave, noMasterDetected())...
         Expected: to be called once
           Actual: never called - unsatisfied and active
../../src/tests/zookeeper_tests.cpp:319: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1025 20:04:49.588414 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 49773
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (2576 ms)
[----------] 1 test from ZooKeeperTest (2577 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (2914 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession
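
A note on the command above: --gtest_filter selects tests by wildcard pattern, which is why "*ZooKeeperTest.MasterDetectorExpire*" picks up MasterDetectorExpireZKSession. The sketch below shows roughly how one such pattern matches a test name. The function name matchesFilter and the second test name in the checks are made up for illustration, and this is a simplified model, not Google Test's own code; real filters also accept ':'-separated pattern lists and '-' negative patterns, which are left out here.

```cpp
#include <cassert>

// Simplified model of matching ONE --gtest_filter pattern against a
// full test name. '*' matches any run of characters (including none)
// and '?' matches exactly one character.
inline bool matchesFilter(const char* pattern, const char* name)
{
  if (*pattern == '\0') {
    return *name == '\0';  // Pattern used up: match only if name is too.
  }
  if (*pattern == '*') {
    // Either '*' matches nothing, or it swallows one more character.
    return matchesFilter(pattern + 1, name) ||
           (*name != '\0' && matchesFilter(pattern, name + 1));
  }
  if (*name == '\0') {
    return false;  // Pattern still needs a character but name is empty.
  }
  return (*pattern == '?' || *pattern == *name) &&
         matchesFilter(pattern + 1, name + 1);
}

// Example checks using the filter from the command above.
// "ZooKeeperTest.SomeOtherTest" is a hypothetical name.
inline void exampleFilterChecks()
{
  const char* filter = "*ZooKeeperTest.MasterDetectorExpire*";
  assert(matchesFilter(filter, "ZooKeeperTest.MasterDetectorExpireZKSession"));
  assert(!matchesFilter(filter, "ZooKeeperTest.SomeOtherTest"));
}
```

Calling exampleFilterChecks() from a main() runs both checks; the trailing '*' is what lets the shortened name on the command line match the full test name.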


Thanks,

Vinod Kone


Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.

> On Oct. 26, 2012, 3:33 a.m., John Sirois wrote:
> > src/detector/detector.hpp, line 122
> > <https://reviews.apache.org/r/7746/diff/1/?file=180296#file180296line122>
> >
> >     replace these 2 lines with @param url ... (is @param actually the correct style for the mesos c++ docstrings?)

Done.

We don't really use the @param style in the newer parts of the code base, but I will leave it as is for now, to be consistent with the rest of the file.


> On Oct. 26, 2012, 3:33 a.m., John Sirois wrote:
> > src/detector/detector.hpp, line 137
> > <https://reviews.apache.org/r/7746/diff/1/?file=180296#file180296line137>
> >
> >     public - /** real docs too? */ and how about a completed name, sessionId()

Formatted the doc style.

We use session() to return the session id in a couple of different places (e.g., group), so I will keep it as is for consistency.
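
For readers following along without the diff, the two points above might look roughly like this. It is only an illustrative sketch: the class name ExampleDetector, its constructor, the example-only setter, and the stored value are hypothetical stand-ins, not the actual contents of detector.hpp. Only the @param url style and the session() accessor name come from this discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for the class under review; NOT the real
// ZooKeeperMasterDetector declaration.
class ExampleDetector
{
public:
  /**
   * Creates a detector for the given ZooKeeper URL.
   *
   * @param url The ZooKeeper URL to connect to (format assumed here).
   */
  explicit ExampleDetector(std::string url)
    : url_(std::move(url)), sessionId_(0)
  {
    // This example treats an empty URL as a programming error.
    assert(!url_.empty());
  }

  /**
   * Returns the ZooKeeper session id. Named session() rather than
   * sessionId() to match the existing accessors mentioned above.
   */
  int64_t session() const { return sessionId_; }

  // Example-only hook (an assumption) so session() has a value to return.
  void setSessionForExample(int64_t id) { sessionId_ = id; }

private:
  std::string url_;
  int64_t sessionId_;
};
```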


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12813
-----------------------------------------------------------




Re: Review Request: Fix for zookeeper master detector

Posted by John Sirois <jo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12813
-----------------------------------------------------------

Ship it!


AFAICT, the test looks good.


src/detector/detector.hpp
<https://reviews.apache.org/r/7746/#comment27430>

    replace these 2 lines with @param url ... (is @param actually the correct style for the mesos c++ docstrings?)



src/detector/detector.hpp
<https://reviews.apache.org/r/7746/#comment27431>

    public - /** real docs too? */ and how about a completed name, sessionId() 


- John Sirois




Re: Review Request: Fix for zookeeper master detector

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12838
-----------------------------------------------------------

Ship it!



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27493>

    This is redundant and unnecessary.



src/tests/zookeeper_tests.cpp
<https://reviews.apache.org/r/7746/#comment27494>

    Can we move these up above creating the non-contender? I think that will be better coupling.



src/tests/zookeeper_tests.cpp
<https://reviews.apache.org/r/7746/#comment27498>

    I expected this test to actually make sure that we get a NoMasterDetectedMessage (see comment above).



src/tests/zookeeper_tests.cpp
<https://reviews.apache.org/r/7746/#comment27495>

    How about a comment off to the side that we're waiting 5 seconds to allow for the session expiration to occur?
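
    One way such a comment could read is sketched below with standard C++ rather than the test's actual wait helper, which is not shown in this thread. The names waitUntil, SESSION_EXPIRATION_WAIT, and waitForNewMasterAfterExpiration are assumptions for illustration, as is the polling interval.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: polls a flag (set elsewhere, e.g. by a mock
// action) until it becomes true or the timeout elapses.
// Returns true if the flag was set in time.
inline bool waitUntil(
    const std::atomic<bool>& flag,
    std::chrono::milliseconds timeout)
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!flag.load()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // Waited too long.
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return true;
}

// Explicitly 5 seconds: we need to allow enough time for the ZooKeeper
// session expiration to occur before the detector reconnects. Do not
// shorten this without checking the session timeout used by the test.
const std::chrono::seconds SESSION_EXPIRATION_WAIT(5);

inline bool waitForNewMasterAfterExpiration(
    const std::atomic<bool>& newMasterDetected)
{
  return waitUntil(newMasterDetected, SESSION_EXPIRATION_WAIT);
}
```

    Giving the timeout a named constant with the comment attached keeps the reason next to the number, so the wait is less likely to be trimmed later.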


- Benjamin Hindman


On Oct. 26, 2012, 7:27 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7746/
> -----------------------------------------------------------
> 
> (Updated Oct. 26, 2012, 7:27 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Fix for master detector
> 
> 
> This addresses bug MESOS-299.
>     https://issues.apache.org/jira/browse/MESOS-299
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
>   src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
>   src/slave/slave.cpp 5af7464aae17c00a0e707421982d7cb055aabc6c 
>   src/tests/zookeeper_server.hpp 6355e8479a636c889945eead12d863b827d78929 
>   src/tests/zookeeper_tests.cpp 4415a33b94dd6ca360a7dd3ca49f4c29ee25f5e8 
> 
> Diff: https://reviews.apache.org/r/7746/diff/
> 
> 
> Testing
> -------
> 
> Test output before the fix:
> 
> [vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
> ....
> ....
> I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
> I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
> I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
> I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
> I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
> I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
> 2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
> I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
> 2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
> W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
> I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
> ../../src/tests/zookeeper_tests.cpp:332: Failure
> Failed
> Waited too long for 'newMasterDetectedCall2'
> ../../src/tests/zookeeper_tests.cpp:324: Failure
> Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
> I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
> [  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
> [----------] 1 test from ZooKeeperTest (5929 ms total)
> 
> [----------] Global test environment tear-down
> [==========] 1 test from 1 test case ran. (6147 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession
> 
> 
> Test output after the fix:
> 
> [vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
> ...
> ...
> I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
> I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
> I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
> I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
> I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
> 2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
> I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
> 2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
> W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
> I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
> I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
> I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
> [       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
> [----------] 1 test from ZooKeeperTest (3634 ms total)
> 
> [----------] Global test environment tear-down
> [==========] 1 test from 1 test case ran. (4068 ms total)
> [  PASSED  ] 1 test.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fix for zookeeper master detector

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12899
-----------------------------------------------------------


Thanks!

- Ben Mahler


On Oct. 30, 2012, 12:45 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7746/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2012, 12:45 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Master detector will only send a NoMasterDetected() message to the leading master. 
> 
> 
> This addresses bug MESOS-299.
>     https://issues.apache.org/jira/browse/MESOS-299
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
>   src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
>   src/slave/slave.cpp 0321bc516166aacfd261c48f1f4293622d18ae0e 
>   src/tests/zookeeper_test_server.hpp 06320439b993f9612ea01303f7446dadf97dc045 
>   src/tests/zookeeper_tests.cpp 3f001affe0dd4b8002e99a658c47b8ea86ddb7d6 
> 
> Diff: https://reviews.apache.org/r/7746/diff/
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
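
The updated description quoted above ("Master detector will only send a NoMasterDetected() message to the leading master") can be read as the rule sketched below. This is a minimal model under stated assumptions, not the code in detector.cpp: the struct DetectorModel, its fields, and the string messages are hypothetical, and it assumes the rule applies when the ZooKeeper session expires, as the test name MasterDetectorExpireZKSession suggests.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of how a detector reacts to session expiration.
struct DetectorModel
{
  bool isLeader;                  // Is the local process the leading master?
  std::vector<std::string> sent;  // Messages "sent" to the local process.

  void onSessionExpired()
  {
    // Only the leading master is told there is no master. Every other
    // process keeps its current master until a new one is detected.
    if (isLeader) {
      sent.push_back("NoMasterDetected");
      isLeader = false;
    }
  }

  void onMasterDetected(const std::string& pid)
  {
    sent.push_back("NewMasterDetected " + pid);
  }
};
```

Under this model a non-leading process sees no NoMasterDetected on expiration and only receives the new master pid after reconnecting, which is consistent with the "after the fix" log, where the reconnected detector goes straight to "got new master pid".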


Re: Review Request: Fix for zookeeper master detector

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12928
-----------------------------------------------------------

Ship it!



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27814>

    :



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27815>

    :



src/tests/zookeeper_tests.cpp
<https://reviews.apache.org/r/7746/#comment27816>

    A comment about why the explicit 5 seconds would be helpful (so someone doesn't try and remove it later).


- Benjamin Hindman


On Oct. 30, 2012, 1:38 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7746/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2012, 1:38 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Master detector will only send a NoMasterDetected() message to the leading master. 
> 
> 
> This addresses bug MESOS-299.
>     https://issues.apache.org/jira/browse/MESOS-299
> 
> 
> Diffs
> -----
> 
>   src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
>   src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
>   src/slave/slave.cpp 0321bc516166aacfd261c48f1f4293622d18ae0e 
>   src/tests/zookeeper_test_server.hpp 06320439b993f9612ea01303f7446dadf97dc045 
>   src/tests/zookeeper_tests.cpp 3f001affe0dd4b8002e99a658c47b8ea86ddb7d6 
> 
> Diff: https://reviews.apache.org/r/7746/diff/
> 
> 
> Testing
> -------
> 
> Test output before the fix:
> 
> [vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
> ....
> ....
> I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
> I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
> I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
> I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
> I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
> I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
> 2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
> I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
> 2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
> W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
> I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
> ../../src/tests/zookeeper_tests.cpp:332: Failure
> Failed
> Waited too long for 'newMasterDetectedCall2'
> ../../src/tests/zookeeper_tests.cpp:324: Failure
> Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
>          Expected: to be called once
>            Actual: never called - unsatisfied and active
> I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
> [  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
> [----------] 1 test from ZooKeeperTest (5929 ms total)
> 
> [----------] Global test environment tear-down
> [==========] 1 test from 1 test case ran. (6147 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession
> 
> 
> Test output after the fix:
> 
> [vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
> ...
> ...
> I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
> I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
> I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
> I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
> I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
> 2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
> I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
> 2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
> W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
> I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
> I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
> I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
> I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
> I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
> [       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
> [----------] 1 test from ZooKeeperTest (3634 ms total)
> 
> [----------] Global test environment tear-down
> [==========] 1 test from 1 test case ran. (4068 ms total)
> [  PASSED  ] 1 test.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/
-----------------------------------------------------------

(Updated Oct. 30, 2012, 1:38 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Fixed the test.


Description
-------

The master detector will send a NoMasterDetected() message only to the leading master.
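
As a rough illustration of the behaviour described above (not the actual patch), here is a toy model. It assumes that the detector caches the last leader it reported and that the fix clears this cache when the ZooKeeper session expires; that assumption is inferred from the test logs, where the "got new master pid" line reappears after the reconnect only once the fix is applied. ToyDetector and all of its members are hypothetical names and are not taken from detector.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names come from the Mesos sources.
struct ToyDetector {
  bool contend;                   // True if this detector registers a master.
  bool leading = false;           // True while our own znode is the leader.
  bool known = false;             // True if a leader has been cached.
  std::string current;            // Cached leader, valid only if 'known'.
  std::vector<std::string> sent;  // Messages delivered to the local process.

  explicit ToyDetector(bool contend_) : contend(contend_) {}

  // Called whenever the set of registered masters has been (re)read.
  void detected(const std::string& leader, bool isSelf) {
    if (known && current == leader) {
      return;  // Same leader as before, so nothing is reported.
    }
    known = true;
    current = leader;
    leading = contend && isSelf;
    sent.push_back("NewMasterDetected(" + leader + ")");
  }

  // Called when the ZooKeeper session expires.
  void expired() {
    // Assumption: only a leading master is told that it lost leadership.
    if (leading) {
      sent.push_back("NoMasterDetected");
    }
    leading = false;
    // Assumption: forget the cached leader so that the next detection is
    // reported again, even if the same master is still leading.
    known = false;
    current.clear();
  }
};
```

In this model a non-contending detector (such as the slave's) that sees the same leader before and after an expiry records two NewMasterDetected entries and no NoMasterDetected entry, which is the sequence the MasterDetectorExpireZKSession test waits for.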


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299


Diffs (updated)
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 0321bc516166aacfd261c48f1f4293622d18ae0e 
  src/tests/zookeeper_test_server.hpp 06320439b993f9612ea01303f7446dadf97dc045 
  src/tests/zookeeper_tests.cpp 3f001affe0dd4b8002e99a658c47b8ea86ddb7d6 

Diff: https://reviews.apache.org/r/7746/diff/


Testing
-------

Test output before the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
../../src/tests/zookeeper_tests.cpp:332: Failure
Failed
Waited too long for 'newMasterDetectedCall2'
../../src/tests/zookeeper_tests.cpp:324: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
[----------] 1 test from ZooKeeperTest (5929 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6147 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession


Test output after the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
...
...
I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
[       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
[----------] 1 test from ZooKeeperTest (3634 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4068 ms total)
[  PASSED  ] 1 test.


Thanks,

Vinod Kone


Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/
-----------------------------------------------------------

(Updated Oct. 30, 2012, 12:45 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Better comment.


Description (updated)
-------

The master detector will send a NoMasterDetected() message only to the leading master.


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299


Diffs (updated)
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 0321bc516166aacfd261c48f1f4293622d18ae0e 
  src/tests/zookeeper_test_server.hpp 06320439b993f9612ea01303f7446dadf97dc045 
  src/tests/zookeeper_tests.cpp 3f001affe0dd4b8002e99a658c47b8ea86ddb7d6 

Diff: https://reviews.apache.org/r/7746/diff/


Testing
-------

Test output before the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
../../src/tests/zookeeper_tests.cpp:332: Failure
Failed
Waited too long for 'newMasterDetectedCall2'
../../src/tests/zookeeper_tests.cpp:324: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
[----------] 1 test from ZooKeeperTest (5929 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6147 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession


Test output after the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
...
...
I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
[       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
[----------] 1 test from ZooKeeperTest (3634 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4068 ms total)
[  PASSED  ] 1 test.


Thanks,

Vinod Kone


Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/
-----------------------------------------------------------

(Updated Oct. 30, 2012, 12:31 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Fixed sending of spurious NoMasterDetected messages.

BenM's comments


Description
-------

BenM's comments


Fix for master detector


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299


Diffs (updated)
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 0321bc516166aacfd261c48f1f4293622d18ae0e 
  src/tests/zookeeper_test_server.hpp 06320439b993f9612ea01303f7446dadf97dc045 
  src/tests/zookeeper_tests.cpp 3f001affe0dd4b8002e99a658c47b8ea86ddb7d6 

Diff: https://reviews.apache.org/r/7746/diff/


Testing
-------

Test output before the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
../../src/tests/zookeeper_tests.cpp:332: Failure
Failed
Waited too long for 'newMasterDetectedCall2'
../../src/tests/zookeeper_tests.cpp:324: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
[----------] 1 test from ZooKeeperTest (5929 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6147 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession


Test output after the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
...
...
I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
[       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
[----------] 1 test from ZooKeeperTest (3634 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4068 ms total)
[  PASSED  ] 1 test.


Thanks,

Vinod Kone


Re: Review Request: Fix for zookeeper master detector

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12889
-----------------------------------------------------------



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27709>

    re-raising, since 'the' doesn't seem right to me



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27710>

    ditto



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27711>

    ditto


- Ben Mahler




Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/
-----------------------------------------------------------

(Updated Oct. 29, 2012, 7:34 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

BenM's comments.

Rebased off latest trunk


Description (updated)
-------

BenM's comments


Fix for master detector


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299


Diffs (updated)
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 0321bc516166aacfd261c48f1f4293622d18ae0e 
  src/tests/zookeeper_test_server.hpp 06320439b993f9612ea01303f7446dadf97dc045 
  src/tests/zookeeper_tests.cpp 3f001affe0dd4b8002e99a658c47b8ea86ddb7d6 

Diff: https://reviews.apache.org/r/7746/diff/


Testing
-------

Test output before the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
../../src/tests/zookeeper_tests.cpp:332: Failure
Failed
Waited too long for 'newMasterDetectedCall2'
../../src/tests/zookeeper_tests.cpp:324: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
[----------] 1 test from ZooKeeperTest (5929 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6147 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession


Test output after the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
...
...
I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
[       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
[----------] 1 test from ZooKeeperTest (3634 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4068 ms total)
[  PASSED  ] 1 test.


Thanks,

Vinod Kone


Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.

> On Oct. 28, 2012, 7:53 p.m., Ben Mahler wrote:
> > src/detector/detector.cpp, line 470
> > <https://reviews.apache.org/r/7746/diff/4/?file=180546#file180546line470>
> >
> >     s/the/a since there can be many contending detectors, correct?

no, only one type of detector per use


> On Oct. 28, 2012, 7:53 p.m., Ben Mahler wrote:
> > src/detector/detector.cpp, line 484
> > <https://reviews.apache.org/r/7746/diff/4/?file=180546#file180546line484>
> >
> >     Maybe a little bit about why it will never know? I'm assuming there's some zk semantics that cause this?

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12867
-----------------------------------------------------------




Re: Review Request: Fix for zookeeper master detector

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/#review12867
-----------------------------------------------------------

Ship it!


For posterity, can you update the description of this review to describe the fix in question?

Looks good, without me looking too much into ZK.


src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27647>

    s/the/a since there can be many contending detectors, correct?



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27648>

    ditto s/the/a



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27649>

    s/the contender/a contender



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27650>

    s/its/it's



src/detector/detector.cpp
<https://reviews.apache.org/r/7746/#comment27651>

    Maybe a little bit about why it will never know? I'm assuming there's some zk semantics that cause this?



src/tests/zookeeper_tests.cpp
<https://reviews.apache.org/r/7746/#comment27652>

    Can you add a comment on this one, since this is the meat of the test, right?
    
    // Ensure we get the noMasterDetected call?


- Ben Mahler




Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/
-----------------------------------------------------------

(Updated Oct. 26, 2012, 11:18 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Update the code and tests after discussion with BenH.


Description (updated)
-------

Fixed test


added comment for expire session


Fix for master detector


Failed test


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299


Diffs (updated)
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 5af7464aae17c00a0e707421982d7cb055aabc6c 
  src/tests/zookeeper_server.hpp 6355e8479a636c889945eead12d863b827d78929 
  src/tests/zookeeper_tests.cpp 4415a33b94dd6ca360a7dd3ca49f4c29ee25f5e8 

Diff: https://reviews.apache.org/r/7746/diff/


Testing
-------

Test output before the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
../../src/tests/zookeeper_tests.cpp:332: Failure
Failed
Waited too long for 'newMasterDetectedCall2'
../../src/tests/zookeeper_tests.cpp:324: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
[----------] 1 test from ZooKeeperTest (5929 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6147 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession


Test output after the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
...
...
I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
[       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
[----------] 1 test from ZooKeeperTest (3634 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4068 ms total)
[  PASSED  ] 1 test.


Thanks,

Vinod Kone


Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7746/
-----------------------------------------------------------

(Updated Oct. 26, 2012, 7:27 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Documented that expireSession() expires the session only after a delay.

Turns out, fixing it to get an immediate session expiration is rather tricky (and possibly not yet supported by the ZooKeeper C binding).
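
For a rough idea of how long that delay is, the glog timestamps in the test output can be subtracted: the "lost connection" line and the "session expired!" line are about 3.3 seconds apart in both runs. Below is a minimal sketch of that arithmetic. The helper names toMicros and delayMicros are made up for illustration and are not part of Mesos or ZooKeeper, and the parser assumes the fraction always has six digits, as it does in these logs.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: converts a glog-style time of day
// "HH:MM:SS.uuuuuu" into microseconds since midnight.
// Assumes the fractional part has exactly six digits.
long long toMicros(const std::string& t) {
  int h = 0, m = 0, s = 0, us = 0;
  int matched = std::sscanf(t.c_str(), "%2d:%2d:%2d.%6d", &h, &m, &s, &us);
  assert(matched == 4);
  (void) matched; // Avoids an unused-variable warning when NDEBUG is set.
  return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}

// Hypothetical helper: microseconds between the "lost connection"
// log line and the "session expired" log line (same day assumed).
long long delayMicros(const std::string& lost, const std::string& expired) {
  return toMicros(expired) - toMicros(lost);
}
```

Feeding it the two timestamps from the "after the fix" run (23:42:01.616317 and 23:42:04.950316) gives the gap in microseconds; the "before the fix" run can be checked the same way.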


Description (updated)
-------

Fix for master detector


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299


Diffs (updated)
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 5af7464aae17c00a0e707421982d7cb055aabc6c 
  src/tests/zookeeper_server.hpp 6355e8479a636c889945eead12d863b827d78929 
  src/tests/zookeeper_tests.cpp 4415a33b94dd6ca360a7dd3ca49f4c29ee25f5e8 

Diff: https://reviews.apache.org/r/7746/diff/


Testing
-------

Test output before the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
../../src/tests/zookeeper_tests.cpp:332: Failure
Failed
Waited too long for 'newMasterDetectedCall2'
../../src/tests/zookeeper_tests.cpp:324: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
[----------] 1 test from ZooKeeperTest (5929 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6147 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession


Test output after the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
...
...
I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
[       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
[----------] 1 test from ZooKeeperTest (3634 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4068 ms total)
[  PASSED  ] 1 test.


Thanks,

Vinod Kone


Re: Review Request: Fix for zookeeper master detector

Posted by Vinod Kone <vi...@gmail.com>.

(Updated Oct. 26, 2012, 7:12 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

The previous version of the test was broken. I fixed it and confirmed that it fails as expected without the detector fix.

Then I fixed the master detector and confirmed that the test passes.


Description (updated)
-------

Fix for the master detector for the case where a ZooKeeper session expiration happens at the slave but the leading master stays the same.


This addresses bug MESOS-299.
    https://issues.apache.org/jira/browse/MESOS-299
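
For readers who have not opened the diff, here is a minimal sketch of one way this kind of bug can arise. It is not the actual patch. It assumes a detector that remembers the last leader it announced and suppresses repeat notifications; if that remembered state survives a session expiration, a re-read of the same leader is never announced. Every name below (SketchDetector, leaderRead, sessionExpired, replay) is hypothetical and is not part of the Mesos detector API.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a master detector; NOT the Mesos class.
class SketchDetector {
public:
  SketchDetector(bool resetOnExpire,
                 std::function<void(const std::string&)> notify)
    : resetOnExpire_(resetOnExpire),
      notify_(std::move(notify)),
      hasCurrent_(false) {}

  // Called whenever the leading master is (re)read from ZooKeeper.
  void leaderRead(const std::string& pid) {
    if (hasCurrent_ && current_ == pid) {
      return;  // Same leader as last announced: suppress the notification.
    }
    hasCurrent_ = true;
    current_ = pid;
    notify_(pid);
  }

  // Called when the ZooKeeper session expires.
  void sessionExpired() {
    if (resetOnExpire_) {
      // Assumed remedy: forget the leader so the next read re-announces it.
      hasCurrent_ = false;
      current_.clear();
    }
  }

private:
  bool resetOnExpire_;
  std::function<void(const std::string&)> notify_;
  bool hasCurrent_;
  std::string current_;
};

// Replays the sequence seen in the test logs: the leader is detected, the
// session expires, and the same leader is detected again after reconnecting.
// Returns how many times the listener was notified.
int replay(bool resetOnExpire) {
  int notifications = 0;
  SketchDetector detector(
      resetOnExpire, [&](const std::string&) { ++notifications; });
  detector.leaderRead("(1)@192.168.1.127:51025");
  detector.sessionExpired();
  detector.leaderRead("(1)@192.168.1.127:51025");
  return notifications;
}
```

In this sketch replay(false) yields a single notification, which corresponds to the "never called" expectation failure in the output before the fix, and replay(true) yields two, which corresponds to the second "got new master pid" line in the output after the fix.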


Diffs (updated)
-----

  src/detector/detector.hpp d859b080b99e23d511458a27272db33c5486bb4b 
  src/detector/detector.cpp 62df8bdf539eb13b2a6dc00eb2f6a07381d59106 
  src/slave/slave.cpp 5af7464aae17c00a0e707421982d7cb055aabc6c 
  src/tests/zookeeper_tests.cpp 4415a33b94dd6ca360a7dd3ca49f4c29ee25f5e8 

Diff: https://reviews.apache.org/r/7746/diff/


Testing (updated)
-------

Test output before the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
....
....
I1026 00:05:39.087263 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087425 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.087811 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:39.087836 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:39.102313 27279360 detector.cpp:467] Master detector found 0 registered masters
I1026 00:05:39.110910 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1026 00:05:39.111507 27815936 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.111590 26742784 detector.cpp:467] Master detector found 1 registered masters
I1026 00:05:39.114651 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
I1026 00:05:39.114917 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51375
2012-10-26 00:05:39,116:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51378] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1026 00:05:39.116739 27279360 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-26 00:05:42,450:15851(0x10fa9f000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51378] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9be3f5b90000 has expired.
W1026 00:05:42.450742 27279360 detector.cpp:397] Master detector ZooKeeper session expired!
I1026 00:05:42.454856 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1026 00:05:42.454888 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1026 00:05:42.501096 27279360 detector.cpp:467] Master detector found 1 registered masters
../../src/tests/zookeeper_tests.cpp:332: Failure
Failed
Waited too long for 'newMasterDetectedCall2'
../../src/tests/zookeeper_tests.cpp:324: Failure
Actual function call count doesn't match EXPECT_CALL(slave, newMasterDetected(master.self()))...
         Expected: to be called once
           Actual: never called - unsatisfied and active
I1026 00:05:44.844130 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51378
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession (5929 ms)
[----------] 1 test from ZooKeeperTest (5929 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (6147 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ZooKeeperTest.MasterDetectorExpireZKSession


Test output after the fix:

[vinod@VKone ~/workspace/apache/mesos/build (vinod/master_detector_fix)]$ GLOG_v=1 ./bin/mesos-tests.sh --verbose --gtest_filter="*ZooKeeperTest.MasterDetectorExpire*" 
...
...
I1025 23:42:01.587967 26742784 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588099 26742784 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.588544 27279360 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:01.588577 27279360 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:01.609194 26742784 detector.cpp:333] Created ephemeral/sequence znode at '/mesos/0000000000'
I1025 23:42:01.610599 27279360 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.610780 26742784 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:01.613991 27279360 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:01.614141 26742784 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
2012-10-25 23:42:01,616:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1603: Socket [127.0.0.1:51028] zk retcode=-4, errno=64(Host is down): failed while receiving a server response
I1025 23:42:01.616317 26742784 detector.cpp:378] Master detector lost connection to ZooKeeper, attempting to reconnect ...
2012-10-25 23:42:04,950:12480(0x10faa8000):ZOO_ERROR@handle_socket_error_msg@1621: Socket [127.0.0.1:51028] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x13a9bce54800001 has expired.
W1025 23:42:04.950316 26742784 detector.cpp:397] Master detector ZooKeeper session expired!
I1025 23:42:04.954572 27815936 detector.cpp:286] Master detector connected to ZooKeeper ...
I1025 23:42:04.954607 27815936 detector.cpp:303] Trying to create path '/mesos' in ZooKeeper
I1025 23:42:05.008098 27815936 detector.cpp:467] Master detector found 1 registered masters
I1025 23:42:05.008566 27815936 detector.cpp:502] Master detector got new master pid: (1)@192.168.1.127:51025
I1025 23:42:05.010418 1898458304 zookeeper_server.cpp:181] Shutdown ZooKeeperServer on port 51028
[       OK ] ZooKeeperTest.MasterDetectorExpireZKSession (3633 ms)
[----------] 1 test from ZooKeeperTest (3634 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4068 ms total)
[  PASSED  ] 1 test.


Thanks,

Vinod Kone