You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Andrei Sekretenko (Jira)" <ji...@apache.org> on 2019/10/24 13:16:00 UTC

[jira] [Comment Edited] (MESOS-9483) ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors is flaky

    [ https://issues.apache.org/jira/browse/MESOS-9483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958884#comment-16958884 ] 

Andrei Sekretenko edited comment on MESOS-9483 at 10/24/19 1:15 PM:
--------------------------------------------------------------------

Observed a similar failure again in the same environment:
{noformat}
[ RUN      ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors
I1023 17:37:33.530347 384742848 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 54301
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd9500bad50 flags=0
2019-10-23 17:37:33,531:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301]
2019-10-23 17:37:33,533:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90000, negotiated timeout=10000
I1023 17:37:33.533334 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper
I1023 17:37:33.533380 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I1023 17:37:33.533393 179191808 group.cpp:395] Authenticating with ZooKeeper using digest
2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms
2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1702: Socket [127.0.0.1:54301] zk retcode=-7, errno=60(Operation timed out): connection to 127.0.0.1:54301 timed out (exceeded timeout by 2139ms)
2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms
I1023 17:37:42.337549 179191808 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 8816ms
2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301]
2019-10-23 17:37:45,680:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1800: Socket [127.0.0.1:54301] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x16dfb327ed90000 has expired.
I1023 17:37:45.680534 181874688 group.cpp:511] ZooKeeper session expired
2019-10-23 17:37:45,680:26960(0x70000ab67000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x16dfb327ed90000

2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd94fc98990 flags=0
2019-10-23 17:37:45,681:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301]
2019-10-23 17:37:45,682:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90001, negotiated timeout=10000
I1023 17:37:45.682644 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper
I1023 17:37:45.682677 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I1023 17:37:45.682689 179191808 group.cpp:395] Authenticating with ZooKeeper using digest
../../src/tests/master_contender_detector_tests.cpp:409: Failure
Failed to wait 15secs for group1.join("data")
2019-10-23 17:37:49,793:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 778ms
2019-10-23 17:37:49,832:26960(0x70000b717000):ZOO_INFO@auth_completion_func@1327: Authentication scheme digest succeeded
I1023 17:37:49.832679 179191808 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
2019-10-23 17:37:49,836:26960(0x116eeb5c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x16dfb327ed90001 to [127.0.0.1:54301]

I1023 17:37:49.836292 384742848 zookeeper_test_server.cpp:116] Shutting down ZooKeeperTestServer on port 54301
[  FAILED  ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (16311 ms)
{noformat}


was (Author: asekretenko):
Observed a similar failure again:
{noformat}
[ RUN      ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors
I1023 17:37:33.530347 384742848 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 54301
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2019-10-23 17:37:33,530:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd9500bad50 flags=0
2019-10-23 17:37:33,531:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301]
2019-10-23 17:37:33,533:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90000, negotiated timeout=10000
I1023 17:37:33.533334 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper
I1023 17:37:33.533380 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I1023 17:37:33.533393 179191808 group.cpp:395] Authenticating with ZooKeeper using digest
2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms
2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1702: Socket [127.0.0.1:54301] zk retcode=-7, errno=60(Operation timed out): connection to 127.0.0.1:54301 timed out (exceeded timeout by 2139ms)
2019-10-23 17:37:42,337:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 5474ms
I1023 17:37:42.337549 179191808 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 8816ms
2019-10-23 17:37:45,679:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301]
2019-10-23 17:37:45,680:26960(0x70000b717000):ZOO_ERROR@handle_socket_error_msg@1800: Socket [127.0.0.1:54301] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x16dfb327ed90000 has expired.
I1023 17:37:45.680534 181874688 group.cpp:511] ZooKeeper session expired
2019-10-23 17:37:45,680:26960(0x70000ab67000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x16dfb327ed90000

2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
2019-10-23 17:37:45,680:26960(0x70000aae4000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:54301 sessionTimeout=10000 watcher=0x119c86ac0 sessionId=0 sessionPasswd=<null> context=0x7fd94fc98990 flags=0
2019-10-23 17:37:45,681:26960(0x70000b717000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:54301]
2019-10-23 17:37:45,682:26960(0x70000b717000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:54301], sessionId=0x16dfb327ed90001, negotiated timeout=10000
I1023 17:37:45.682644 179191808 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:53408) connected to ZooKeeper
I1023 17:37:45.682677 179191808 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I1023 17:37:45.682689 179191808 group.cpp:395] Authenticating with ZooKeeper using digest
../../src/tests/master_contender_detector_tests.cpp:409: Failure
Failed to wait 15secs for group1.join("data")
2019-10-23 17:37:49,793:26960(0x70000b717000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 778ms
2019-10-23 17:37:49,832:26960(0x70000b717000):ZOO_INFO@auth_completion_func@1327: Authentication scheme digest succeeded
I1023 17:37:49.832679 179191808 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
2019-10-23 17:37:49,836:26960(0x116eeb5c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x16dfb327ed90001 to [127.0.0.1:54301]

I1023 17:37:49.836292 384742848 zookeeper_test_server.cpp:116] Shutting down ZooKeeperTestServer on port 54301
[  FAILED  ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (16311 ms)
{noformat}

> ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors is flaky
> ----------------------------------------------------------------
>
>                 Key: MESOS-9483
>                 URL: https://issues.apache.org/jira/browse/MESOS-9483
>             Project: Mesos
>          Issue Type: Bug
>         Environment: Mac OSX w/ libevent
>            Reporter: Benno Evers
>            Priority: Major
>              Labels: flaky-test, foundations
>
> Observed a failure with the following error:
> {noformat}
> ../../src/tests/master_contender_detector_tests.cpp:409: Failure
> Failed to wait 15secs for group1.join("data")
> {noformat}
> Full log:
> {noformat}
> [ RUN      ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors
> I1214 15:03:56.036525 398710208 zookeeper_test_server.cpp:156] Started ZooKeeperTestServer on port 50199
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
> 2018-12-14 15:03:56,036:69505(0x70000396b000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:50199 sessionTimeout=10000 watcher=0x11a65f9a0 sessionId=0 sessionPasswd=<null> context=0x7fcd061125a0 flags=0
> 2018-12-14 15:03:56,037:69505(0x700004415000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:50199]
> 2018-12-14 15:03:56,039:69505(0x700004415000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:50199], sessionId=0x167aef64b830000, negotiated timeout=10000
> I1214 15:03:56.039242 60207104 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:49309) connected to ZooKeeper
> I1214 15:03:56.039286 60207104 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
> I1214 15:03:56.039309 60207104 group.cpp:395] Authenticating with ZooKeeper using digest
> 2018-12-14 15:04:05,989:69505(0x700004415000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 6619ms
> 2018-12-14 15:04:05,989:69505(0x700004415000):ZOO_ERROR@handle_socket_error_msg@1702: Socket [127.0.0.1:50199] zk retcode=-7, errno=60(Operation timed out): connection to 127.0.0.1:50199 timed out (exceeded timeout by 3284ms)
> 2018-12-14 15:04:05,989:69505(0x700004415000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 6619ms
> I1214 15:04:05.990031 60207104 group.cpp:452] Lost connection to ZooKeeper, attempting to reconnect ...
> 2018-12-14 15:04:09,332:69505(0x700004415000):ZOO_WARN@zookeeper_interest@1597: Exceeded deadline by 9963ms
> 2018-12-14 15:04:09,332:69505(0x700004415000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:50199]
> 2018-12-14 15:04:09,333:69505(0x700004415000):ZOO_ERROR@handle_socket_error_msg@1800: Socket [127.0.0.1:50199] zk retcode=-112, errno=70(Stale NFS file handle): sessionId=0x167aef64b830000 has expired.
> I1214 15:04:09.333552 59670528 group.cpp:511] ZooKeeper session expired
> 2018-12-14 15:04:09,333:69505(0x7000038e8000):ZOO_INFO@zookeeper_close@2579: Freeing zookeeper resources for sessionId=0x167aef64b830000
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@753: Client environment:zookeeper.version=zookeeper C client 3.4.8
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@757: Client environment:host.name=Jenkinss-Mac-mini.local
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@764: Client environment:os.name=Darwin
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@765: Client environment:os.arch=18.2.0
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@766: Client environment:os.version=Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@774: Client environment:user.name=jenkins
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@782: Client environment:user.home=/Users/jenkins
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@log_env@794: Client environment:user.dir=/Users/jenkins/workspace/workspace/mesos/Mesos_CI-build/FLAG/SSL/label/mac/mesos/build
> 2018-12-14 15:04:09,333:69505(0x70000375f000):ZOO_INFO@zookeeper_init@827: Initiating client connection, host=127.0.0.1:50199 sessionTimeout=10000 watcher=0x11a65f9a0 sessionId=0 sessionPasswd=<null> context=0x7fcd061125a0 flags=0
> 2018-12-14 15:04:09,334:69505(0x700004415000):ZOO_INFO@check_events@1764: initiated connection to server [127.0.0.1:50199]
> 2018-12-14 15:04:09,335:69505(0x700004415000):ZOO_INFO@check_events@1811: session establishment complete on server [127.0.0.1:50199], sessionId=0x167aef64b830001, negotiated timeout=10000
> I1214 15:04:09.335834 58060800 group.cpp:341] Group process (zookeeper-group(14)@10.0.49.4:49309) connected to ZooKeeper
> I1214 15:04:09.335880 58060800 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
> I1214 15:04:09.335896 58060800 group.cpp:395] Authenticating with ZooKeeper using digest
> ../../src/tests/master_contender_detector_tests.cpp:409: Failure
> Failed to wait 15secs for group1.join("data")
> 2018-12-14 15:04:12,784:69505(0x700004415000):ZOO_INFO@auth_completion_func@1327: Authentication scheme digest succeeded
> I1214 15:04:12.784991 58060800 group.cpp:419] Trying to create path '/mesos' in ZooKeeper
> 2018-12-14 15:04:12,788:69505(0x117c3d5c0):ZOO_INFO@zookeeper_close@2562: Closing zookeeper sessionId=0x167aef64b830001 to [127.0.0.1:50199]
> I1214 15:04:12.789038 398710208 zookeeper_test_server.cpp:116] Shutting down ZooKeeperTestServer on port 50199
> [  FAILED  ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (16756 ms)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)