You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Cody Maloney (JIRA)" <ji...@apache.org> on 2014/12/04 22:27:13 UTC

[jira] [Updated] (MESOS-1548) FaultToleranceTest.MasterFailover is flaky

     [ https://issues.apache.org/jira/browse/MESOS-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cody Maloney updated MESOS-1548:
--------------------------------
    Labels: flaky  (was: )

> FaultToleranceTest.MasterFailover is flaky
> ------------------------------------------
>
>                 Key: MESOS-1548
>                 URL: https://issues.apache.org/jira/browse/MESOS-1548
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Vinod Kone
>              Labels: flaky
>
> {code}
> [ RUN      ] FaultToleranceTest.MasterFailover
> Using temporary directory '/tmp/FaultToleranceTest_MasterFailover_An91kb'
> I0626 15:21:11.941083 13589 leveldb.cpp:176] Opened db in 684083ns
> I0626 15:21:11.941844 13589 leveldb.cpp:183] Compacted db in 195468ns
> I0626 15:21:11.942226 13589 leveldb.cpp:198] Created db iterator in 6853ns
> I0626 15:21:11.942625 13589 leveldb.cpp:204] Seeked to beginning of db in 1662ns
> I0626 15:21:11.942981 13589 leveldb.cpp:273] Iterated through 0 keys in the db in 781ns
> I0626 15:21:11.943406 13589 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0626 15:21:11.944072 13608 recover.cpp:425] Starting replica recovery
> I0626 15:21:11.944135 13608 recover.cpp:451] Replica is in EMPTY status
> I0626 15:21:11.944373 13608 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
> I0626 15:21:11.944427 13608 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0626 15:21:11.944505 13608 recover.cpp:542] Updating replica status to STARTING
> I0626 15:21:11.944641 13608 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 58924ns
> I0626 15:21:11.944658 13608 replica.cpp:320] Persisted replica status to STARTING
> I0626 15:21:11.944699 13608 recover.cpp:451] Replica is in STARTING status
> I0626 15:21:11.945467 13608 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
> I0626 15:21:11.945942 13608 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0626 15:21:11.946418 13608 recover.cpp:542] Updating replica status to VOTING
> I0626 15:21:11.946910 13608 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 19364ns
> I0626 15:21:11.947319 13608 replica.cpp:320] Persisted replica status to VOTING
> I0626 15:21:11.947773 13612 recover.cpp:556] Successfully joined the Paxos group
> I0626 15:21:11.948055 13610 recover.cpp:440] Recover process terminated
> I0626 15:21:11.953730 13612 master.cpp:288] Master 20140626-152111-2759502016-54657-13589 (fedora-20) started on 192.168.122.164:54657
> I0626 15:21:11.953763 13612 master.cpp:325] Master only allowing authenticated frameworks to register
> I0626 15:21:11.953770 13612 master.cpp:330] Master only allowing authenticated slaves to register
> I0626 15:21:11.953779 13612 credentials.hpp:35] Loading credentials for authentication from '/tmp/FaultToleranceTest_MasterFailover_An91kb/credentials'
> I0626 15:21:11.953865 13612 master.cpp:356] Authorization enabled
> I0626 15:21:11.954516 13615 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@192.168.122.164:54657
> I0626 15:21:11.954876 13613 master.cpp:122] No whitelist given. Advertising offers for all slaves
> I0626 15:21:11.955338 13612 master.cpp:1122] The newly elected leader is master@192.168.122.164:54657 with id 20140626-152111-2759502016-54657-13589
> I0626 15:21:11.955361 13612 master.cpp:1135] Elected as the leading master!
> I0626 15:21:11.955369 13612 master.cpp:953] Recovering from registrar
> I0626 15:21:11.955417 13612 registrar.cpp:313] Recovering registrar
> I0626 15:21:11.955669 13609 log.cpp:656] Attempting to start the writer
> I0626 15:21:11.956382 13609 replica.cpp:474] Replica received implicit promise request with proposal 1
> I0626 15:21:11.956444 13609 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 37618ns
> I0626 15:21:11.956456 13609 replica.cpp:342] Persisted promised to 1
> I0626 15:21:11.956651 13609 coordinator.cpp:230] Coordinator attemping to fill missing position
> I0626 15:21:11.957474 13609 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
> I0626 15:21:11.957521 13609 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 21096ns
> I0626 15:21:11.957532 13609 replica.cpp:676] Persisted action at 0
> I0626 15:21:11.958163 13609 replica.cpp:508] Replica received write request for position 0
> I0626 15:21:11.958684 13609 leveldb.cpp:438] Reading position from leveldb took 20198ns
> I0626 15:21:11.958922 13609 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 44830ns
> I0626 15:21:11.958938 13609 replica.cpp:676] Persisted action at 0
> I0626 15:21:11.959285 13609 replica.cpp:655] Replica received learned notice for position 0
> I0626 15:21:11.959832 13609 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 47773ns
> I0626 15:21:11.960201 13609 replica.cpp:676] Persisted action at 0
> I0626 15:21:11.960614 13609 replica.cpp:661] Replica learned NOP action at position 0
> I0626 15:21:11.960783 13609 log.cpp:672] Writer started with ending position 0
> I0626 15:21:11.961611 13614 leveldb.cpp:438] Reading position from leveldb took 12133ns
> I0626 15:21:11.968036 13613 registrar.cpp:346] Successfully fetched the registry (0B)
> I0626 15:21:11.968072 13613 registrar.cpp:422] Attempting to update the 'registry'
> I0626 15:21:11.969460 13609 log.cpp:680] Attempting to append 131 bytes to the log
> I0626 15:21:11.969980 13609 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
> I0626 15:21:11.970623 13613 replica.cpp:508] Replica received write request for position 1
> I0626 15:21:11.970674 13613 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 28212ns
> I0626 15:21:11.970687 13613 replica.cpp:676] Persisted action at 1
> I0626 15:21:11.971130 13610 replica.cpp:655] Replica received learned notice for position 1
> I0626 15:21:11.971504 13610 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 349741ns
> I0626 15:21:11.971525 13610 replica.cpp:676] Persisted action at 1
> I0626 15:21:11.971534 13610 replica.cpp:661] Replica learned APPEND action at position 1
> I0626 15:21:11.974668 13614 registrar.cpp:479] Successfully updated 'registry'
> I0626 15:21:11.974843 13614 registrar.cpp:372] Successfully recovered registrar
> I0626 15:21:11.974921 13614 master.cpp:980] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register
> I0626 15:21:11.976786 13612 log.cpp:699] Attempting to truncate the log to 1
> I0626 15:21:11.976835 13612 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
> I0626 15:21:11.977008 13612 replica.cpp:508] Replica received write request for position 2
> I0626 15:21:11.977048 13612 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 20691ns
> I0626 15:21:11.977061 13612 replica.cpp:676] Persisted action at 2
> I0626 15:21:11.977205 13612 replica.cpp:655] Replica received learned notice for position 2
> I0626 15:21:11.977236 13612 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 15525ns
> I0626 15:21:11.977252 13612 leveldb.cpp:401] Deleting ~1 keys from leveldb took 5528ns
> I0626 15:21:11.977260 13612 replica.cpp:676] Persisted action at 2
> I0626 15:21:11.977284 13612 replica.cpp:661] Replica learned TRUNCATE action at position 2
> I0626 15:21:11.981732 13589 sched.cpp:139] Version: 0.20.0
> I0626 15:21:11.982229 13614 sched.cpp:235] New master detected at master@192.168.122.164:54657
> I0626 15:21:11.982256 13614 sched.cpp:285] Authenticating with master master@192.168.122.164:54657
> I0626 15:21:11.982333 13614 authenticatee.hpp:128] Creating new client SASL connection
> I0626 15:21:11.982904 13614 master.cpp:3499] Authenticating scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657
> I0626 15:21:11.982995 13614 authenticator.hpp:156] Creating new server SASL connection
> I0626 15:21:11.983481 13614 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0626 15:21:11.983508 13614 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0626 15:21:11.983536 13614 authenticator.hpp:262] Received SASL authentication start
> I0626 15:21:11.983820 13614 authenticator.hpp:384] Authentication requires more steps
> I0626 15:21:11.983865 13614 authenticatee.hpp:265] Received SASL authentication step
> I0626 15:21:11.983907 13614 authenticator.hpp:290] Received SASL authentication step
> I0626 15:21:11.983924 13614 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0626 15:21:11.983932 13614 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I0626 15:21:11.983944 13614 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I0626 15:21:11.983953 13614 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0626 15:21:11.983960 13614 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0626 15:21:11.983966 13614 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0626 15:21:11.983978 13614 authenticator.hpp:376] Authentication success
> I0626 15:21:11.984005 13614 authenticatee.hpp:305] Authentication success
> I0626 15:21:11.984029 13614 master.cpp:3539] Successfully authenticated principal 'test-principal' at scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657
> I0626 15:21:11.984098 13614 sched.cpp:359] Successfully authenticated with master master@192.168.122.164:54657
> I0626 15:21:11.984113 13614 sched.cpp:478] Sending registration request to master@192.168.122.164:54657
> I0626 15:21:11.984158 13614 master.cpp:1241] Received registration request from scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657
> I0626 15:21:11.984179 13614 master.cpp:1201] Authorizing framework principal 'test-principal' to receive offers for role '*'
> I0626 15:21:11.984274 13614 master.cpp:1300] Registering framework 20140626-152111-2759502016-54657-13589-0000 at scheduler-13d6b94c-c4b0-4236-b198-5da18a8c7dcc@192.168.122.164:54657
> I0626 15:21:11.984365 13614 sched.cpp:409] Framework registered with 20140626-152111-2759502016-54657-13589-0000
> I0626 15:21:11.984388 13614 sched.cpp:423] Scheduler::registered took 8620ns
> I0626 15:21:11.984426 13614 hierarchical_allocator_process.hpp:331] Added framework 20140626-152111-2759502016-54657-13589-0000
> I0626 15:21:11.984436 13614 hierarchical_allocator_process.hpp:724] No resources available to allocate!
> I0626 15:21:11.984442 13614 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 7446ns
> I0626 15:21:11.984794 13612 master.cpp:619] Master terminating
> 2014-06-26 15:21:12,255:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:15,591:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:18,927:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:22,263:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:25,599:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:28,935:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:32,271:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:35,607:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:38,943:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:42,282:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:45,622:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:48,959:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:52,295:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:55,632:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:21:58,968:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:02,304:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:05,640:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:08,976:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:12,313:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:15,650:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:18,986:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:22,323:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:25,659:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:28,995:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:32,331:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:35,667:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:39,005:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:42,343:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:45,679:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:49,016:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-26 15:22:52,352:13589(0x7fc42b2cf700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:51226] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)