You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2014/06/25 04:27:24 UTC

[jira] [Commented] (MESOS-735) SlaveRecoveryTest/0.RegisterDisconnectedSlave flaky

    [ https://issues.apache.org/jira/browse/MESOS-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042972#comment-14042972 ] 

Vinod Kone commented on MESOS-735:
----------------------------------

Different error, same test.

{code}
[ RUN      ] SlaveRecoveryTest/0.RegisterDisconnectedSlave
Using temporary directory '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_BP7TDh'
I0624 17:16:17.268437 13481 leveldb.cpp:176] Opened db in 476823ns
I0624 17:16:17.269201 13481 leveldb.cpp:183] Compacted db in 168164ns
I0624 17:16:17.269628 13481 leveldb.cpp:198] Created db iterator in 4510ns
I0624 17:16:17.270104 13481 leveldb.cpp:204] Seeked to beginning of db in 1442ns
I0624 17:16:17.270459 13481 leveldb.cpp:273] Iterated through 0 keys in the db in 774ns
I0624 17:16:17.270874 13481 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0624 17:16:17.271415 13500 recover.cpp:425] Starting replica recovery
I0624 17:16:17.271567 13500 recover.cpp:451] Replica is in EMPTY status
I0624 17:16:17.271823 13500 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
I0624 17:16:17.271872 13500 recover.cpp:188] Received a recover response from a replica in EMPTY status
I0624 17:16:17.271944 13500 recover.cpp:542] Updating replica status to STARTING
I0624 17:16:17.272018 13500 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 31824ns
I0624 17:16:17.272032 13500 replica.cpp:320] Persisted replica status to STARTING
I0624 17:16:17.272069 13500 recover.cpp:451] Replica is in STARTING status
I0624 17:16:17.272233 13500 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
I0624 17:16:17.272271 13500 recover.cpp:188] Received a recover response from a replica in STARTING status
I0624 17:16:17.272330 13500 recover.cpp:542] Updating replica status to VOTING
I0624 17:16:17.272375 13500 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 13652ns
I0624 17:16:17.272387 13500 replica.cpp:320] Persisted replica status to VOTING
I0624 17:16:17.272407 13500 recover.cpp:556] Successfully joined the Paxos group
I0624 17:16:17.272441 13500 recover.cpp:440] Recover process terminated
I0624 17:16:17.276286 13504 master.cpp:288] Master 20140624-171617-2759502016-49072-13481 (fedora-20) started on 192.168.122.164:49072
I0624 17:16:17.276314 13504 master.cpp:325] Master only allowing authenticated frameworks to register
I0624 17:16:17.276320 13504 master.cpp:332] Master allowing unauthenticated slaves to register
I0624 17:16:17.276329 13504 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_BP7TDh/credentials'
I0624 17:16:17.276401 13504 master.cpp:356] Authorization enabled
I0624 17:16:17.276808 13505 master.cpp:122] No whitelist given. Advertising offers for all slaves
I0624 17:16:17.276962 13505 master.cpp:1122] The newly elected leader is master@192.168.122.164:49072 with id 20140624-171617-2759502016-49072-13481
I0624 17:16:17.276975 13505 master.cpp:1135] Elected as the leading master!
I0624 17:16:17.276983 13505 master.cpp:953] Recovering from registrar
I0624 17:16:17.277031 13505 registrar.cpp:313] Recovering registrar
I0624 17:16:17.277245 13505 log.cpp:656] Attempting to start the writer
I0624 17:16:17.278133 13504 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@192.168.122.164:49072
I0624 17:16:17.278540 13505 replica.cpp:474] Replica received implicit promise request with proposal 1
I0624 17:16:17.278578 13505 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 17010ns
I0624 17:16:17.278590 13505 replica.cpp:342] Persisted promised to 1
I0624 17:16:17.278730 13505 coordinator.cpp:230] Coordinator attemping to fill missing position
I0624 17:16:17.279055 13505 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
I0624 17:16:17.279091 13505 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 16994ns
I0624 17:16:17.279103 13505 replica.cpp:676] Persisted action at 0
I0624 17:16:17.279331 13505 replica.cpp:508] Replica received write request for position 0
I0624 17:16:17.279357 13505 leveldb.cpp:438] Reading position from leveldb took 10721ns
I0624 17:16:17.279383 13505 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 11566ns
I0624 17:16:17.279392 13505 replica.cpp:676] Persisted action at 0
I0624 17:16:17.279508 13505 replica.cpp:655] Replica received learned notice for position 0
I0624 17:16:17.279536 13505 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 13940ns
I0624 17:16:17.279546 13505 replica.cpp:676] Persisted action at 0
I0624 17:16:17.279554 13505 replica.cpp:661] Replica learned NOP action at position 0
I0624 17:16:17.279716 13505 log.cpp:672] Writer started with ending position 0
I0624 17:16:17.280459 13501 leveldb.cpp:438] Reading position from leveldb took 11093ns
I0624 17:16:17.286048 13503 registrar.cpp:346] Successfully fetched the registry (0B)
I0624 17:16:17.286088 13503 registrar.cpp:422] Attempting to update the 'registry'
I0624 17:16:17.287580 13506 log.cpp:680] Attempting to append 131 bytes to the log
I0624 17:16:17.287632 13506 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
I0624 17:16:17.288148 13506 replica.cpp:508] Replica received write request for position 1
I0624 17:16:17.288198 13506 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 25189ns
I0624 17:16:17.288210 13506 replica.cpp:676] Persisted action at 1
I0624 17:16:17.288362 13506 replica.cpp:655] Replica received learned notice for position 1
I0624 17:16:17.288393 13506 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 15116ns
I0624 17:16:17.288403 13506 replica.cpp:676] Persisted action at 1
I0624 17:16:17.288411 13506 replica.cpp:661] Replica learned APPEND action at position 1
I0624 17:16:17.291887 13500 registrar.cpp:479] Successfully updated 'registry'
I0624 17:16:17.291929 13500 registrar.cpp:372] Successfully recovered registrar
I0624 17:16:17.291990 13500 master.cpp:980] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register
I0624 17:16:17.292933 13506 log.cpp:699] Attempting to truncate the log to 1
I0624 17:16:17.292981 13506 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
I0624 17:16:17.293140 13506 replica.cpp:508] Replica received write request for position 2
I0624 17:16:17.293175 13506 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 18201ns
I0624 17:16:17.293186 13506 replica.cpp:676] Persisted action at 2
I0624 17:16:17.293320 13506 replica.cpp:655] Replica received learned notice for position 2
I0624 17:16:17.293349 13506 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 13934ns
I0624 17:16:17.293364 13506 leveldb.cpp:401] Deleting ~1 keys from leveldb took 5067ns
I0624 17:16:17.293373 13506 replica.cpp:676] Persisted action at 2
I0624 17:16:17.293381 13506 replica.cpp:661] Replica learned TRUNCATE action at position 2
I0624 17:16:17.293859 13481 mesos_containerizer.cpp:125] Using isolation: posix/cpu,posix/mem
I0624 17:16:17.296911 13500 slave.cpp:168] Slave started on 39)@192.168.122.164:49072
I0624 17:16:17.296932 13500 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/credential'
I0624 17:16:17.296996 13500 slave.cpp:268] Slave using credential for: test-principal
I0624 17:16:17.297090 13500 slave.cpp:281] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0624 17:16:17.297533 13500 slave.cpp:309] Slave hostname: fedora-20
I0624 17:16:17.297551 13500 slave.cpp:310] Slave checkpoint: true
I0624 17:16:17.298204 13500 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta'
I0624 17:16:17.298295 13500 status_update_manager.cpp:193] Recovering status update manager
I0624 17:16:17.298341 13500 mesos_containerizer.cpp:283] Recovering containerizer
I0624 17:16:17.298533 13500 slave.cpp:3111] Finished recovery
I0624 17:16:17.299015 13500 slave.cpp:584] New master detected at master@192.168.122.164:49072
I0624 17:16:17.299602 13501 status_update_manager.cpp:167] New master detected at master@192.168.122.164:49072
I0624 17:16:17.299872 13500 slave.cpp:660] Authenticating with master master@192.168.122.164:49072
I0624 17:16:17.300474 13506 authenticatee.hpp:128] Creating new client SASL connection
I0624 17:16:17.300979 13506 master.cpp:3499] Authenticating slave(39)@192.168.122.164:49072
I0624 17:16:17.301201 13506 authenticator.hpp:156] Creating new server SASL connection
I0624 17:16:17.301684 13506 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
I0624 17:16:17.301707 13506 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
I0624 17:16:17.301733 13506 authenticator.hpp:262] Received SASL authentication start
I0624 17:16:17.301973 13506 authenticator.hpp:384] Authentication requires more steps
I0624 17:16:17.302018 13506 authenticatee.hpp:265] Received SASL authentication step
I0624 17:16:17.302057 13506 authenticator.hpp:290] Received SASL authentication step
I0624 17:16:17.302074 13506 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0624 17:16:17.302081 13506 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
I0624 17:16:17.302093 13506 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0624 17:16:17.302101 13506 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0624 17:16:17.302109 13506 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0624 17:16:17.302114 13506 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0624 17:16:17.302125 13506 authenticator.hpp:376] Authentication success
I0624 17:16:17.302152 13506 authenticatee.hpp:305] Authentication success
I0624 17:16:17.302170 13506 master.cpp:3539] Successfully authenticated principal 'test-principal' at slave(39)@192.168.122.164:49072
I0624 17:16:17.302430 13500 slave.cpp:633] Detecting new master
I0624 17:16:17.302897 13500 slave.cpp:717] Successfully authenticated with master master@192.168.122.164:49072
I0624 17:16:17.303236 13504 master.cpp:2781] Registering slave at slave(39)@192.168.122.164:49072 (fedora-20) with id 20140624-171617-2759502016-49072-13481-0
I0624 17:16:17.303359 13504 registrar.cpp:422] Attempting to update the 'registry'
I0624 17:16:17.304344 13504 log.cpp:680] Attempting to append 312 bytes to the log
I0624 17:16:17.304386 13504 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3
I0624 17:16:17.304565 13504 replica.cpp:508] Replica received write request for position 3
I0624 17:16:17.304604 13504 leveldb.cpp:343] Persisting action (331 bytes) to leveldb took 19848ns
I0624 17:16:17.304615 13504 replica.cpp:676] Persisted action at 3
I0624 17:16:17.305080 13504 replica.cpp:655] Replica received learned notice for position 3
I0624 17:16:17.305119 13504 leveldb.cpp:343] Persisting action (333 bytes) to leveldb took 18212ns
I0624 17:16:17.305516 13504 replica.cpp:676] Persisted action at 3
I0624 17:16:17.305536 13504 replica.cpp:661] Replica learned APPEND action at position 3
I0624 17:16:17.306056 13501 registrar.cpp:479] Successfully updated 'registry'
I0624 17:16:17.306126 13501 log.cpp:699] Attempting to truncate the log to 3
I0624 17:16:17.306169 13501 master.cpp:2821] Registered slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20)
I0624 17:16:17.306185 13501 master.cpp:3967] Adding slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
I0624 17:16:17.306262 13501 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4
I0624 17:16:17.306350 13501 hierarchical_allocator_process.hpp:444] Added slave 20140624-171617-2759502016-49072-13481-0 (fedora-20) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
I0624 17:16:17.306397 13501 hierarchical_allocator_process.hpp:706] Performed allocation for slave 20140624-171617-2759502016-49072-13481-0 in 15215ns
I0624 17:16:17.306524 13501 replica.cpp:508] Replica received write request for position 4
I0624 17:16:17.306555 13501 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 15063ns
I0624 17:16:17.306565 13501 replica.cpp:676] Persisted action at 4
I0624 17:16:17.306699 13501 replica.cpp:655] Replica received learned notice for position 4
I0624 17:16:17.306726 13501 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 13443ns
I0624 17:16:17.306742 13501 leveldb.cpp:401] Deleting ~2 keys from leveldb took 6813ns
I0624 17:16:17.306751 13501 replica.cpp:676] Persisted action at 4
I0624 17:16:17.306759 13501 replica.cpp:661] Replica learned TRUNCATE action at position 4
I0624 17:16:17.307242 13500 slave.cpp:955] Will retry registration in 3.34032ms if necessary
I0624 17:16:17.307620 13500 slave.cpp:751] Registered with master master@192.168.122.164:49072; given slave ID 20140624-171617-2759502016-49072-13481-0
I0624 17:16:17.308059 13500 slave.cpp:764] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta/slaves/20140624-171617-2759502016-49072-13481-0/slave.info'
I0624 17:16:17.308212 13500 slave.cpp:2308] Received ping from slave-observer(38)@192.168.122.164:49072
I0624 17:16:17.311333 13481 sched.cpp:139] Version: 0.20.0
I0624 17:16:17.311962 13504 sched.cpp:235] New master detected at master@192.168.122.164:49072
I0624 17:16:17.311987 13504 sched.cpp:285] Authenticating with master master@192.168.122.164:49072
I0624 17:16:17.312050 13504 authenticatee.hpp:128] Creating new client SASL connection
I0624 17:16:17.312546 13504 master.cpp:3499] Authenticating scheduler-4b225d3a-02b4-4a7d-a4a1-4777cf7bb262@192.168.122.164:49072
I0624 17:16:17.312625 13504 authenticator.hpp:156] Creating new server SASL connection
I0624 17:16:17.313078 13504 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
I0624 17:16:17.313129 13504 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
I0624 17:16:17.313158 13504 authenticator.hpp:262] Received SASL authentication start
I0624 17:16:17.313189 13504 authenticator.hpp:384] Authentication requires more steps
I0624 17:16:17.313213 13504 authenticatee.hpp:265] Received SASL authentication step
I0624 17:16:17.313243 13504 authenticator.hpp:290] Received SASL authentication step
I0624 17:16:17.313257 13504 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
I0624 17:16:17.313266 13504 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
I0624 17:16:17.313274 13504 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
I0624 17:16:17.313283 13504 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-20' server FQDN: 'fedora-20' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
I0624 17:16:17.313290 13504 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
I0624 17:16:17.313295 13504 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
I0624 17:16:17.313307 13504 authenticator.hpp:376] Authentication success
I0624 17:16:17.313330 13504 authenticatee.hpp:305] Authentication success
I0624 17:16:17.313350 13504 master.cpp:3539] Successfully authenticated principal 'test-principal' at scheduler-4b225d3a-02b4-4a7d-a4a1-4777cf7bb262@192.168.122.164:49072
I0624 17:16:17.313410 13504 sched.cpp:359] Successfully authenticated with master master@192.168.122.164:49072
I0624 17:16:17.313423 13504 sched.cpp:478] Sending registration request to master@192.168.122.164:49072
I0624 17:16:17.313462 13504 master.cpp:1241] Received registration request from scheduler-4b225d3a-02b4-4a7d-a4a1-4777cf7bb262@192.168.122.164:49072
I0624 17:16:17.313480 13504 master.cpp:1201] Authorizing framework principal 'test-principal' to receive offers for role '*'
I0624 17:16:17.313585 13504 master.cpp:1300] Registering framework 20140624-171617-2759502016-49072-13481-0000 at scheduler-4b225d3a-02b4-4a7d-a4a1-4777cf7bb262@192.168.122.164:49072
I0624 17:16:17.313664 13504 sched.cpp:409] Framework registered with 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.313685 13504 sched.cpp:423] Scheduler::registered took 8211ns
I0624 17:16:17.313714 13504 hierarchical_allocator_process.hpp:331] Added framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.313747 13504 hierarchical_allocator_process.hpp:750] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140624-171617-2759502016-49072-13481-0 to framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.314810 13500 master.hpp:794] Adding offer 20140624-171617-2759502016-49072-13481-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140624-171617-2759502016-49072-13481-0 (fedora-20)
I0624 17:16:17.314862 13500 master.cpp:3446] Sending 1 offers to framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.314954 13500 sched.cpp:546] Scheduler::resourceOffers took 12558ns
I0624 17:16:17.315163 13504 hierarchical_allocator_process.hpp:686] Performed allocation for 1 slaves in 1.432885ms
I0624 17:16:17.316391 13503 master.hpp:804] Removing offer 20140624-171617-2759502016-49072-13481-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140624-171617-2759502016-49072-13481-0 (fedora-20)
I0624 17:16:17.316449 13503 master.cpp:2125] Processing reply for offers: [ 20140624-171617-2759502016-49072-13481-0 ] on slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20) for framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.316479 13503 master.cpp:2211] Authorizing framework principal 'test-principal' to launch task 3719ebe8-3339-47bc-9c05-704ebe31deea as user 'jenkins'
I0624 17:16:17.316716 13503 master.hpp:766] Adding task 3719ebe8-3339-47bc-9c05-704ebe31deea with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140624-171617-2759502016-49072-13481-0 (fedora-20)
I0624 17:16:17.316754 13503 master.cpp:2277] Launching task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20)
I0624 17:16:17.317138 13503 slave.cpp:986] Got assigned task 3719ebe8-3339-47bc-9c05-704ebe31deea for framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.317205 13503 slave.cpp:3383] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta/slaves/20140624-171617-2759502016-49072-13481-0/frameworks/20140624-171617-2759502016-49072-13481-0000/framework.info'
I0624 17:16:17.317363 13503 slave.cpp:3390] Checkpointing framework pid 'scheduler-4b225d3a-02b4-4a7d-a4a1-4777cf7bb262@192.168.122.164:49072' to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta/slaves/20140624-171617-2759502016-49072-13481-0/frameworks/20140624-171617-2759502016-49072-13481-0000/framework.pid'
I0624 17:16:17.317615 13503 slave.cpp:1096] Launching task 3719ebe8-3339-47bc-9c05-704ebe31deea for framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.319664 13503 slave.cpp:3705] Checkpointing ExecutorInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta/slaves/20140624-171617-2759502016-49072-13481-0/frameworks/20140624-171617-2759502016-49072-13481-0000/executors/3719ebe8-3339-47bc-9c05-704ebe31deea/executor.info'
I0624 17:16:17.321044 13502 mesos_containerizer.cpp:551] Starting container '6fab024c-0e47-49ba-8bfa-7db463035819' for executor '3719ebe8-3339-47bc-9c05-704ebe31deea' of framework '20140624-171617-2759502016-49072-13481-0000'
I0624 17:16:17.323729 13502 launcher.cpp:117] Forked child with pid '14697' for container '6fab024c-0e47-49ba-8bfa-7db463035819'
I0624 17:16:17.324496 13502 mesos_containerizer.cpp:824] Checkpointing executor's forked pid 14697 to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta/slaves/20140624-171617-2759502016-49072-13481-0/frameworks/20140624-171617-2759502016-49072-13481-0000/executors/3719ebe8-3339-47bc-9c05-704ebe31deea/runs/6fab024c-0e47-49ba-8bfa-7db463035819/pids/forked.pid'
I0624 17:16:17.326684 13503 slave.cpp:3820] Checkpointing TaskInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta/slaves/20140624-171617-2759502016-49072-13481-0/frameworks/20140624-171617-2759502016-49072-13481-0000/executors/3719ebe8-3339-47bc-9c05-704ebe31deea/runs/6fab024c-0e47-49ba-8bfa-7db463035819/tasks/3719ebe8-3339-47bc-9c05-704ebe31deea/task.info'
I0624 17:16:17.327487 13502 mesos_containerizer.cpp:661] Fetching URIs for container '6fab024c-0e47-49ba-8bfa-7db463035819' using command '/var/jenkins/workspace/mesos-fedora-20-gcc/src/mesos-fetcher'
I0624 17:16:17.330343 13503 slave.cpp:1206] Queuing task '3719ebe8-3339-47bc-9c05-704ebe31deea' for executor 3719ebe8-3339-47bc-9c05-704ebe31deea of framework '20140624-171617-2759502016-49072-13481-0000
I0624 17:16:17.331161 13503 slave.cpp:545] Successfully attached file '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/slaves/20140624-171617-2759502016-49072-13481-0/frameworks/20140624-171617-2759502016-49072-13481-0000/executors/3719ebe8-3339-47bc-9c05-704ebe31deea/runs/6fab024c-0e47-49ba-8bfa-7db463035819'
I0624 17:16:18.188956 13500 slave.cpp:2453] Monitoring executor '3719ebe8-3339-47bc-9c05-704ebe31deea' of framework '20140624-171617-2759502016-49072-13481-0000' in container '6fab024c-0e47-49ba-8bfa-7db463035819'
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0624 17:16:18.224558 14697 process.cpp:1671] libprocess is initialized on 192.168.122.164:38825 for 8 cpus
I0624 17:16:18.228303 14697 exec.cpp:131] Version: 0.20.0
I0624 17:16:18.228971 14735 exec.cpp:181] Executor started at: executor(1)@192.168.122.164:38825 with pid 14697
I0624 17:16:18.230078 13500 slave.cpp:1717] Got registration for executor '3719ebe8-3339-47bc-9c05-704ebe31deea' of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.230142 13500 slave.cpp:1802] Checkpointing executor pid 'executor(1)@192.168.122.164:38825' to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_mHr0Zy/meta/slaves/20140624-171617-2759502016-49072-13481-0/frameworks/20140624-171617-2759502016-49072-13481-0000/executors/3719ebe8-3339-47bc-9c05-704ebe31deea/runs/6fab024c-0e47-49ba-8bfa-7db463035819/pids/libprocess.pid'
I0624 17:16:18.230700 13500 slave.cpp:1836] Flushing queued task 3719ebe8-3339-47bc-9c05-704ebe31deea for executor '3719ebe8-3339-47bc-9c05-704ebe31deea' of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.231412 13508 process.cpp:1098] Socket closed while receiving
I0624 17:16:18.232239 14732 exec.cpp:205] Executor registered on slave 20140624-171617-2759502016-49072-13481-0
IRegistered executor on fedora-20
0624 17:16:18.233847 14732 exec.cpp:217] Executor::registered took 128822ns
I0624 17:16:18.234683 14728 exec.cpp:292] Executor asked to run task '3719ebe8-3339-47bc-9c05-704ebe31deea'
IStarting task 3719ebe8-3339-47bc-9c05-704ebe31deea
0624 17:16:18.235162 14728 exec.cpp:301] Executor::launchTask took 158786ns
I0624 17:16:18.236713 14736 process.cpp:1037] Socket closed while receiving
sh -c 'sleep 1000'
Forked command at 14737
I0624 17:16:18.248049 14733 exec.cpp:524] Executor sending status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.249003 13504 slave.cpp:2071] Handling status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000 from executor(1)@192.168.122.164:38825
I0624 17:16:18.249148 13504 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.249166 13504 status_update_manager.cpp:499] Creating StatusUpdate stream for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.249430 13504 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.249527 13504 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000 to master@192.168.122.164:49072
I0624 17:16:18.249678 13504 master.cpp:3107] Status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000 from slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20)
I0624 17:16:18.249719 13504 slave.cpp:2229] Status update manager successfully handled status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.249732 13504 slave.cpp:2235] Sending acknowledgement for status update TASK_RUNNING (UUID: 4edea50b-325e-4a03-b109-170b116fb1f8) for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000 to executor(1)@192.168.122.164:38825
I0624 17:16:18.249971 13505 sched.cpp:637] Scheduler::statusUpdate took 26600ns
I0624 17:16:18.250089 13507 master.cpp:2631] Forwarding status update acknowledgement 4edea50b-325e-4a03-b109-170b116fb1f8 for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000 to slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20)
I0624 17:16:18.250618 13481 slave.cpp:469] Slave terminating
I0624 17:16:18.250851 13507 master.cpp:760] Slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20) disconnected
I0624 17:16:18.250872 13507 master.cpp:1602] Disconnecting slave 20140624-171617-2759502016-49072-13481-0
I0624 17:16:18.250928 13507 hierarchical_allocator_process.hpp:483] Slave 20140624-171617-2759502016-49072-13481-0 disconnected
I0624 17:16:18.251524 13507 master.cpp:2751] Removing old disconnected slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20) because a registration attempt is being made from slave(39)@192.168.122.164:49072
I0624 17:16:18.251545 13507 master.cpp:4100] Removing slave 20140624-171617-2759502016-49072-13481-0 at slave(39)@192.168.122.164:49072 (fedora-20)
I0624 17:16:18.251879 13503 hierarchical_allocator_process.hpp:469] Removed slave 20140624-171617-2759502016-49072-13481-0
I0624 17:16:18.252004 13508 process.cpp:1098] Socket closed while receiving
I0624 17:16:18.252671 13507 master.hpp:784] Removing task 3719ebe8-3339-47bc-9c05-704ebe31deea with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140624-171617-2759502016-49072-13481-0 (fedora-20)
I0624 17:16:18.252966 14736 process.cpp:1037] Socket closed while receiving
I0624 17:16:18.253051 14730 exec.cpp:338] Executor received status update acknowledgement 4edea50b-325e-4a03-b109-170b116fb1f8 for task 3719ebe8-3339-47bc-9c05-704ebe31deea of framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.258993 13507 master.cpp:2781] Registering slave at slave(39)@192.168.122.164:49072 (fedora-20) with id 20140624-171617-2759502016-49072-13481-1
I0624 17:16:18.259176 13507 registrar.cpp:422] Attempting to update the 'registry'
I0624 17:16:18.260308 13507 log.cpp:680] Attempting to append 134 bytes to the log
I0624 17:16:18.260372 13507 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 5
I0624 17:16:18.260628 13507 replica.cpp:508] Replica received write request for position 5
I0624 17:16:18.260694 13507 leveldb.cpp:343] Persisting action (153 bytes) to leveldb took 44915ns
I0624 17:16:18.260709 13507 replica.cpp:676] Persisted action at 5
I0624 17:16:18.261569 13504 replica.cpp:655] Replica received learned notice for position 5
I0624 17:16:18.261611 13504 leveldb.cpp:343] Persisting action (155 bytes) to leveldb took 18731ns
I0624 17:16:18.261623 13504 replica.cpp:676] Persisted action at 5
I0624 17:16:18.261632 13504 replica.cpp:661] Replica learned APPEND action at position 5
I0624 17:16:18.264946 13500 registrar.cpp:479] Successfully updated 'registry'
I0624 17:16:18.265034 13500 registrar.cpp:422] Attempting to update the 'registry'
I0624 17:16:18.265987 13500 master.cpp:4202] Removed slave 20140624-171617-2759502016-49072-13481-0 (fedora-20)
I0624 17:16:18.266041 13500 master.cpp:4217] Notifying framework 20140624-171617-2759502016-49072-13481-0000 of lost slave 20140624-171617-2759502016-49072-13481-0 (fedora-20) after recovering
I0624 17:16:18.266183 13500 sched.cpp:637] Scheduler::statusUpdate took 19461ns
I0624 17:16:18.266214 13500 sched.cpp:688] Lost slave 20140624-171617-2759502016-49072-13481-0
I0624 17:16:18.266257 13500 sched.cpp:699] Scheduler::slaveLost took 29559ns
I0624 17:16:18.267004 13501 sched.cpp:747] Stopping framework '20140624-171617-2759502016-49072-13481-0000'
I0624 17:16:18.267091 13501 master.cpp:1530] Asked to unregister framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.267105 13501 master.cpp:3821] Removing framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.267189 13501 hierarchical_allocator_process.hpp:407] Deactivated framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.267231 13501 hierarchical_allocator_process.hpp:362] Removed framework 20140624-171617-2759502016-49072-13481-0000
I0624 17:16:18.267510 13502 mesos_containerizer.cpp:1018] Destroying container '6fab024c-0e47-49ba-8bfa-7db463035819'
I0624 17:16:18.270985 13504 log.cpp:699] Attempting to truncate the log to 5
I0624 17:16:18.271049 13504 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 6
I0624 17:16:18.271286 13504 replica.cpp:508] Replica received write request for position 6
I0624 17:16:18.271347 13504 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 38157ns
I0624 17:16:18.271358 13504 replica.cpp:676] Persisted action at 6
I0624 17:16:18.271528 13504 replica.cpp:655] Replica received learned notice for position 6
I0624 17:16:18.271559 13504 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 15116ns
I0624 17:16:18.271576 13504 leveldb.cpp:401] Deleting ~2 keys from leveldb took 7776ns
I0624 17:16:18.271586 13504 replica.cpp:676] Persisted action at 6
I0624 17:16:18.271595 13504 replica.cpp:661] Replica learned TRUNCATE action at position 6
I0624 17:16:18.279353 13504 hierarchical_allocator_process.hpp:724] No resources available to allocate!
I0624 17:16:18.279392 13504 hierarchical_allocator_process.hpp:686] Performed allocation for 0 slaves in 61408ns
I0624 17:16:18.331594 13508 process.cpp:1037] Socket closed while receiving
I0624 17:16:19.186189 13506 mesos_containerizer.cpp:1134] Executor for container '6fab024c-0e47-49ba-8bfa-7db463035819' has exited
I0624 17:16:19.187074 13481 master.cpp:619] Master terminating
2014-06-24 17:16:19,576:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:22,912:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:26,248:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:29,584:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:32,921:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:36,257:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:39,593:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:42,930:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:46,266:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:49,602:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:52,937:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:56,274:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:16:59,610:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:02,946:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:06,282:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:09,618:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:12,954:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:16,290:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:19,626:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:22,962:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:26,298:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:29,634:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:32,972:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:36,308:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:39,646:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:42,982:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:46,320:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:49,656:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:52,992:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:56,330:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:17:59,666:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:03,002:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:06,338:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:09,674:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:13,010:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:16,348:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:19,684:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:23,020:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:26,356:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:29,692:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:33,028:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:36,366:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:39,703:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:43,041:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:46,377:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:49,715:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:53,055:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:56,391:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:18:59,728:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:03,066:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:06,402:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:09,740:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:13,081:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:16,417:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:19,753:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:23,089:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:26,425:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:29,761:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:33,098:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:36,434:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:39,770:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:43,108:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:46,444:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:49,781:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:53,117:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:56,453:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:19:59,791:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:03,127:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:06,463:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:09,801:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:13,137:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:16,473:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:19,809:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:23,145:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:26,481:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:29,819:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:33,155:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:36,491:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:39,829:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:43,165:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:46,501:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:49,838:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:53,174:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:56,512:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:20:59,848:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:03,184:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:06,521:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:09,857:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:13,193:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:16,529:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:19,865:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:23,201:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:26,539:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:29,875:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:33,213:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
2014-06-24 17:21:36,554:13481(0x7fc5ecff9700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:35023] zk retcode=-4, errno=111(Connection refused): server refused to accept the client

{code}

> SlaveRecoveryTest/0.RegisterDisconnectedSlave flaky
> ---------------------------------------------------
>
>                 Key: MESOS-735
>                 URL: https://issues.apache.org/jira/browse/MESOS-735
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>
> sudo MESOS_VERBOSE=1 GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="SlaveRecoveryTest/0.RegisterDisconnectedSlave" --gtest_repeat=1000 --gest_break_on_failure
> Error:
> ../../src/tests/slave_recovery_tests.cpp:1647: Failure
> Failed to wait 10secs for status
> The TASK_KILLED status for task "3775fcd5-4776-4e46-aacb-93db531757b1" is from a previous iteration of the test.
> Log:
> [ RUN      ] SlaveRecoveryTest/0.RegisterDisconnectedSlave
> I1014 17:13:07.347805 61261 master.cpp:284] Master started on 10.35.12.124:58876
> I1014 17:13:07.347967 61261 master.cpp:299] Master ID: 201310141713-2081170186-58876-61234
> I1014 17:13:07.348089 61261 master.cpp:302] Master only allowing authenticated frameworks to register!
> I1014 17:13:07.348464 61259 hierarchical_allocator_process.hpp:302] Initializing hierarchical allocator process with master : master@10.35.12.124:58876
> I1014 17:13:07.348469 61251 master.cpp:85] No whitelist given. Advertising offers for all slaves
> I1014 17:13:07.348611 61255 slave.cpp:108] Slave started on 168)@10.35.12.124:58876
> I1014 17:13:07.348711 61261 master.cpp:697] Elected as master!
> I1014 17:13:07.348975 61255 slave.cpp:208] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I1014 17:13:07.349717 61262 process_isolator.cpp:314] Recovering isolator
> I1014 17:13:07.349750 61255 slave.cpp:547] New master detected at master@10.35.12.124:58876
> I1014 17:13:07.350041 61255 slave.cpp:562] Postponing registration until recovery is complete
> I1014 17:13:07.350059 61260 status_update_manager.cpp:157] New master detected at master@10.35.12.124:58876
> I1014 17:13:07.350128 61255 slave.cpp:399] Finished recovery
> I1014 17:13:07.350483 61256 master.cpp:1248] Attempting to register slave on smfd-aki-27-sr1.devel.twitter.com at slave(168)@10.35.12.124:58876
> I1014 17:13:07.350579 61256 master.cpp:2502] Adding slave 201310141713-2081170186-58876-61234-0 at smfd-aki-27-sr1.devel.twitter.com with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I1014 17:13:07.350703 61258 slave.cpp:613] Registered with master master@10.35.12.124:58876; given slave ID 201310141713-2081170186-58876-61234-0
> I1014 17:13:07.350824 61250 hierarchical_allocator_process.hpp:445] Added slave 201310141713-2081170186-58876-61234-0 (smfd-aki-27-sr1.devel.twitter.com) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I1014 17:13:07.350941 61258 slave.cpp:626] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/meta/slaves/201310141713-2081170186-58876-61234-0/slave.info'
> I1014 17:13:07.350998 61250 hierarchical_allocator_process.hpp:708] Performed allocation for slave 201310141713-2081170186-58876-61234-0 in 7.03us
> I1014 17:13:07.351054 61254 sched.cpp:195] New master at master@10.35.12.124:58876
> I1014 17:13:07.351310 61254 sched.cpp:281] Authenticating with master master@10.35.12.124:58876
> I1014 17:13:07.351511 61263 authenticatee.hpp:124] Creating new client SASL connection
> I1014 17:13:07.351646 61253 master.cpp:1723] Authenticating framework at scheduler(168)@10.35.12.124:58876
> I1014 17:13:07.351886 61257 authenticator.hpp:140] Creating new server SASL connection
> I1014 17:13:07.352033 61253 authenticatee.hpp:212] Received SASL authentication mechanisms: CRAM-MD5
> I1014 17:13:07.352105 61253 authenticatee.hpp:238] Attempting to authenticate with mechanism 'CRAM-MD5'
> I1014 17:13:07.352202 61255 authenticator.hpp:243] Received SASL authentication start
> I1014 17:13:07.352306 61255 authenticator.hpp:325] Authentication requires more steps
> I1014 17:13:07.352394 61255 authenticatee.hpp:258] Received SASL authentication step
> I1014 17:13:07.352504 61255 authenticator.hpp:271] Received SASL authentication step
> I1014 17:13:07.352581 61255 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'smfd-aki-27-sr1.devel.twitter.com' server FQDN: 'smfd-aki-27-sr1.devel.twitter.com' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I1014 17:13:07.352654 61255 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I1014 17:13:07.352728 61255 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I1014 17:13:07.352825 61255 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'smfd-aki-27-sr1.devel.twitter.com' server FQDN: 'smfd-aki-27-sr1.devel.twitter.com' SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I1014 17:13:07.352893 61255 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I1014 17:13:07.352957 61255 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I1014 17:13:07.353035 61255 authenticator.hpp:317] Authentication success
> I1014 17:13:07.353150 61253 authenticatee.hpp:298] Authentication success
> I1014 17:13:07.353183 61254 master.cpp:1763] Successfully authenticated framework at scheduler(168)@10.35.12.124:58876
> I1014 17:13:07.353281 61253 sched.cpp:326] Successfully authenticated with master master@10.35.12.124:58876
> I1014 17:13:07.353430 61254 master.cpp:768] Received registration request from scheduler(168)@10.35.12.124:58876
> I1014 17:13:07.353574 61254 master.cpp:786] Registering framework 201310141713-2081170186-58876-61234-0000 at scheduler(168)@10.35.12.124:58876
> I1014 17:13:07.353706 61259 sched.cpp:365] Framework registered with 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.353735 61255 hierarchical_allocator_process.hpp:332] Added framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.353799 61259 sched.cpp:379] Scheduler::registered took 9.296us
> I1014 17:13:07.353881 61255 hierarchical_allocator_process.hpp:752] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 201310141713-2081170186-58876-61234-0 to framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.354061 61255 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 212.58us
> I1014 17:13:07.354136 61262 master.hpp:389] Adding offer 201310141713-2081170186-58876-61234-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 201310141713-2081170186-58876-61234-0 (smfd-aki-27-sr1.devel.twitter.com)
> I1014 17:13:07.354249 61262 master.cpp:1689] Sending 1 offers to framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.354419 61260 sched.cpp:472] Scheduler::resourceOffers took 22.071us
> I1014 17:13:07.355800 61250 master.cpp:2015] Processing reply for offer 201310141713-2081170186-58876-61234-0 on slave 201310141713-2081170186-58876-61234-0 (smfd-aki-27-sr1.devel.twitter.com) for framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.355950 61250 master.hpp:361] Adding task 02786c98-93b9-4a26-ae6d-deefd2708b94 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 201310141713-2081170186-58876-61234-0 (smfd-aki-27-sr1.devel.twitter.com)
> I1014 17:13:07.356057 61250 master.cpp:2139] Launching task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 201310141713-2081170186-58876-61234-0 (smfd-aki-27-sr1.devel.twitter.com)
> I1014 17:13:07.356216 61257 slave.cpp:786] Got assigned task 02786c98-93b9-4a26-ae6d-deefd2708b94 for framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.356230 61250 master.hpp:399] Removing offer 201310141713-2081170186-58876-61234-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 201310141713-2081170186-58876-61234-0 (smfd-aki-27-sr1.devel.twitter.com)
> I1014 17:13:07.356366 61257 slave.cpp:2844] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/meta/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/framework.info'
> I1014 17:13:07.356609 61257 slave.cpp:2851] Checkpointing framework pid 'scheduler(168)@10.35.12.124:58876' to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/meta/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/framework.pid'
> I1014 17:13:07.356909 61257 slave.cpp:897] Launching task 02786c98-93b9-4a26-ae6d-deefd2708b94 for framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.358155 61257 slave.cpp:3094] Checkpointing ExecutorInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/meta/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/executors/02786c98-93b9-4a26-ae6d-deefd2708b94/executor.info'
> I1014 17:13:07.358564 61257 slave.cpp:3183] Checkpointing TaskInfo to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/meta/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/executors/02786c98-93b9-4a26-ae6d-deefd2708b94/runs/f6cb0202-06c2-47d2-838c-1a2d9d9e301c/tasks/02786c98-93b9-4a26-ae6d-deefd2708b94/task.info'
> I1014 17:13:07.358561 61260 process_isolator.cpp:100] Launching 02786c98-93b9-4a26-ae6d-deefd2708b94 (/home/jyx/versions/mesos2/build/src/mesos-executor) in /tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/executors/02786c98-93b9-4a26-ae6d-deefd2708b94/runs/f6cb0202-06c2-47d2-838c-1a2d9d9e301c with resources ' for framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.358795 61257 slave.cpp:1008] Queuing task '02786c98-93b9-4a26-ae6d-deefd2708b94' for executor 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework '201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.359495 61260 process_isolator.cpp:163] Forked executor at 5601
> I1014 17:13:07.359587 61257 slave.cpp:529] Successfully attached file '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/executors/02786c98-93b9-4a26-ae6d-deefd2708b94/runs/f6cb0202-06c2-47d2-838c-1a2d9d9e301c'
> Killed the following process trees:
> [ 
> --- 5600 sleep 1000 
> ]
> Checkpointing executor's forked pid 5601 to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/meta/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/executors/02786c98-93b9-4a26-ae6d-deefd2708b94/runs/f6cb0202-06c2-47d2-838c-1a2d9d9e301c/pids/forked.pid'
> Fetching resources into '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/executors/02786c98-93b9-4a26-ae6d-deefd2708b94/runs/f6cb0202-06c2-47d2-838c-1a2d9d9e301c'
> Command terminated with signal Killed (pid: 5393)
> I1014 17:13:07.413612  5382 exec.cpp:505] Executor sending status update TASK_KILLED (UUID: 7444fe93-fc88-4969-ba39-f00d1f7bfa52) for task 3775fcd5-4776-4e46-aacb-93db531757b1 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.413866 61264 process.cpp:986] Socket closed while receiving
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I1014 17:13:07.419656  5601 process.cpp:1555] libprocess is initialized on 10.35.12.124:46774 for 16 cpus
> I1014 17:13:07.422185  5645 exec.cpp:174] Executor started at: executor(1)@10.35.12.124:46774 with pid 5601
> I1014 17:13:07.422819 61251 slave.cpp:1460] Got registration for executor '02786c98-93b9-4a26-ae6d-deefd2708b94' of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.423228 61251 slave.cpp:1545] Checkpointing executor pid 'executor(1)@10.35.12.124:46774' to '/tmp/SlaveRecoveryTest_0_RegisterDisconnectedSlave_ne8Shz/meta/slaves/201310141713-2081170186-58876-61234-0/frameworks/201310141713-2081170186-58876-61234-0000/executors/02786c98-93b9-4a26-ae6d-deefd2708b94/runs/f6cb0202-06c2-47d2-838c-1a2d9d9e301c/pids/libprocess.pid'
> I1014 17:13:07.424352  5646 process.cpp:986] Socket closed while receiving
> I1014 17:13:07.424037 61251 slave.cpp:1581] Flushing queued task 02786c98-93b9-4a26-ae6d-deefd2708b94 for executor '02786c98-93b9-4a26-ae6d-deefd2708b94' of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.424475  5635 exec.cpp:198] Executor registered on slave 201310141713-2081170186-58876-61234-0
> I1014 17:13:07.424753  5646 process.cpp:986] Socket closed while receiving
> I1014 17:13:07.425571  5635 exec.cpp:210] Executor::registered took 101.155us
> Registered executor on smfd-aki-27-sr1.devel.twitter.com
> I1014 17:13:07.425681  5635 exec.cpp:285] Executor asked to run task '02786c98-93b9-4a26-ae6d-deefd2708b94'
> I1014 17:13:07.425745  5635 exec.cpp:294] Executor::launchTask took 38.183us
> Starting task 02786c98-93b9-4a26-ae6d-deefd2708b94
> sh -c 'sleep 1000'
> Forked command at 5647
> I1014 17:13:07.427450  5645 exec.cpp:505] Executor sending status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.427744 61253 slave.cpp:1793] Handling status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000 from executor(1)@10.35.12.124:46774
> I1014 17:13:07.427968 61253 status_update_manager.cpp:300] Received status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.428227 61253 status_update_manager.cpp:471] Creating StatusUpdate stream for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.428529 61253 status_update_manager.hpp:337] Checkpointing UPDATE for status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.530436 61253 status_update_manager.cpp:351] Forwarding status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000 to master@10.35.12.124:58876
> I1014 17:13:07.531108 61253 slave.cpp:1912] Status update manager successfully handled status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> ../../src/tests/slave_recovery_tests.cpp:1647: Failure
> Failed to wait 10secs for status
> I1014 17:13:07.531616 61253 slave.cpp:1918] Sending acknowledgement for status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000 to executor(1)@10.35.12.124:46774
> I1014 17:13:07.532291  5646 process.cpp:986] Socket closed while receiving
> I1014 17:13:07.532322  5635 exec.cpp:331] Executor received status update acknowledgement c94ea840-2bc9-40dc-8745-f87a5f0d7297 for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.531469 61256 sched.cpp:527] Scheduler::statusUpdate took 78.668us
> I1014 17:13:07.531134 61249 master.cpp:1448] Status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000 from slave(168)@10.35.12.124:58876
> I1014 17:13:07.532776 61255 status_update_manager.cpp:375] Received status update acknowledgement (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.533468 61255 status_update_manager.hpp:337] Checkpointing ACK for status update TASK_RUNNING (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.613468 61264 process.cpp:986] Socket closed while receiving
> Command terminated with signal Killed (pid: 5440)
> I1014 17:13:07.620054  5434 exec.cpp:505] Executor sending status update TASK_KILLED (UUID: 2b2abab7-2036-43c6-8fde-420b862e28df) for task 093ddb89-36cb-45bb-95ad-d2b952536065 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.622478 61255 slave.cpp:1406] Status update manager successfully handled status update acknowledgement (UUID: c94ea840-2bc9-40dc-8745-f87a5f0d7297) for task 02786c98-93b9-4a26-ae6d-deefd2708b94 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.800328 61264 process.cpp:986] Socket closed while receiving
> Command terminated with signal Killed (pid: 5487)
> I1014 17:13:07.836431  5479 exec.cpp:505] Executor sending status update TASK_KILLED (UUID: 6b714b63-2f0f-4d8c-9e11-03f6173275bf) for task 8c49c32e-a871-47c5-95bd-9a0c200862c1 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:07.994355 61264 process.cpp:986] Socket closed while receiving
> Command terminated with signal Killed (pid: 5534)
> I1014 17:13:08.032539  5528 exec.cpp:505] Executor sending status update TASK_KILLED (UUID: cccd6352-21c2-4c82-8b87-f852341bc220) for task ecdc0322-ae55-4a6d-ac8a-7c060ed791cf of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:08.210850 61264 process.cpp:986] Socket closed while receiving
> Command terminated with signal Killed (pid: 5600)
> I1014 17:13:08.233747  5583 exec.cpp:505] Executor sending status update TASK_KILLED (UUID: 410c0044-ca6b-4d5f-9278-4a97b23a47c8) for task 12bb99e8-61b2-4f12-91dc-aae39c535360 of framework 201310141713-2081170186-58876-61234-0000
> I1014 17:13:08.349792 61249 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 47.763us
> I1014 17:13:08.372403 61259 monitor.cpp:187] Publishing resource usage for executor '02786c98-93b9-4a26-ae6d-deefd2708b94' of framework '201310141713-2081170186-58876-61234-0000'
> I1014 17:13:08.414371 61264 process.cpp:986] Socket closed while receiving
> I1014 17:13:08.621309 61264 process.cpp:986] Socket closed while receiving
> I1014 17:13:08.837842 61264 process.cpp:986] Socket closed while receiving
> I1014 17:13:09.033443 61264 process.cpp:986] Socket closed while receiving
> I1014 17:13:09.235080 61264 process.cpp:986] Socket closed while receiving
> I1014 17:13:09.350921 61253 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 26.59us
> I1014 17:13:09.384407 61258 monitor.cpp:187] Publishing resource usage for executor '02786c98-93b9-4a26-ae6d-deefd2708b94' of framework '201310141713-2081170186-58876-61234-0000'
> I1014 17:13:10.352005 61252 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 57.152us
> I1014 17:13:10.396427 61258 monitor.cpp:187] Publishing resource usage for executor '02786c98-93b9-4a26-ae6d-deefd2708b94' of framework '201310141713-2081170186-58876-61234-0000'
> I1014 17:13:11.353577 61250 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 23.812us
> I1014 17:13:11.408164 61258 monitor.cpp:187] Publishing resource usage for executor '02786c98-93b9-4a26-ae6d-deefd2708b94' of framework '201310141713-2081170186-58876-61234-0000'
> I1014 17:13:12.349570 61262 master.cpp:85] No whitelist given. Advertising offers for all slaves
> I1014 17:13:12.355087 61260 hierarchical_allocator_process.hpp:688] Performed allocation for 1 slaves in 34.427us
> I1014 17:13:12.420876 61253 monitor.cpp:187] Publishing resource usage for executor '02786c98-93b9-4a26-ae6d-deefd2708b94' of framework '201310141713-2081170186-58876-61234-0000'
> ...



--
This message was sent by Atlassian JIRA
(v6.2#6252)