You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Ian Downes (JIRA)" <ji...@apache.org> on 2015/04/06 20:24:12 UTC

[jira] [Resolved] (MESOS-1464) SlaveRecoveryTest/0.ShutdownSlave runs forever

     [ https://issues.apache.org/jira/browse/MESOS-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes resolved MESOS-1464.
-------------------------------
    Resolution: Cannot Reproduce

> SlaveRecoveryTest/0.ShutdownSlave runs forever
> ----------------------------------------------
>
>                 Key: MESOS-1464
>                 URL: https://issues.apache.org/jira/browse/MESOS-1464
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.19.0
>            Reporter: Yan Xu
>
> {noformat}
> [ RUN      ] SlaveRecoveryTest/0.ShutdownSlave
> Using temporary directory '/tmp/SlaveRecoveryTest_0_ShutdownSlave_7wX1Wv'
> I0606 18:38:59.725330 24987 leveldb.cpp:176] Opened db in 788125ns
> I0606 18:38:59.726125 24987 leveldb.cpp:183] Compacted db in 405518ns
> I0606 18:38:59.726558 24987 leveldb.cpp:198] Created db iterator in 18709ns
> I0606 18:38:59.727007 24987 leveldb.cpp:204] Seeked to beginning of db in 70836ns
> I0606 18:38:59.727603 24987 leveldb.cpp:273] Iterated through 0 keys in the db in 13260ns
> I0606 18:38:59.727993 24987 replica.cpp:741] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0606 18:38:59.728660 25007 recover.cpp:425] Starting replica recovery
> I0606 18:38:59.728736 25007 recover.cpp:451] Replica is in EMPTY status
> I0606 18:38:59.729061 25007 replica.cpp:638] Replica in EMPTY status received a broadcasted recover request
> I0606 18:38:59.729132 25007 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0606 18:38:59.729236 25007 recover.cpp:542] Updating replica status to STARTING
> I0606 18:38:59.729341 25007 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 55958ns
> I0606 18:38:59.729357 25007 replica.cpp:320] Persisted replica status to STARTING
> I0606 18:38:59.729409 25007 recover.cpp:451] Replica is in STARTING status
> I0606 18:38:59.729634 25007 replica.cpp:638] Replica in STARTING status received a broadcasted recover request
> I0606 18:38:59.729704 25007 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0606 18:38:59.729815 25007 recover.cpp:542] Updating replica status to VOTING
> I0606 18:38:59.729876 25007 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 22335ns
> I0606 18:38:59.729890 25007 replica.cpp:320] Persisted replica status to VOTING
> I0606 18:38:59.729919 25007 recover.cpp:556] Successfully joined the Paxos group
> I0606 18:38:59.729964 25007 recover.cpp:440] Recover process terminated
> I0606 18:38:59.734488 25005 master.cpp:272] Master 20140606-183859-2272962752-40636-24987 (fedora-19) started on 192.168.122.135:40636
> I0606 18:38:59.734532 25005 master.cpp:309] Master only allowing authenticated frameworks to register
> I0606 18:38:59.734540 25005 master.cpp:314] Master only allowing authenticated slaves to register
> I0606 18:38:59.734547 25005 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_7wX1Wv/credentials'
> I0606 18:38:59.734627 25005 master.cpp:340] Master enabling authorization
> I0606 18:38:59.735643 25009 hierarchical_allocator_process.hpp:301] Initializing hierarchical allocator process with master : master@192.168.122.135:40636
> I0606 18:38:59.735699 25011 master.cpp:108] No whitelist given. Advertising offers for all slaves
> I0606 18:38:59.736399 25005 master.cpp:957] The newly elected leader is master@192.168.122.135:40636 with id 20140606-183859-2272962752-40636-24987
> I0606 18:38:59.736423 25005 master.cpp:970] Elected as the leading master!
> I0606 18:38:59.736431 25005 master.cpp:788] Recovering from registrar
> I0606 18:38:59.736482 25005 registrar.cpp:313] Recovering registrar
> I0606 18:38:59.736695 25005 log.cpp:656] Attempting to start the writer
> I0606 18:38:59.737006 25005 replica.cpp:474] Replica received implicit promise request with proposal 1
> I0606 18:38:59.737486 25005 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 471935ns
> I0606 18:38:59.737920 25005 replica.cpp:342] Persisted promised to 1
> I0606 18:38:59.738417 25005 coordinator.cpp:230] Coordinator attemping to fill missing position
> I0606 18:38:59.739327 25005 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
> I0606 18:38:59.739951 25005 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 35555ns
> I0606 18:38:59.740437 25005 replica.cpp:676] Persisted action at 0
> I0606 18:38:59.744415 25010 replica.cpp:508] Replica received write request for position 0
> I0606 18:38:59.744458 25010 leveldb.cpp:438] Reading position from leveldb took 23012ns
> I0606 18:38:59.744488 25010 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took 20719ns
> I0606 18:38:59.744499 25010 replica.cpp:676] Persisted action at 0
> I0606 18:38:59.744623 25010 replica.cpp:655] Replica received learned notice for position 0
> I0606 18:38:59.744652 25010 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 19198ns
> I0606 18:38:59.744663 25010 replica.cpp:676] Persisted action at 0
> I0606 18:38:59.744669 25010 replica.cpp:661] Replica learned NOP action at position 0
> I0606 18:38:59.744842 25010 log.cpp:672] Writer started with ending position 0
> I0606 18:38:59.745101 25010 leveldb.cpp:438] Reading position from leveldb took 27519ns
> I0606 18:38:59.746109 25009 registrar.cpp:346] Successfully fetched the registry (0B)
> I0606 18:38:59.746136 25009 registrar.cpp:422] Attempting to update the 'registry'
> I0606 18:38:59.747542 25010 log.cpp:680] Attempting to append 131 bytes to the log
> I0606 18:38:59.748111 25007 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 1
> I0606 18:38:59.748301 25007 replica.cpp:508] Replica received write request for position 1
> I0606 18:38:59.748342 25007 leveldb.cpp:343] Persisting action (150 bytes) to leveldb took 27852ns
> I0606 18:38:59.748353 25007 replica.cpp:676] Persisted action at 1
> I0606 18:38:59.748498 25007 replica.cpp:655] Replica received learned notice for position 1
> I0606 18:38:59.748530 25007 leveldb.cpp:343] Persisting action (152 bytes) to leveldb took 22953ns
> I0606 18:38:59.748541 25007 replica.cpp:676] Persisted action at 1
> I0606 18:38:59.748548 25007 replica.cpp:661] Replica learned APPEND action at position 1
> I0606 18:38:59.749064 25010 registrar.cpp:479] Successfully updated 'registry'
> I0606 18:38:59.749287 25007 log.cpp:699] Attempting to truncate the log to 1
> I0606 18:38:59.749758 25006 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 2
> I0606 18:38:59.749932 25006 replica.cpp:508] Replica received write request for position 2
> I0606 18:38:59.749965 25006 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 23310ns
> I0606 18:38:59.749977 25006 replica.cpp:676] Persisted action at 2
> I0606 18:38:59.750231 25006 replica.cpp:655] Replica received learned notice for position 2
> I0606 18:38:59.750263 25006 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 23049ns
> I0606 18:38:59.750280 25006 leveldb.cpp:401] Deleting ~1 keys from leveldb took 10864ns
> I0606 18:38:59.750289 25006 replica.cpp:676] Persisted action at 2
> I0606 18:38:59.750298 25006 replica.cpp:661] Replica learned TRUNCATE action at position 2
> I0606 18:38:59.750797 25010 registrar.cpp:372] Successfully recovered registrar
> I0606 18:38:59.754741 24987 mesos_containerizer.cpp:124] Using isolation: posix/cpu,posix/mem
> I0606 18:38:59.755175 25006 master.cpp:815] Recovered 0 slaves from the Registry (95B) ; allowing 10mins for slaves to re-register
> I0606 18:38:59.758690 25011 slave.cpp:143] Slave started on 53)@192.168.122.135:40636
> I0606 18:38:59.758719 25011 credentials.hpp:35] Loading credentials for authentication from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/credential'
> I0606 18:38:59.758790 25011 slave.cpp:243] Slave using credential for: test-principal
> I0606 18:38:59.758880 25011 slave.cpp:256] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0606 18:38:59.759521 25011 slave.cpp:284] Slave hostname: fedora-19
> I0606 18:38:59.759539 25011 slave.cpp:285] Slave checkpoint: true
> I0606 18:38:59.760252 25011 state.cpp:33] Recovering state from '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta'
> I0606 18:38:59.760339 25011 status_update_manager.cpp:193] Recovering status update manager
> I0606 18:38:59.760387 25011 mesos_containerizer.cpp:281] Recovering containerizer
> I0606 18:38:59.760546 25011 slave.cpp:3024] Finished recovery
> I0606 18:38:59.760848 25011 slave.cpp:537] New master detected at master@192.168.122.135:40636
> I0606 18:38:59.760875 25011 slave.cpp:613] Authenticating with master master@192.168.122.135:40636
> I0606 18:38:59.760916 25011 slave.cpp:586] Detecting new master
> I0606 18:38:59.760947 25011 status_update_manager.cpp:167] New master detected at master@192.168.122.135:40636
> I0606 18:38:59.760979 25011 authenticatee.hpp:128] Creating new client SASL connection
> I0606 18:38:59.761829 25011 master.cpp:2966] Authenticating slave(53)@192.168.122.135:40636
> I0606 18:38:59.762437 25011 authenticator.hpp:156] Creating new server SASL connection
> I0606 18:38:59.763381 25005 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0606 18:38:59.763427 25005 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0606 18:38:59.763705 25011 authenticator.hpp:262] Received SASL authentication start
> I0606 18:38:59.764219 25011 authenticator.hpp:384] Authentication requires more steps
> I0606 18:38:59.764729 25010 authenticatee.hpp:265] Received SASL authentication step
> I0606 18:38:59.765012 25011 authenticator.hpp:290] Received SASL authentication step
> I0606 18:38:59.765466 25011 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0606 18:38:59.765903 25011 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I0606 18:38:59.766316 25011 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I0606 18:38:59.766685 25011 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0606 18:38:59.767141 25011 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0606 18:38:59.767516 25011 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0606 18:38:59.767892 25011 authenticator.hpp:376] Authentication success
> I0606 18:38:59.768312 25007 authenticatee.hpp:305] Authentication success
> I0606 18:38:59.773574 25008 master.cpp:3006] Successfully authenticated principal 'test-principal' at slave(53)@192.168.122.135:40636
> I0606 18:38:59.775130 25007 slave.cpp:670] Successfully authenticated with master master@192.168.122.135:40636
> I0606 18:38:59.775876 24987 sched.cpp:128] Version: 0.20.0
> I0606 18:38:59.776198 25009 master.cpp:2282] Registering slave at slave(53)@192.168.122.135:40636 (fedora-19) with id 20140606-183859-2272962752-40636-24987-0
> I0606 18:38:59.776563 25009 registrar.cpp:422] Attempting to update the 'registry'
> I0606 18:38:59.777896 25009 log.cpp:680] Attempting to append 312 bytes to the log
> I0606 18:38:59.778344 25011 coordinator.cpp:340] Coordinator attempting to write APPEND action at position 3
> I0606 18:38:59.778573 25011 replica.cpp:508] Replica received write request for position 3
> I0606 18:38:59.778632 25011 leveldb.cpp:343] Persisting action (331 bytes) to leveldb took 45885ns
> I0606 18:38:59.778646 25011 replica.cpp:676] Persisted action at 3
> I0606 18:38:59.779242 25010 replica.cpp:655] Replica received learned notice for position 3
> I0606 18:38:59.779285 25010 leveldb.cpp:343] Persisting action (333 bytes) to leveldb took 27351ns
> I0606 18:38:59.779311 25010 replica.cpp:676] Persisted action at 3
> I0606 18:38:59.779321 25010 replica.cpp:661] Replica learned APPEND action at position 3
> I0606 18:38:59.782156 25005 registrar.cpp:479] Successfully updated 'registry'
> I0606 18:38:59.782237 25005 master.cpp:2322] Registered slave 20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636 (fedora-19)
> I0606 18:38:59.782253 25005 master.cpp:3452] Adding slave 20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0606 18:38:59.782378 25005 hierarchical_allocator_process.hpp:444] Added slave 20140606-183859-2272962752-40636-24987-0 (fedora-19) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0606 18:38:59.782421 25005 hierarchical_allocator_process.hpp:707] Performed allocation for slave 20140606-183859-2272962752-40636-24987-0 in 17098ns
> I0606 18:38:59.782996 25012 log.cpp:699] Attempting to truncate the log to 3
> I0606 18:38:59.783079 25012 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4
> I0606 18:38:59.783355 25012 replica.cpp:508] Replica received write request for position 4
> I0606 18:38:59.783396 25012 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 26490ns
> I0606 18:38:59.783408 25012 replica.cpp:676] Persisted action at 4
> I0606 18:38:59.783545 25012 replica.cpp:655] Replica received learned notice for position 4
> I0606 18:38:59.783577 25012 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 21641ns
> I0606 18:38:59.783597 25012 leveldb.cpp:401] Deleting ~2 keys from leveldb took 13425ns
> I0606 18:38:59.783607 25012 replica.cpp:676] Persisted action at 4
> I0606 18:38:59.783617 25012 replica.cpp:661] Replica learned TRUNCATE action at position 4
> I0606 18:38:59.776222 25007 slave.cpp:908] Will retry registration in 8.32368ms if necessary
> I0606 18:38:59.787597 25007 slave.cpp:704] Registered with master master@192.168.122.135:40636; given slave ID 20140606-183859-2272962752-40636-24987-0
> I0606 18:38:59.787914 25008 sched.cpp:224] New master detected at master@192.168.122.135:40636
> I0606 18:38:59.788280 25008 sched.cpp:274] Authenticating with master master@192.168.122.135:40636
> I0606 18:38:59.788406 25008 authenticatee.hpp:128] Creating new client SASL connection
> I0606 18:38:59.788957 25008 master.cpp:2966] Authenticating scheduler(46)@192.168.122.135:40636
> I0606 18:38:59.789103 25005 authenticator.hpp:156] Creating new server SASL connection
> I0606 18:38:59.789572 25005 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0606 18:38:59.789598 25005 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0606 18:38:59.789623 25005 authenticator.hpp:262] Received SASL authentication start
> I0606 18:38:59.789661 25005 authenticator.hpp:384] Authentication requires more steps
> I0606 18:38:59.789687 25005 authenticatee.hpp:265] Received SASL authentication step
> I0606 18:38:59.789722 25005 authenticator.hpp:290] Received SASL authentication step
> I0606 18:38:59.789739 25005 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: false 
> I0606 18:38:59.789748 25005 auxprop.cpp:153] Looking up auxiliary property '*userPassword'
> I0606 18:38:59.789760 25005 auxprop.cpp:153] Looking up auxiliary property '*cmusaslsecretCRAM-MD5'
> I0606 18:38:59.789769 25005 auxprop.cpp:81] Request to lookup properties for user: 'test-principal' realm: 'fedora-19' server FQDN: 'fedora-19' SASL_AUXPROP_VERIFY_AGAINST_HASH: false SASL_AUXPROP_OVERRIDE: false SASL_AUXPROP_AUTHZID: true 
> I0606 18:38:59.789777 25005 auxprop.cpp:103] Skipping auxiliary property '*userPassword' since SASL_AUXPROP_AUTHZID == true
> I0606 18:38:59.789783 25005 auxprop.cpp:103] Skipping auxiliary property '*cmusaslsecretCRAM-MD5' since SASL_AUXPROP_AUTHZID == true
> I0606 18:38:59.789793 25005 authenticator.hpp:376] Authentication success
> I0606 18:38:59.789819 25005 authenticatee.hpp:305] Authentication success
> I0606 18:38:59.789841 25005 master.cpp:3006] Successfully authenticated principal 'test-principal' at scheduler(46)@192.168.122.135:40636
> I0606 18:38:59.789906 25005 sched.cpp:348] Successfully authenticated with master master@192.168.122.135:40636
> I0606 18:38:59.789921 25005 sched.cpp:467] Sending registration request to master@192.168.122.135:40636
> I0606 18:38:59.789963 25005 master.cpp:1048] Received registration request from scheduler(46)@192.168.122.135:40636
> I0606 18:38:59.789994 25005 master.cpp:1066] Registering framework 20140606-183859-2272962752-40636-24987-0000 at scheduler(46)@192.168.122.135:40636
> I0606 18:38:59.790093 25008 sched.cpp:398] Framework registered with 20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.790118 25008 sched.cpp:412] Scheduler::registered took 12838ns
> I0606 18:38:59.790499 25005 hierarchical_allocator_process.hpp:331] Added framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.790542 25005 hierarchical_allocator_process.hpp:751] Offering cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140606-183859-2272962752-40636-24987-0 to framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.790628 25005 hierarchical_allocator_process.hpp:687] Performed allocation for 1 slaves in 115789ns
> I0606 18:38:59.790678 25005 master.hpp:693] Adding offer 20140606-183859-2272962752-40636-24987-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140606-183859-2272962752-40636-24987-0 (fedora-19)
> I0606 18:38:59.790717 25005 master.cpp:2913] Sending 1 offers to framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.791182 25005 sched.cpp:535] Scheduler::resourceOffers took 25837ns
> I0606 18:38:59.792461 25007 slave.cpp:717] Checkpointing SlaveInfo to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/slave.info'
> I0606 18:38:59.793648 25006 master.hpp:703] Removing offer 20140606-183859-2272962752-40636-24987-0 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140606-183859-2272962752-40636-24987-0 (fedora-19)
> I0606 18:38:59.793702 25006 master.cpp:1869] Processing reply for offers: [ 20140606-183859-2272962752-40636-24987-0 ] on slave 20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636 (fedora-19) for framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.793773 25006 master.hpp:665] Adding task de849715-bebc-44f6-b743-b9936ba4c0a4 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140606-183859-2272962752-40636-24987-0 (fedora-19)
> I0606 18:38:59.793812 25006 master.cpp:3091] Launching task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636 (fedora-19)
> I0606 18:38:59.793915 25006 slave.cpp:939] Got assigned task de849715-bebc-44f6-b743-b9936ba4c0a4 for framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.793983 25006 slave.cpp:3296] Checkpointing FrameworkInfo to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/framework.info'
> I0606 18:38:59.794581 25006 slave.cpp:3303] Checkpointing framework pid 'scheduler(46)@192.168.122.135:40636' to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/framework.pid'
> I0606 18:38:59.794858 25006 slave.cpp:1049] Launching task de849715-bebc-44f6-b743-b9936ba4c0a4 for framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.796583 25006 slave.cpp:3618] Checkpointing ExecutorInfo to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/executor.info'
> I0606 18:38:59.797668 25008 mesos_containerizer.cpp:537] Starting container '6266edf5-4bbe-40a4-a09d-358472cb8041' for executor 'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework '20140606-183859-2272962752-40636-24987-0000'
> I0606 18:38:59.800356 25012 launcher.cpp:117] Forked child with pid '25411' for container '6266edf5-4bbe-40a4-a09d-358472cb8041'
> I0606 18:38:59.800930 25012 mesos_containerizer.cpp:803] Checkpointing executor's forked pid 25411 to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041/pids/forked.pid'
> I0606 18:38:59.802824 25012 mesos_containerizer.cpp:647] Fetching URIs for container '6266edf5-4bbe-40a4-a09d-358472cb8041' using command '/var/jenkins/workspace/mesos-fedora-19-clang/src/mesos-fetcher'
> I0606 18:38:59.803532 25006 slave.cpp:3733] Checkpointing TaskInfo to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041/tasks/de849715-bebc-44f6-b743-b9936ba4c0a4/task.info'
> I0606 18:38:59.812248 25006 slave.cpp:1159] Queuing task 'de849715-bebc-44f6-b743-b9936ba4c0a4' for executor de849715-bebc-44f6-b743-b9936ba4c0a4 of framework '20140606-183859-2272962752-40636-24987-0000
> I0606 18:38:59.812863 25006 slave.cpp:498] Successfully attached file '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041'
> I0606 18:39:00.736346 25005 hierarchical_allocator_process.hpp:687] Performed allocation for 1 slaves in 139472ns
> I0606 18:39:00.737416 25010 slave.cpp:2361] Monitoring executor 'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework '20140606-183859-2272962752-40636-24987-0000' in container '6266edf5-4bbe-40a4-a09d-358472cb8041'
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0606 18:39:00.826778 25411 process.cpp:1671] libprocess is initialized on 192.168.122.135:39003 for 8 cpus
> I0606 18:39:00.829923 25411 exec.cpp:131] Version: 0.20.0
> I0606 18:39:00.832566 25449 exec.cpp:181] Executor started at: executor(1)@192.168.122.135:39003 with pid 25411
> I0606 18:39:00.833638 25010 slave.cpp:1670] Got registration for executor 'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.833713 25010 slave.cpp:1755] Checkpointing executor pid 'executor(1)@192.168.122.135:39003' to '/tmp/SlaveRecoveryTest_0_ShutdownSlave_83ZbbJ/meta/slaves/20140606-183859-2272962752-40636-24987-0/frameworks/20140606-183859-2272962752-40636-24987-0000/executors/de849715-bebc-44f6-b743-b9936ba4c0a4/runs/6266edf5-4bbe-40a4-a09d-358472cb8041/pids/libprocess.pid'
> I0606 18:39:00.834895 25010 slave.cpp:1789] Flushing queued task de849715-bebc-44f6-b743-b9936ba4c0a4 for executor 'de849715-bebc-44f6-b743-b9936ba4c0a4' of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.835798 25447 exec.cpp:205] Executor registered on slave 20140606-183859-2272962752-40636-24987-0
> I0606 18:39:00.836727 25447 exec.cpp:217] Executor::registered took 111641ns
> Registered executor on fedora-19
> I0606 18:39:00.837342 25013 process.cpp:1098] Socket closed while receiving
> I0606 18:39:00.838347 25445 exec.cpp:292] Executor asked to run task 'de849715-bebc-44f6-b743-b9936ba4c0a4'
> I0606 18:39:00.838433 25445 exec.cpp:301] Executor::launchTask took 69226ns
> Starting task de849715-bebc-44f6-b743-b9936ba4c0a4
> I0606 18:39:00.843193 25450 process.cpp:1037] Socket closed while receiving
> sh -c 'sleep 1000'
> Forked command at 25451
> I0606 18:39:00.854920 25449 exec.cpp:524] Executor sending status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.855803 25010 slave.cpp:2024] Handling status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 from executor(1)@192.168.122.135:39003
> I0606 18:39:00.855929 25010 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.855947 25010 status_update_manager.cpp:499] Creating StatusUpdate stream for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.856501 25010 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.856595 25010 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 to master@192.168.122.135:40636
> I0606 18:39:00.856740 25010 master.cpp:2608] Status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 from slave 20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636 (fedora-19)
> I0606 18:39:00.856777 25010 slave.cpp:2145] Status update manager successfully handled status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.856789 25010 slave.cpp:2151] Sending acknowledgement for status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 to executor(1)@192.168.122.135:39003
> I0606 18:39:00.857117 25005 sched.cpp:626] Scheduler::statusUpdate took 46862ns
> I0606 18:39:00.857756 25444 exec.cpp:338] Executor received status update acknowledgement 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.858286 25445 exec.cpp:378] Executor asked to shutdown
> I0606 18:39:00.858337 25445 exec.cpp:393] Executor::shutdown took 9726ns
> I0606 18:39:00.858360 25445 exec.cpp:77] Scheduling shutdown of the executor
> Shutting down
> Sending SIGTERM to process tree at pid 25451
> I0606 18:39:00.862169 25005 hierarchical_allocator_process.hpp:687] Performed allocation for 1 slaves in 41161ns
> I0606 18:39:00.866359 25013 process.cpp:1098] Socket closed while receiving
> I0606 18:39:00.867178 25010 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.867209 25010 status_update_manager.hpp:342] Checkpointing ACK for status update TASK_RUNNING (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.867286 25010 slave.cpp:1610] Status update manager successfully handled status update acknowledgement (UUID: 7012f4e1-15e6-44c2-ad42-61dd82bdd5ee) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:00.867454 25450 process.cpp:1037] Socket closed while receiving
> Killing the following process trees:
> [ 
> --- 25451 sleep 1000 
> ]
> Command terminated with signal Terminated (pid: 25451)
> I0606 18:39:01.856297 25444 exec.cpp:524] Executor sending status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.857009 25006 slave.cpp:2024] Handling status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 from executor(1)@192.168.122.135:39003
> I0606 18:39:01.857255 25006 slave.cpp:3666] Terminating task de849715-bebc-44f6-b743-b9936ba4c0a4
> I0606 18:39:01.857662 25006 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.857693 25006 status_update_manager.hpp:342] Checkpointing UPDATE for status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.857761 25006 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 to master@192.168.122.135:40636
> I0606 18:39:01.857985 25006 master.cpp:2608] Status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 from slave 20140606-183859-2272962752-40636-24987-0 at slave(53)@192.168.122.135:40636 (fedora-19)
> I0606 18:39:01.858161 25010 slave.cpp:2145] Status update manager successfully handled status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.858270 25010 slave.cpp:2151] Sending acknowledgement for status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 to executor(1)@192.168.122.135:39003
> I0606 18:39:01.858680 25008 sched.cpp:626] Scheduler::statusUpdate took 34644ns
> I0606 18:39:01.859246 25010 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.859273 25010 status_update_manager.hpp:342] Checkpointing ACK for status update TASK_KILLED (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.859314 25010 status_update_manager.cpp:530] Cleaning up status update stream for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.859410 25010 slave.cpp:1610] Status update manager successfully handled status update acknowledgement (UUID: 9ee11dba-1cd6-496f-947f-0054329a6141) for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000
> I0606 18:39:01.859427 25010 slave.cpp:3708] Completing task de849715-bebc-44f6-b743-b9936ba4c0a4
> I0606 18:39:01.859716 25444 exec.cpp:331] Ignoring status update acknowledgement 9ee11dba-1cd6-496f-947f-0054329a6141 for task de849715-bebc-44f6-b743-b9936ba4c0a4 of framework 20140606-183859-2272962752-40636-24987-0000 because the driver is aborted!
> I0606 18:39:01.859803 25013 process.cpp:1098] Socket closed while receiving
> I0606 18:39:01.859869 25450 process.cpp:1037] Socket closed while receiving
> I0606 18:39:01.860499 25006 master.hpp:683] Removing task de849715-bebc-44f6-b743-b9936ba4c0a4 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140606-183859-2272962752-40636-24987-0 (fedora-19)
> I0606 18:39:01.860682 25006 hierarchical_allocator_process.hpp:636] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on slave 20140606-183859-2272962752-40636-24987-0 from framework 20140606-183859-2272962752-40636-24987-0000
> 2014-06-06 18:39:01,948:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0606 18:39:02.857594 25013 process.cpp:1037] Socket closed while receiving
> 2014-06-06 18:39:05,284:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:08,620:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:11,956:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:15,292:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:18,628:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:21,965:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:25,301:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:28,637:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-06-06 18:39:31,973:24987(0x7f4e115ce700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60899] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)