Posted to dev@mesos.apache.org by "Adam B (JIRA)" <ji...@apache.org> on 2014/05/02 20:21:16 UTC
[jira] [Commented] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.
[ https://issues.apache.org/jira/browse/MESOS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988042#comment-13988042 ]
Adam B commented on MESOS-1264:
-------------------------------
Fix up for review: https://reviews.apache.org/r/21017/
> Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.
> ------------------------------------------------------------------------------------
>
> Key: MESOS-1264
> URL: https://issues.apache.org/jira/browse/MESOS-1264
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.19.0
> Reporter: Benjamin Mahler
> Assignee: Adam B
> Fix For: 0.19.0
>
>
> There appears to be a regression in slave authentication that makes FaultToleranceTest.ReconcileIncompleteTasks flaky.
> The following is what appears to be happening in the test:
> 1. Slave re-detects leading Master.
> 2. Slave re-authenticates with Master.
> 3. Master sees slave as already activated, calls disconnect().
> 4. For non-checkpointing frameworks, this call to disconnect() assumes the slave has exited, and will send TASK_LOST.
> 5. In the case where the slave did not exit, the tasks are not lost.
> Either we treat TASK_LOST as valid in this case, as we do when the slave connection closes in Master::exited(UPID), and update the test as necessary,
> or we try to avoid sending TASK_LOST for authentication retries.
> {noformat}
> [ RUN ] FaultToleranceTest.ReconcileIncompleteTasks
> Using temporary directory '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_NvDg7P'
> I0428 22:54:27.878968 533 leveldb.cpp:174] Opened db in 152.905109ms
> I0428 22:54:27.901624 533 leveldb.cpp:181] Compacted db in 22.554687ms
> I0428 22:54:27.901720 533 leveldb.cpp:196] Created db iterator in 5625ns
> I0428 22:54:27.901761 533 leveldb.cpp:202] Seeked to beginning of db in 930ns
> I0428 22:54:27.901922 533 leveldb.cpp:271] Iterated through 0 keys in the db in 363ns
> I0428 22:54:27.901958 533 replica.cpp:729] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0428 22:54:27.902427 559 recover.cpp:425] Starting replica recovery
> I0428 22:54:27.902935 561 recover.cpp:451] Replica is in EMPTY status
> I0428 22:54:27.904425 556 master.cpp:266] Master 20140428-225427-1740121354-35964-533 (HOSTNAME) started on 10.37.184.103:35964
> I0428 22:54:27.904470 556 master.cpp:303] Master only allowing authenticated frameworks to register
> I0428 22:54:27.904491 556 master.cpp:308] Master only allowing authenticated slaves to register
> I0428 22:54:27.904515 556 credentials.hpp:35] Loading credentials for authentication
> I0428 22:54:27.904561 551 replica.cpp:626] Replica in EMPTY status received a broadcasted recover request
> W0428 22:54:27.904619 556 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/FaultToleranceTest_ReconcileIncompleteTasks_NvDg7P/credentials': No such file or directory
> I0428 22:54:27.904887 550 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0428 22:54:27.905383 568 recover.cpp:542] Updating replica status to STARTING
> I0428 22:54:27.906673 558 master.cpp:922] The newly elected leader is master@10.37.184.103:35964 with id 20140428-225427-1740121354-35964-533
> I0428 22:54:27.906708 558 master.cpp:932] Elected as the leading master!
> I0428 22:54:27.906728 558 master.cpp:753] Recovering from registrar
> I0428 22:54:27.906842 556 registrar.cpp:275] Recovering registrar
> I0428 22:54:27.937743 561 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 31.905754ms
> I0428 22:54:27.937796 561 replica.cpp:320] Persisted replica status to STARTING
> I0428 22:54:27.938019 561 recover.cpp:451] Replica is in STARTING status
> I0428 22:54:27.939944 569 replica.cpp:626] Replica in STARTING status received a broadcasted recover request
> I0428 22:54:27.940160 565 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0428 22:54:27.941345 558 recover.cpp:542] Updating replica status to VOTING
> I0428 22:54:27.962074 553 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 20.298725ms
> I0428 22:54:27.962128 553 replica.cpp:320] Persisted replica status to VOTING
> I0428 22:54:27.962266 553 recover.cpp:556] Successfully joined the Paxos group
> I0428 22:54:27.962432 553 recover.cpp:440] Recover process terminated
> I0428 22:54:27.962946 566 log.cpp:656] Attempting to start the writer
> I0428 22:54:27.964815 547 replica.cpp:474] Replica received implicit promise request with proposal 1
> I0428 22:54:27.970388 547 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 5.516929ms
> I0428 22:54:27.970440 547 replica.cpp:342] Persisted promised to 1
> I0428 22:54:27.971122 569 coordinator.cpp:229] Coordinator attemping to fill missing position
> I0428 22:54:27.972815 570 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
> I0428 22:54:27.978729 570 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 5.798681ms
> I0428 22:54:27.978806 570 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.980425 547 replica.cpp:508] Replica received write request for position 0
> I0428 22:54:27.980496 547 leveldb.cpp:436] Reading position from leveldb took 21889ns
> I0428 22:54:27.987072 547 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 6.529569ms
> I0428 22:54:27.987125 547 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.987746 559 replica.cpp:643] Replica received learned notice for position 0
> I0428 22:54:27.995412 559 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 7.515296ms
> I0428 22:54:27.995465 559 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.995499 559 replica.cpp:649] Replica learned NOP action at position 0
> I0428 22:54:27.996111 560 log.cpp:672] Writer started with ending position 0
> I0428 22:54:27.997576 562 leveldb.cpp:436] Reading position from leveldb took 23701ns
> I0428 22:54:28.000907 550 registrar.cpp:308] Successfully recovered registrar
> I0428 22:54:28.001000 550 registrar.cpp:379] Attempting to update the 'registry'
> I0428 22:54:28.003427 567 log.cpp:680] Attempting to append 153 bytes to the log
> I0428 22:54:28.003566 548 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 1
> I0428 22:54:28.004590 552 replica.cpp:508] Replica received write request for position 1
> I0428 22:54:28.079499 552 leveldb.cpp:341] Persisting action (172 bytes) to leveldb took 74.874271ms
> I0428 22:54:28.079545 552 replica.cpp:664] Persisted action at 1
> I0428 22:54:28.080243 548 replica.cpp:643] Replica received learned notice for position 1
> I0428 22:54:28.103667 548 leveldb.cpp:341] Persisting action (174 bytes) to leveldb took 23.258596ms
> I0428 22:54:28.103731 548 replica.cpp:664] Persisted action at 1
> I0428 22:54:28.103847 548 replica.cpp:649] Replica learned APPEND action at position 1
> I0428 22:54:28.105062 568 registrar.cpp:427] Successfully updated 'registry'
> I0428 22:54:28.105399 561 log.cpp:699] Attempting to truncate the log to 1
> I0428 22:54:28.105626 559 master.cpp:780] Recovered 0 slaves from the Registry (115B) ; allowing 10mins for slaves to re-register
> I0428 22:54:28.105702 547 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 2
> I0428 22:54:28.107192 548 replica.cpp:508] Replica received write request for position 2
> I0428 22:54:28.110117 547 slave.cpp:140] Slave started on 4)@10.37.184.103:35964
> I0428 22:54:28.110178 547 slave.cpp:149] Moving slave process into its own cgroup
> I0428 22:54:28.111990 548 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 4.755207ms
> I0428 22:54:28.112031 548 replica.cpp:664] Persisted action at 2
> I0428 22:54:28.112576 558 replica.cpp:643] Replica received learned notice for position 2
> I0428 22:54:28.113647 533 sched.cpp:121] Version: 0.19.0
> I0428 22:54:28.114310 549 sched.cpp:217] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.114351 549 sched.cpp:268] Authenticating with master master@10.37.184.103:35964
> I0428 22:54:28.114554 561 authenticatee.hpp:128] Creating new client SASL connection
> I0428 22:54:28.114681 549 master.cpp:2795] Authenticating scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.114882 570 authenticator.hpp:148] Creating new server SASL connection
> I0428 22:54:28.115015 570 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0428 22:54:28.115047 570 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0428 22:54:28.115114 570 authenticator.hpp:254] Received SASL authentication start
> I0428 22:54:28.115195 570 authenticator.hpp:342] Authentication requires more steps
> I0428 22:54:28.115279 548 authenticatee.hpp:265] Received SASL authentication step
> I0428 22:54:28.115485 556 authenticator.hpp:282] Received SASL authentication step
> I0428 22:54:28.115576 556 authenticator.hpp:334] Authentication success
> I0428 22:54:28.115681 555 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.115736 549 master.cpp:2835] Successfully authenticated scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116276 566 sched.cpp:342] Successfully authenticated with master master@10.37.184.103:35964
> I0428 22:54:28.116435 561 master.cpp:981] Received registration request from scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116557 561 master.cpp:999] Registering framework 20140428-225427-1740121354-35964-533-0000 at scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116726 552 sched.cpp:392] Framework registered with 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.116801 561 hierarchical_allocator_process.hpp:332] Added framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.120318 558 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 7.6976ms
> I0428 22:54:28.120383 558 leveldb.cpp:399] Deleting ~1 keys from leveldb took 18353ns
> I0428 22:54:28.120409 558 replica.cpp:664] Persisted action at 2
> I0428 22:54:28.120434 558 replica.cpp:649] Replica learned TRUNCATE action at position 2
> I0428 22:54:28.135422 547 slave.cpp:149] Moving slave process into its own cgroup
> I0428 22:54:28.155840 547 credentials.hpp:35] Loading credentials for authentication
> W0428 22:54:28.155907 547 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/FaultToleranceTest_ReconcileIncompleteTasks_aPd9kr/credential': No such file or directory
> I0428 22:54:28.155962 547 slave.cpp:231] Slave using credential for: test-principal
> I0428 22:54:28.156137 547 slave.cpp:244] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0428 22:54:28.156743 547 slave.cpp:272] Slave hostname: HOSTNAME
> I0428 22:54:28.156797 547 slave.cpp:273] Slave checkpoint: false
> I0428 22:54:28.157765 567 state.cpp:33] Recovering state from '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_aPd9kr/meta'
> I0428 22:54:28.158015 564 status_update_manager.cpp:193] Recovering status update manager
> I0428 22:54:28.158342 548 slave.cpp:2943] Finished recovery
> I0428 22:54:28.158860 566 slave.cpp:525] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.159013 566 slave.cpp:585] Authenticating with master master@10.37.184.103:35964
> I0428 22:54:28.159083 548 status_update_manager.cpp:167] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.159139 566 slave.cpp:558] Detecting new master
> I0428 22:54:28.159340 567 authenticatee.hpp:128] Creating new client SASL connection
> I0428 22:54:28.159664 567 master.cpp:2795] Authenticating slave(4)@10.37.184.103:35964
> I0428 22:54:28.159989 559 authenticator.hpp:148] Creating new server SASL connection
> I0428 22:54:28.160220 548 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0428 22:54:28.160253 548 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0428 22:54:28.160328 548 authenticator.hpp:254] Received SASL authentication start
> I0428 22:54:28.160426 548 authenticator.hpp:342] Authentication requires more steps
> I0428 22:54:28.160490 548 authenticatee.hpp:265] Received SASL authentication step
> I0428 22:54:28.160568 548 authenticator.hpp:282] Received SASL authentication step
> I0428 22:54:28.160673 548 authenticator.hpp:334] Authentication success
> I0428 22:54:28.160758 558 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.160787 548 master.cpp:2835] Successfully authenticated slave(4)@10.37.184.103:35964
> I0428 22:54:28.161053 570 slave.cpp:642] Successfully authenticated with master master@10.37.184.103:35964
> I0428 22:54:28.161638 556 registrar.cpp:379] Attempting to update the 'registry'
> I0428 22:54:28.164188 570 log.cpp:680] Attempting to append 378 bytes to the log
> I0428 22:54:28.164309 558 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 3
> I0428 22:54:28.165631 569 replica.cpp:508] Replica received write request for position 3
> I0428 22:54:28.186933 569 leveldb.cpp:341] Persisting action (397 bytes) to leveldb took 21.263363ms
> I0428 22:54:28.186987 569 replica.cpp:664] Persisted action at 3
> I0428 22:54:28.187626 555 replica.cpp:643] Replica received learned notice for position 3
> I0428 22:54:28.195307 555 leveldb.cpp:341] Persisting action (399 bytes) to leveldb took 7.628475ms
> I0428 22:54:28.195361 555 replica.cpp:664] Persisted action at 3
> I0428 22:54:28.195397 555 replica.cpp:649] Replica learned APPEND action at position 3
> I0428 22:54:28.196445 547 registrar.cpp:427] Successfully updated 'registry'
> I0428 22:54:28.196738 553 log.cpp:699] Attempting to truncate the log to 3
> I0428 22:54:28.197031 552 master.cpp:2169] Admitted slave on HOSTNAME at slave(4)@IP:35964
> I0428 22:54:28.197108 552 master.cpp:3283] Adding slave 20140428-225427-1740121354-35964-533-0 at HOST with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0428 22:54:28.196979 568 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 4
> I0428 22:54:28.197506 560 slave.cpp:675] Registered with master master@10.37.184.103:35964; given slave ID 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.197764 547 hierarchical_allocator_process.hpp:445] Added slave 20140428-225427-1740121354-35964-533-0 (HOST) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0428 22:54:28.198402 565 master.cpp:2744] Sending 1 offers to framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.199038 568 replica.cpp:508] Replica received write request for position 4
> I0428 22:54:28.200683 548 master.cpp:1806] Processing reply for offers: [ 20140428-225427-1740121354-35964-533-0 ] on slave 20140428-225427-1740121354-35964-533-0 (HOST) for framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.201052 548 master.hpp:558] Adding task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.201151 548 master.cpp:2919] Launching task 1 of framework 20140428-225427-1740121354-35964-533-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.201370 555 slave.cpp:905] Got assigned task 1 for framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.201853 555 slave.cpp:1015] Launching task 1 for framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.203640 568 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 4.423216ms
> I0428 22:54:28.203698 568 replica.cpp:664] Persisted action at 4
> I0428 22:54:28.204627 547 replica.cpp:643] Replica received learned notice for position 4
> I0428 22:54:28.206099 555 exec.cpp:131] Version: 0.19.0
> I0428 22:54:28.206481 555 slave.cpp:1125] Queuing task '1' for executor default of framework '20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.206670 555 slave.cpp:2282] Monitoring executor 'default' of framework '20140428-225427-1740121354-35964-533-0000' in container '77e0cf10-1c8d-47a7-946d-eb56f26e339d'
> I0428 22:54:28.206925 555 slave.cpp:1598] Got registration for executor 'default' of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.207355 555 slave.cpp:1717] Flushing queued task 1 for executor 'default' of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.207478 557 exec.cpp:205] Executor registered on slave 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.211570 557 slave.cpp:1953] Handling status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 from executor(4)@10.37.184.103:35964
> I0428 22:54:28.211958 547 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 7.29439ms
> I0428 22:54:28.211987 559 status_update_manager.cpp:320] Received status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.212195 547 leveldb.cpp:399] Deleting ~2 keys from leveldb took 56641ns
> I0428 22:54:28.212311 547 replica.cpp:664] Persisted action at 4
> I0428 22:54:28.212358 547 replica.cpp:649] Replica learned TRUNCATE action at position 4
> I0428 22:54:28.212357 559 status_update_manager.cpp:373] Forwarding status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 to master@10.37.184.103:35964
> I0428 22:54:28.212858 559 slave.cpp:2076] Sending acknowledgement for status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 to executor(4)@10.37.184.103:35964
> I0428 22:54:28.222946 565 slave.cpp:525] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.223008 565 slave.cpp:585] Authenticating with master master@10.37.184.103:35964
> I0428 22:54:28.223079 564 status_update_manager.cpp:167] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.223125 565 slave.cpp:558] Detecting new master
> W0428 22:54:28.223120 564 status_update_manager.cpp:181] Resending status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.223142 568 authenticatee.hpp:128] Creating new client SASL connection
> I0428 22:54:28.223157 564 status_update_manager.cpp:373] Forwarding status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 to master@10.37.184.103:35964
> I0428 22:54:28.223357 565 master.cpp:1263] Disconnecting slave 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.223460 566 hierarchical_allocator_process.hpp:484] Slave 20140428-225427-1740121354-35964-533-0 disconnected
> I0428 22:54:28.223486 565 master.cpp:1283] Removing non-checkpointing framework 20140428-225427-1740121354-35964-533-0000 from disconnected slave 20140428-225427-1740121354-35964-533-0(HOST)
> I0428 22:54:28.223505 565 master.cpp:3235] Removing framework 20140428-225427-1740121354-35964-533-0000 from slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.225428 565 master.cpp:2444] Status update TASK_LOST (UUID: dfa02d31-0a93-4808-924c-0969412c7e52) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 from @0.0.0.0:0
> I0428 22:54:28.225581 565 master.hpp:576] Removing task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.225934 565 master.cpp:2795] Authenticating slave(4)@10.37.184.103:35964
> I0428 22:54:28.225975 564 hierarchical_allocator_process.hpp:637] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on slave 20140428-225427-1740121354-35964-533-0 from framework 20140428-225427-1740121354-35964-533-0000
> W0428 22:54:28.226312 565 master.cpp:2437] Status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 from slave(4)@10.37.184.103:35964 (HOST): error, couldn't lookup task
> I0428 22:54:28.226378 563 authenticator.hpp:148] Creating new server SASL connection
> I0428 22:54:28.226554 554 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0428 22:54:28.226663 554 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0428 22:54:28.226667 553 status_update_manager.cpp:398] Received status update acknowledgement (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.226872 552 authenticator.hpp:254] Received SASL authentication start
> I0428 22:54:28.226994 552 authenticator.hpp:342] Authentication requires more steps
> I0428 22:54:28.227147 556 authenticatee.hpp:265] Received SASL authentication step
> I0428 22:54:28.227288 556 authenticator.hpp:282] Received SASL authentication step
> I0428 22:54:28.227378 556 authenticator.hpp:334] Authentication success
> I0428 22:54:28.227519 548 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.227524 556 master.cpp:2835] Successfully authenticated slave(4)@10.37.184.103:35964
> I0428 22:54:28.227975 556 slave.cpp:642] Successfully authenticated with master master@10.37.184.103:35964
> W0428 22:54:28.228243 568 master.cpp:2244] Slave at slave(4)@10.37.184.103:35964 (HOST) is being allowed to re-register with an already in use id (20140428-225427-1740121354-35964-533-0)
> I0428 22:54:28.228421 556 slave.cpp:725] Re-registered with master master@10.37.184.103:35964
> I0428 22:54:28.228457 553 hierarchical_allocator_process.hpp:498] Slave 20140428-225427-1740121354-35964-533-0 reconnected
> ../../src/tests/fault_tolerance_tests.cpp:2084: Failure
> Value of: status.get().state()
> Actual: TASK_LOST
> Expected: TASK_FINISHED
> /var/tmp/sclly9xeu: line 8: 533 Segmentation fault './bin/mesos-tests.sh' '--gtest_filter=FaultToleranceTest.ReconcileIncompleteTasks' '--gtest_repeat=-1' '--gtest_break_on_failure' '--verbose'
> [bmahler@HOST build]$ Write failed: Broken pipe
> {noformat}
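The sequence in steps 1-5 of the description can be sketched as a small toy model. This is only an illustration of the reported behavior, not Mesos code: ToyMaster, ToyResult, and simulateReauthentication are hypothetical names, and the checkpointing branch (tasks are kept across a disconnect) is an assumption inferred from the description rather than something the log shows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model for illustration only. None of these types or functions are
// Mesos classes; all names here are hypothetical.
struct ToyMaster {
  bool slaveActivated = false;
  bool frameworkCheckpointing = false;
  std::vector<int> tasks;            // IDs of tasks the master still tracks.
  std::vector<std::string> updates;  // Status updates sent to the framework.

  void registerSlave() { slaveActivated = true; }

  void launchTask(int taskId) { tasks.push_back(taskId); }

  // Step 4: for a non-checkpointing framework, a disconnect is treated as
  // if the slave had exited, so every tracked task is reported as lost.
  // Assumption: a checkpointing framework keeps its tasks.
  void disconnect() {
    slaveActivated = false;
    if (!frameworkCheckpointing) {
      for (std::size_t i = 0; i < tasks.size(); ++i) {
        updates.push_back("TASK_LOST");
      }
      tasks.clear();
    }
  }

  // Steps 2-3: an authentication request from an already activated slave
  // causes the master to call disconnect().
  void authenticate() {
    if (slaveActivated) {
      disconnect();
    }
  }

  // A status update for a task the master no longer tracks is dropped,
  // as in the "couldn't lookup task" warning in the log.
  bool statusUpdate(int taskId, const std::string& state) {
    for (int known : tasks) {
      if (known == taskId) {
        updates.push_back(state);
        return true;
      }
    }
    return false;
  }
};

struct ToyResult {
  std::vector<std::string> updates;
  bool finishedAccepted;
};

// Replays the scenario: register, launch task 1, re-authenticate, and then
// deliver the (resent) TASK_FINISHED update.
ToyResult simulateReauthentication(bool checkpointing) {
  ToyMaster master;
  master.frameworkCheckpointing = checkpointing;
  master.registerSlave();
  master.launchTask(1);
  master.authenticate();  // Slave re-detects the master and re-authenticates.
  bool accepted = master.statusUpdate(1, "TASK_FINISHED");
  return ToyResult{master.updates, accepted};
}
```

In this model, simulateReauthentication(false) produces a single TASK_LOST update and rejects the later TASK_FINISHED, which is the same ordering that makes the test see TASK_LOST where it expects TASK_FINISHED. The second option in the description would correspond to not calling disconnect() from authenticate() in this sketch; whether the actual fix takes that form is not stated here.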
--
This message was sent by Atlassian JIRA
(v6.2#6252)