You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Adam B (JIRA)" <ji...@apache.org> on 2014/05/02 20:21:16 UTC

[jira] [Commented] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

    [ https://issues.apache.org/jira/browse/MESOS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988042#comment-13988042 ] 

Adam B commented on MESOS-1264:
-------------------------------

Fix up for review: https://reviews.apache.org/r/21017/

> Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.
> ------------------------------------------------------------------------------------
>
>                 Key: MESOS-1264
>                 URL: https://issues.apache.org/jira/browse/MESOS-1264
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Benjamin Mahler
>            Assignee: Adam B
>             Fix For: 0.19.0
>
>
> Looks like there is a regression with slave authentication that is making the FaultToleranceTest.ReconcileIncompleteTasks flaky.
> The following is what appears to be happening in the test and seems to be a regression:
> 1. Slave re-detects leading Master.
> 2. Slave re-authenticates with Master.
> 3. Master sees slave as already activated, calls disconnect().
> 4. For non-checkpointing frameworks, this call to disconnect() assumes the slave has exited, and will send TASK_LOST.
> 5. In the case where the slave did not exit, the tasks are not lost.
> Either we can consider this case to be valid as TASK_LOST, similarly to when the slave connection closes in Master::exited(UPID), updating the test as necessary.
> Or we can try to avoid sending TASK_LOST for authentication retries.
> {noformat}
> [ RUN      ] FaultToleranceTest.ReconcileIncompleteTasks
> Using temporary directory '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_NvDg7P'
> I0428 22:54:27.878968   533 leveldb.cpp:174] Opened db in 152.905109ms
> I0428 22:54:27.901624   533 leveldb.cpp:181] Compacted db in 22.554687ms
> I0428 22:54:27.901720   533 leveldb.cpp:196] Created db iterator in 5625ns
> I0428 22:54:27.901761   533 leveldb.cpp:202] Seeked to beginning of db in 930ns
> I0428 22:54:27.901922   533 leveldb.cpp:271] Iterated through 0 keys in the db in 363ns
> I0428 22:54:27.901958   533 replica.cpp:729] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0428 22:54:27.902427   559 recover.cpp:425] Starting replica recovery
> I0428 22:54:27.902935   561 recover.cpp:451] Replica is in EMPTY status
> I0428 22:54:27.904425   556 master.cpp:266] Master 20140428-225427-1740121354-35964-533 (HOSTNAME) started on 10.37.184.103:35964
> I0428 22:54:27.904470   556 master.cpp:303] Master only allowing authenticated frameworks to register
> I0428 22:54:27.904491   556 master.cpp:308] Master only allowing authenticated slaves to register
> I0428 22:54:27.904515   556 credentials.hpp:35] Loading credentials for authentication
> I0428 22:54:27.904561   551 replica.cpp:626] Replica in EMPTY status received a broadcasted recover request
> W0428 22:54:27.904619   556 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/FaultToleranceTest_ReconcileIncompleteTasks_NvDg7P/credentials': No such file or directory
> I0428 22:54:27.904887   550 recover.cpp:188] Received a recover response from a replica in EMPTY status
> I0428 22:54:27.905383   568 recover.cpp:542] Updating replica status to STARTING
> I0428 22:54:27.906673   558 master.cpp:922] The newly elected leader is master@10.37.184.103:35964 with id 20140428-225427-1740121354-35964-533
> I0428 22:54:27.906708   558 master.cpp:932] Elected as the leading master!
> I0428 22:54:27.906728   558 master.cpp:753] Recovering from registrar
> I0428 22:54:27.906842   556 registrar.cpp:275] Recovering registrar
> I0428 22:54:27.937743   561 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 31.905754ms
> I0428 22:54:27.937796   561 replica.cpp:320] Persisted replica status to STARTING
> I0428 22:54:27.938019   561 recover.cpp:451] Replica is in STARTING status
> I0428 22:54:27.939944   569 replica.cpp:626] Replica in STARTING status received a broadcasted recover request
> I0428 22:54:27.940160   565 recover.cpp:188] Received a recover response from a replica in STARTING status
> I0428 22:54:27.941345   558 recover.cpp:542] Updating replica status to VOTING
> I0428 22:54:27.962074   553 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 20.298725ms
> I0428 22:54:27.962128   553 replica.cpp:320] Persisted replica status to VOTING
> I0428 22:54:27.962266   553 recover.cpp:556] Successfully joined the Paxos group
> I0428 22:54:27.962432   553 recover.cpp:440] Recover process terminated
> I0428 22:54:27.962946   566 log.cpp:656] Attempting to start the writer
> I0428 22:54:27.964815   547 replica.cpp:474] Replica received implicit promise request with proposal 1
> I0428 22:54:27.970388   547 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 5.516929ms
> I0428 22:54:27.970440   547 replica.cpp:342] Persisted promised to 1
> I0428 22:54:27.971122   569 coordinator.cpp:229] Coordinator attemping to fill missing position
> I0428 22:54:27.972815   570 replica.cpp:375] Replica received explicit promise request for position 0 with proposal 2
> I0428 22:54:27.978729   570 leveldb.cpp:341] Persisting action (8 bytes) to leveldb took 5.798681ms
> I0428 22:54:27.978806   570 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.980425   547 replica.cpp:508] Replica received write request for position 0
> I0428 22:54:27.980496   547 leveldb.cpp:436] Reading position from leveldb took 21889ns
> I0428 22:54:27.987072   547 leveldb.cpp:341] Persisting action (14 bytes) to leveldb took 6.529569ms
> I0428 22:54:27.987125   547 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.987746   559 replica.cpp:643] Replica received learned notice for position 0
> I0428 22:54:27.995412   559 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 7.515296ms
> I0428 22:54:27.995465   559 replica.cpp:664] Persisted action at 0
> I0428 22:54:27.995499   559 replica.cpp:649] Replica learned NOP action at position 0
> I0428 22:54:27.996111   560 log.cpp:672] Writer started with ending position 0
> I0428 22:54:27.997576   562 leveldb.cpp:436] Reading position from leveldb took 23701ns
> I0428 22:54:28.000907   550 registrar.cpp:308] Successfully recovered registrar
> I0428 22:54:28.001000   550 registrar.cpp:379] Attempting to update the 'registry'
> I0428 22:54:28.003427   567 log.cpp:680] Attempting to append 153 bytes to the log
> I0428 22:54:28.003566   548 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 1
> I0428 22:54:28.004590   552 replica.cpp:508] Replica received write request for position 1
> I0428 22:54:28.079499   552 leveldb.cpp:341] Persisting action (172 bytes) to leveldb took 74.874271ms
> I0428 22:54:28.079545   552 replica.cpp:664] Persisted action at 1
> I0428 22:54:28.080243   548 replica.cpp:643] Replica received learned notice for position 1
> I0428 22:54:28.103667   548 leveldb.cpp:341] Persisting action (174 bytes) to leveldb took 23.258596ms
> I0428 22:54:28.103731   548 replica.cpp:664] Persisted action at 1
> I0428 22:54:28.103847   548 replica.cpp:649] Replica learned APPEND action at position 1
> I0428 22:54:28.105062   568 registrar.cpp:427] Successfully updated 'registry'
> I0428 22:54:28.105399   561 log.cpp:699] Attempting to truncate the log to 1
> I0428 22:54:28.105626   559 master.cpp:780] Recovered 0 slaves from the Registry (115B) ; allowing 10mins for slaves to re-register
> I0428 22:54:28.105702   547 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 2
> I0428 22:54:28.107192   548 replica.cpp:508] Replica received write request for position 2
> I0428 22:54:28.110117   547 slave.cpp:140] Slave started on 4)@10.37.184.103:35964
> I0428 22:54:28.110178   547 slave.cpp:149] Moving slave process into its own cgroup
> I0428 22:54:28.111990   548 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 4.755207ms
> I0428 22:54:28.112031   548 replica.cpp:664] Persisted action at 2
> I0428 22:54:28.112576   558 replica.cpp:643] Replica received learned notice for position 2
> I0428 22:54:28.113647   533 sched.cpp:121] Version: 0.19.0
> I0428 22:54:28.114310   549 sched.cpp:217] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.114351   549 sched.cpp:268] Authenticating with master master@10.37.184.103:35964
> I0428 22:54:28.114554   561 authenticatee.hpp:128] Creating new client SASL connection
> I0428 22:54:28.114681   549 master.cpp:2795] Authenticating scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.114882   570 authenticator.hpp:148] Creating new server SASL connection
> I0428 22:54:28.115015   570 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0428 22:54:28.115047   570 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0428 22:54:28.115114   570 authenticator.hpp:254] Received SASL authentication start
> I0428 22:54:28.115195   570 authenticator.hpp:342] Authentication requires more steps
> I0428 22:54:28.115279   548 authenticatee.hpp:265] Received SASL authentication step
> I0428 22:54:28.115485   556 authenticator.hpp:282] Received SASL authentication step
> I0428 22:54:28.115576   556 authenticator.hpp:334] Authentication success
> I0428 22:54:28.115681   555 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.115736   549 master.cpp:2835] Successfully authenticated scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116276   566 sched.cpp:342] Successfully authenticated with master master@10.37.184.103:35964
> I0428 22:54:28.116435   561 master.cpp:981] Received registration request from scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116557   561 master.cpp:999] Registering framework 20140428-225427-1740121354-35964-533-0000 at scheduler(4)@10.37.184.103:35964
> I0428 22:54:28.116726   552 sched.cpp:392] Framework registered with 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.116801   561 hierarchical_allocator_process.hpp:332] Added framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.120318   558 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 7.6976ms
> I0428 22:54:28.120383   558 leveldb.cpp:399] Deleting ~1 keys from leveldb took 18353ns
> I0428 22:54:28.120409   558 replica.cpp:664] Persisted action at 2
> I0428 22:54:28.120434   558 replica.cpp:649] Replica learned TRUNCATE action at position 2
> I0428 22:54:28.135422   547 slave.cpp:149] Moving slave process into its own cgroup
> I0428 22:54:28.155840   547 credentials.hpp:35] Loading credentials for authentication
> W0428 22:54:28.155907   547 credentials.hpp:48] Failed to stat credentials file 'file:///tmp/FaultToleranceTest_ReconcileIncompleteTasks_aPd9kr/credential': No such file or directory
> I0428 22:54:28.155962   547 slave.cpp:231] Slave using credential for: test-principal
> I0428 22:54:28.156137   547 slave.cpp:244] Slave resources: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0428 22:54:28.156743   547 slave.cpp:272] Slave hostname: HOSTNAME
> I0428 22:54:28.156797   547 slave.cpp:273] Slave checkpoint: false
> I0428 22:54:28.157765   567 state.cpp:33] Recovering state from '/tmp/FaultToleranceTest_ReconcileIncompleteTasks_aPd9kr/meta'
> I0428 22:54:28.158015   564 status_update_manager.cpp:193] Recovering status update manager
> I0428 22:54:28.158342   548 slave.cpp:2943] Finished recovery
> I0428 22:54:28.158860   566 slave.cpp:525] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.159013   566 slave.cpp:585] Authenticating with master master@10.37.184.103:35964
> I0428 22:54:28.159083   548 status_update_manager.cpp:167] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.159139   566 slave.cpp:558] Detecting new master
> I0428 22:54:28.159340   567 authenticatee.hpp:128] Creating new client SASL connection
> I0428 22:54:28.159664   567 master.cpp:2795] Authenticating slave(4)@10.37.184.103:35964
> I0428 22:54:28.159989   559 authenticator.hpp:148] Creating new server SASL connection
> I0428 22:54:28.160220   548 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0428 22:54:28.160253   548 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0428 22:54:28.160328   548 authenticator.hpp:254] Received SASL authentication start
> I0428 22:54:28.160426   548 authenticator.hpp:342] Authentication requires more steps
> I0428 22:54:28.160490   548 authenticatee.hpp:265] Received SASL authentication step
> I0428 22:54:28.160568   548 authenticator.hpp:282] Received SASL authentication step
> I0428 22:54:28.160673   548 authenticator.hpp:334] Authentication success
> I0428 22:54:28.160758   558 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.160787   548 master.cpp:2835] Successfully authenticated slave(4)@10.37.184.103:35964
> I0428 22:54:28.161053   570 slave.cpp:642] Successfully authenticated with master master@10.37.184.103:35964
> I0428 22:54:28.161638   556 registrar.cpp:379] Attempting to update the 'registry'
> I0428 22:54:28.164188   570 log.cpp:680] Attempting to append 378 bytes to the log
> I0428 22:54:28.164309   558 coordinator.cpp:339] Coordinator attempting to write APPEND action at position 3
> I0428 22:54:28.165631   569 replica.cpp:508] Replica received write request for position 3
> I0428 22:54:28.186933   569 leveldb.cpp:341] Persisting action (397 bytes) to leveldb took 21.263363ms
> I0428 22:54:28.186987   569 replica.cpp:664] Persisted action at 3
> I0428 22:54:28.187626   555 replica.cpp:643] Replica received learned notice for position 3
> I0428 22:54:28.195307   555 leveldb.cpp:341] Persisting action (399 bytes) to leveldb took 7.628475ms
> I0428 22:54:28.195361   555 replica.cpp:664] Persisted action at 3
> I0428 22:54:28.195397   555 replica.cpp:649] Replica learned APPEND action at position 3
> I0428 22:54:28.196445   547 registrar.cpp:427] Successfully updated 'registry'
> I0428 22:54:28.196738   553 log.cpp:699] Attempting to truncate the log to 3
> I0428 22:54:28.197031   552 master.cpp:2169] Admitted slave on HOSTNAME at slave(4)@IP:35964
> I0428 22:54:28.197108   552 master.cpp:3283] Adding slave 20140428-225427-1740121354-35964-533-0 at HOST with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0428 22:54:28.196979   568 coordinator.cpp:339] Coordinator attempting to write TRUNCATE action at position 4
> I0428 22:54:28.197506   560 slave.cpp:675] Registered with master master@10.37.184.103:35964; given slave ID 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.197764   547 hierarchical_allocator_process.hpp:445] Added slave 20140428-225427-1740121354-35964-533-0 (HOST) with cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (and cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] available)
> I0428 22:54:28.198402   565 master.cpp:2744] Sending 1 offers to framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.199038   568 replica.cpp:508] Replica received write request for position 4
> I0428 22:54:28.200683   548 master.cpp:1806] Processing reply for offers: [ 20140428-225427-1740121354-35964-533-0 ] on slave 20140428-225427-1740121354-35964-533-0 (HOST) for framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.201052   548 master.hpp:558] Adding task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.201151   548 master.cpp:2919] Launching task 1 of framework 20140428-225427-1740121354-35964-533-0000 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.201370   555 slave.cpp:905] Got assigned task 1 for framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.201853   555 slave.cpp:1015] Launching task 1 for framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.203640   568 leveldb.cpp:341] Persisting action (16 bytes) to leveldb took 4.423216ms
> I0428 22:54:28.203698   568 replica.cpp:664] Persisted action at 4
> I0428 22:54:28.204627   547 replica.cpp:643] Replica received learned notice for position 4
> I0428 22:54:28.206099   555 exec.cpp:131] Version: 0.19.0
> I0428 22:54:28.206481   555 slave.cpp:1125] Queuing task '1' for executor default of framework '20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.206670   555 slave.cpp:2282] Monitoring executor 'default' of framework '20140428-225427-1740121354-35964-533-0000' in container '77e0cf10-1c8d-47a7-946d-eb56f26e339d'
> I0428 22:54:28.206925   555 slave.cpp:1598] Got registration for executor 'default' of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.207355   555 slave.cpp:1717] Flushing queued task 1 for executor 'default' of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.207478   557 exec.cpp:205] Executor registered on slave 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.211570   557 slave.cpp:1953] Handling status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 from executor(4)@10.37.184.103:35964
> I0428 22:54:28.211958   547 leveldb.cpp:341] Persisting action (18 bytes) to leveldb took 7.29439ms
> I0428 22:54:28.211987   559 status_update_manager.cpp:320] Received status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.212195   547 leveldb.cpp:399] Deleting ~2 keys from leveldb took 56641ns
> I0428 22:54:28.212311   547 replica.cpp:664] Persisted action at 4
> I0428 22:54:28.212358   547 replica.cpp:649] Replica learned TRUNCATE action at position 4
> I0428 22:54:28.212357   559 status_update_manager.cpp:373] Forwarding status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 to master@10.37.184.103:35964
> I0428 22:54:28.212858   559 slave.cpp:2076] Sending acknowledgement for status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 to executor(4)@10.37.184.103:35964
> I0428 22:54:28.222946   565 slave.cpp:525] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.223008   565 slave.cpp:585] Authenticating with master master@10.37.184.103:35964
> I0428 22:54:28.223079   564 status_update_manager.cpp:167] New master detected at master@10.37.184.103:35964
> I0428 22:54:28.223125   565 slave.cpp:558] Detecting new master
> W0428 22:54:28.223120   564 status_update_manager.cpp:181] Resending status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.223142   568 authenticatee.hpp:128] Creating new client SASL connection
> I0428 22:54:28.223157   564 status_update_manager.cpp:373] Forwarding status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 to master@10.37.184.103:35964
> I0428 22:54:28.223357   565 master.cpp:1263] Disconnecting slave 20140428-225427-1740121354-35964-533-0
> I0428 22:54:28.223460   566 hierarchical_allocator_process.hpp:484] Slave 20140428-225427-1740121354-35964-533-0 disconnected
> I0428 22:54:28.223486   565 master.cpp:1283] Removing non-checkpointing framework 20140428-225427-1740121354-35964-533-0000 from disconnected slave 20140428-225427-1740121354-35964-533-0(HOST)
> I0428 22:54:28.223505   565 master.cpp:3235] Removing framework 20140428-225427-1740121354-35964-533-0000 from slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.225428   565 master.cpp:2444] Status update TASK_LOST (UUID: dfa02d31-0a93-4808-924c-0969412c7e52) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 from @0.0.0.0:0
> I0428 22:54:28.225581   565 master.hpp:576] Removing task 1 with resources cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] on slave 20140428-225427-1740121354-35964-533-0 (HOST)
> I0428 22:54:28.225934   565 master.cpp:2795] Authenticating slave(4)@10.37.184.103:35964
> I0428 22:54:28.225975   564 hierarchical_allocator_process.hpp:637] Recovered cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000] (total allocatable: cpus(*):2; mem(*):1024; disk(*):1024; ports(*):[31000-32000]) on slave 20140428-225427-1740121354-35964-533-0 from framework 20140428-225427-1740121354-35964-533-0000
> W0428 22:54:28.226312   565 master.cpp:2437] Status update TASK_FINISHED (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000 from slave(4)@10.37.184.103:35964 (HOST): error, couldn't lookup task
> I0428 22:54:28.226378   563 authenticator.hpp:148] Creating new server SASL connection
> I0428 22:54:28.226554   554 authenticatee.hpp:219] Received SASL authentication mechanisms: CRAM-MD5
> I0428 22:54:28.226663   554 authenticatee.hpp:245] Attempting to authenticate with mechanism 'CRAM-MD5'
> I0428 22:54:28.226667   553 status_update_manager.cpp:398] Received status update acknowledgement (UUID: e227720d-b0fb-4f5f-a40f-be27f9a3a219) for task 1 of framework 20140428-225427-1740121354-35964-533-0000
> I0428 22:54:28.226872   552 authenticator.hpp:254] Received SASL authentication start
> I0428 22:54:28.226994   552 authenticator.hpp:342] Authentication requires more steps
> I0428 22:54:28.227147   556 authenticatee.hpp:265] Received SASL authentication step
> I0428 22:54:28.227288   556 authenticator.hpp:282] Received SASL authentication step
> I0428 22:54:28.227378   556 authenticator.hpp:334] Authentication success
> I0428 22:54:28.227519   548 authenticatee.hpp:305] Authentication success
> I0428 22:54:28.227524   556 master.cpp:2835] Successfully authenticated slave(4)@10.37.184.103:35964
> I0428 22:54:28.227975   556 slave.cpp:642] Successfully authenticated with master master@10.37.184.103:35964
> W0428 22:54:28.228243   568 master.cpp:2244] Slave at slave(4)@10.37.184.103:35964 (HOST) is being allowed to re-register with an already in use id (20140428-225427-1740121354-35964-533-0)
> I0428 22:54:28.228421   556 slave.cpp:725] Re-registered with master master@10.37.184.103:35964
> I0428 22:54:28.228457   553 hierarchical_allocator_process.hpp:498] Slave 20140428-225427-1740121354-35964-533-0 reconnected
> ../../src/tests/fault_tolerance_tests.cpp:2084: Failure
> Value of: status.get().state()
>   Actual: TASK_LOST
> Expected: TASK_FINISHED
> /var/tmp/sclly9xeu: line 8:   533 Segmentation fault      './bin/mesos-tests.sh' '--gtest_filter=FaultToleranceTest.ReconcileIncompleteTasks' '--gtest_repeat=-1' '--gtest_break_on_failure' '--verbose'
> [bmahler@HOST build]$ Write failed: Broken pipe
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)