You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Dominic Hamon (JIRA)" <ji...@apache.org> on 2014/06/10 20:03:02 UTC

[jira] [Commented] (MESOS-513) FaultToleranceTest.SchedulerFailoverFrameworkMessage test is flaky

    [ https://issues.apache.org/jira/browse/MESOS-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026773#comment-14026773 ] 

Dominic Hamon commented on MESOS-513:
-------------------------------------

Ran for 2500 iterations and unable to reproduce.

> FaultToleranceTest.SchedulerFailoverFrameworkMessage test is flaky
> ------------------------------------------------------------------
>
>                 Key: MESOS-513
>                 URL: https://issues.apache.org/jira/browse/MESOS-513
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Thomas Marshall
>            Assignee: Dominic Hamon
>
> https://hadrian.millennium.berkeley.edu/jenkins/job/Mesos-minimal/370/console
> [ RUN      ] FaultToleranceTest.SchedulerFailoverFrameworkMessage
> I0617 13:43:31.122650 14700 master.cpp:228] Master started on 127.0.1.1:55889
> I0617 13:43:31.122716 14700 master.cpp:243] Master ID: 201306171343-16842879-55889-14678
> W0617 13:43:31.122899 14697 master.cpp:83] No whitelist given. Advertising offers for all slaves
> I0617 13:43:31.123078 14700 master.cpp:526] Elected as master!
> I0617 13:43:31.129281 14700 slave.cpp:219] Slave started on 72)@127.0.1.1:55889
> I0617 13:43:31.129338 14700 slave.cpp:220] Slave resources: cpus=2; mem=1024; ports=[31000-32000]; disk=1024
> I0617 13:43:31.129907 14699 master.cpp:569] Registering framework 201306171343-16842879-55889-14678-0000 at scheduler(61)@127.0.1.1:55889
> I0617 13:43:31.129971 14700 slave.cpp:540] New master detected at master@127.0.1.1:55889
> I0617 13:43:31.130146 14699 hierarchical_allocator_process.hpp:327] Added framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.130167 14700 slave.cpp:555] Postponing registration until recovery is complete
> I0617 13:43:31.130192 14698 status_update_manager.cpp:155] New master detected at master@127.0.1.1:55889
> I0617 13:43:31.130249 14700 slave.cpp:401] Finished recovery
> I0617 13:43:31.130493 14698 master.cpp:891] Attempting to register slave on ubuntu at slave(72)@127.0.1.1:55889
> I0617 13:43:31.130522 14698 master.cpp:1851] Adding slave 201306171343-16842879-55889-14678-0 at ubuntu with cpus=2; mem=1024; ports=[31000-32000]; disk=1024
> I0617 13:43:31.130635 14699 slave.cpp:600] Registered with master master@127.0.1.1:55889; given slave ID 201306171343-16842879-55889-14678-0
> I0617 13:43:31.130759 14697 hierarchical_allocator_process.hpp:449] Added slave 201306171343-16842879-55889-14678-0 (ubuntu) with cpus=2; mem=1024; ports=[31000-32000]; disk=1024 (and cpus=2; mem=1024; ports=[31000-32000]; disk=1024 available)
> I0617 13:43:31.131098 14697 master.cpp:1239] Sending 1 offers to framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.131892 14699 master.cpp:1472] Processing reply for offer 201306171343-16842879-55889-14678-0 on slave 201306171343-16842879-55889-14678-0 (ubuntu) for framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.132138 14699 master.hpp:291] Adding task 1 with resources cpus=2; mem=1024; ports=[31000-32000]; disk=1024 on slave 201306171343-16842879-55889-14678-0
> I0617 13:43:31.132213 14699 master.cpp:1591] Launching task 1 of framework 201306171343-16842879-55889-14678-0000 with resources cpus=2; mem=1024; ports=[31000-32000]; disk=1024 on slave 201306171343-16842879-55889-14678-0 (ubuntu)
> I0617 13:43:31.132488 14699 slave.cpp:740] Got assigned task 1 for framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.132761 14699 slave.cpp:838] Launching task 1 for framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.134079 14699 paths.hpp:303] Created executor directory '/tmp/FaultToleranceTest_SchedulerFailoverFrameworkMessage_DVe9Uf/slaves/201306171343-16842879-55889-14678-0/frameworks/201306171343-16842879-55889-14678-0000/executors/default/runs/127bf532-ba74-46ef-8d5f-63636383e97e'
> I0617 13:43:31.134562 14699 slave.cpp:949] Queuing task '1' for executor default of framework '201306171343-16842879-55889-14678-0000
> I0617 13:43:31.134639 14699 slave.cpp:522] Successfully attached file '/tmp/FaultToleranceTest_SchedulerFailoverFrameworkMessage_DVe9Uf/slaves/201306171343-16842879-55889-14678-0/frameworks/201306171343-16842879-55889-14678-0000/executors/default/runs/127bf532-ba74-46ef-8d5f-63636383e97e'
> I0617 13:43:31.134835 14699 slave.cpp:1396] Got registration for executor 'default' of framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.135053 14699 slave.cpp:1511] Flushing queued task 1 for executor 'default' of framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.136834 14697 slave.cpp:1693] Handling status update TASK_RUNNING (UUID: 86548a6f-f7b9-4fcb-95cf-aa32b6a7e757) for task 1 of framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.137197 14699 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 86548a6f-f7b9-4fcb-95cf-aa32b6a7e757) for task 1 of framework 201306171343-16842879-55889-14678-0000 with checkpoint=false
> I0617 13:43:31.137267 14699 status_update_manager.cpp:450] Creating StatusUpdate stream for task 1 of framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.137382 14699 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 86548a6f-f7b9-4fcb-95cf-aa32b6a7e757) for task 1 of framework 201306171343-16842879-55889-14678-0000 to master@127.0.1.1:55889
> I0617 13:43:31.137531 14697 master.cpp:1022] Status update from slave(72)@127.0.1.1:55889: task 1 of framework 201306171343-16842879-55889-14678-0000 is now in state TASK_RUNNING
> I0617 13:43:31.137565 14699 slave.cpp:1810] Sending acknowledgement for status update TASK_RUNNING (UUID: 86548a6f-f7b9-4fcb-95cf-aa32b6a7e757) for task 1 of framework 201306171343-16842879-55889-14678-0000 to executor(26)@127.0.1.1:55889
> I0617 13:43:31.138015 14700 status_update_manager.cpp:360] Received status update acknowledgement 86548a6f-f7b9-4fcb-95cf-aa32b6a7e757 for task 1 of framework 201306171343-16842879-55889-14678-0000
> I0617 13:43:31.138618 14698 master.cpp:604] Re-registering framework 201306171343-16842879-55889-14678-0000 at scheduler(62)@127.0.1.1:55889
> I0617 13:43:31.138685 14698 master.cpp:623] Framework 201306171343-16842879-55889-14678-0000 failed over
> I0617 13:43:31.139152 14697 slave.cpp:1863] Sending message for framework 201306171343-16842879-55889-14678-0000 to scheduler(61)@127.0.1.1:55889
> W0617 13:43:31.139173 14698 master.cpp:721] scheduler(61)@127.0.1.1:55889 tried to deactivate framework; expecting scheduler(62)@127.0.1.1:55889
> I0617 13:43:31.139255 14697 slave.cpp:1278] Updating framework 201306171343-16842879-55889-14678-0000 pid to scheduler(62)@127.0.1.1:55889
> W0617 13:43:36.124213 14699 master.cpp:83] No whitelist given. Advertising offers for all slaves
> ../../src/tests/fault_tolerance_tests.cpp:1140: Failure
> Failed to wait 5secs for frameworkMessage
> W0617 13:43:41.125799 14700 master.cpp:83] No whitelist given. Advertising offers for all slaves
> W0617 13:43:46.127622 14697 master.cpp:83] No whitelist given. Advertising offers for all slaves
> W0617 13:43:51.129123 14698 master.cpp:83] No whitelist given. Advertising offers for all slaves
> W0617 13:43:56.130734 14697 master.cpp:83] No whitelist given. Advertising offers for all slaves
> W0617 13:44:01.131566 14697 master.cpp:83] No whitelist given. Advertising offers for all slaves
> W0617 13:44:06.132598 14699 master.cpp:83] No whitelist given. Advertising offers for all slaves
> W0617 13:44:11.133581 14700 master.cpp:83] No whitelist given. Advertising offers for all slaves
> .....



--
This message was sent by Atlassian JIRA
(v6.2#6252)