Posted to issues@mesos.apache.org by "Meng Zhu (JIRA)" <ji...@apache.org> on 2018/07/24 05:42:00 UTC

[jira] [Created] (MESOS-9108) Test `ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI` is flaky.

Meng Zhu created MESOS-9108:
-------------------------------

             Summary: Test `ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI` is flaky.
                 Key: MESOS-9108
                 URL: https://issues.apache.org/jira/browse/MESOS-9108
             Project: Mesos
          Issue Type: Bug
            Reporter: Meng Zhu
            Assignee: Meng Zhu
         Attachments: DefaultExecutorTest_TaskWithFileURI_badrun.txt

The test is flaky and segfaults on the ubuntu-16.04-SSL CI configuration; log attached.

This appears to be caused by a race condition during the test's destruction sequence. The test ends with the following:

{noformat}
  Future<v1::scheduler::Event::Update> startingUpdate;
  Future<v1::scheduler::Event::Update> runningUpdate;
  Future<v1::scheduler::Event::Update> finishedUpdate;
  EXPECT_CALL(*scheduler, update(_, _))
    .WillOnce(
        DoAll(
            FutureArg<1>(&startingUpdate),
            v1::scheduler::SendAcknowledge(frameworkId, agentId)))
    .WillOnce(
        DoAll(
            FutureArg<1>(&runningUpdate),
            v1::scheduler::SendAcknowledge(frameworkId, agentId)))
    .WillOnce(
        DoAll(
            FutureArg<1>(&finishedUpdate),
            v1::scheduler::SendAcknowledge(frameworkId, agentId)));

  mesos.send(
      v1::createCallAccept(
          frameworkId,
          offer,
          {v1::LAUNCH_GROUP(
              executorInfo, v1::createTaskGroupInfo({taskInfo}))}));

  AWAIT_READY(startingUpdate);
  ASSERT_EQ(v1::TASK_STARTING, startingUpdate->status().state());
  ASSERT_EQ(taskInfo.task_id(), startingUpdate->status().task_id());

  AWAIT_READY(runningUpdate);
  ASSERT_EQ(v1::TASK_RUNNING, runningUpdate->status().state());
  ASSERT_EQ(taskInfo.task_id(), runningUpdate->status().task_id());

  AWAIT_READY(finishedUpdate);
  ASSERT_EQ(v1::TASK_FINISHED, finishedUpdate->status().state());
  ASSERT_EQ(taskInfo.task_id(), finishedUpdate->status().task_id());
}
{noformat}

Sending the acknowledgment for the last task status update (TASK_FINISHED) can race with the destruction of the scheduler, because the test body returns as soon as `finishedUpdate` is ready. Removing the last acknowledgment should fix the test.
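
A minimal, self-contained sketch of the proposed change, modeled outside the Mesos test harness: every update is still observed, but no acknowledgment is sent for the terminal one, so nothing is left in flight when the test body returns. `TaskState`, `FakeScheduler`, and `deliverUpdates` are hypothetical stand-ins introduced for illustration only; they are not Mesos types or APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Mesos types.
enum class TaskState { STARTING, RUNNING, FINISHED };

struct FakeScheduler {
  // Acknowledgments this "scheduler" was asked to send, in order.
  std::vector<TaskState> acknowledged;

  void acknowledge(TaskState state) { acknowledged.push_back(state); }
};

// Delivers each update in order and returns the updates observed.
// Every update is recorded (the role FutureArg<1> plays in the test), but
// the terminal update is acknowledged only when ackTerminal is true.
std::vector<TaskState> deliverUpdates(
    FakeScheduler& scheduler,
    const std::vector<TaskState>& updates,
    bool ackTerminal)
{
  // Assumption of this model: the sequence ends with the terminal update.
  assert(updates.empty() || updates.back() == TaskState::FINISHED);

  std::vector<TaskState> observed;
  for (std::size_t i = 0; i < updates.size(); ++i) {
    observed.push_back(updates[i]);

    const bool isLast = (i + 1 == updates.size());
    if (!isLast || ackTerminal) {
      scheduler.acknowledge(updates[i]);
    }
  }
  return observed;
}
```

Calling `deliverUpdates` with `ackTerminal` set to `false` on the STARTING, RUNNING, FINISHED sequence records all three updates while acknowledging only the first two. In the test itself, this would presumably correspond to the third `.WillOnce` keeping only `FutureArg<1>(&finishedUpdate)` and dropping `v1::scheduler::SendAcknowledge(frameworkId, agentId)`; that mapping is an assumption based on the description above, not a confirmed patch.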



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)