Posted to issues@mesos.apache.org by "Meng Zhu (JIRA)" <ji...@apache.org> on 2018/10/09 19:07:00 UTC
[jira] [Assigned] (MESOS-9108) Test `ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI` is flaky.
[ https://issues.apache.org/jira/browse/MESOS-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Meng Zhu reassigned MESOS-9108:
-------------------------------
Assignee: (was: Meng Zhu)
> Test `ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI` is flaky.
> ---------------------------------------------------------------------------------------------
>
> Key: MESOS-9108
> URL: https://issues.apache.org/jira/browse/MESOS-9108
> Project: Mesos
> Issue Type: Bug
> Reporter: Meng Zhu
> Priority: Major
> Labels: flaky-test
> Attachments: DefaultExecutorTest_TaskWithFileURI_badrun.txt
>
>
> The test is flaky and segfaults on CI (ubuntu-16.04-SSL); log attached.
> This looks like a race condition during the test destruction sequence.
> The relevant part of the test:
> {code:c++}
>   Future<v1::scheduler::Event::Update> startingUpdate;
>   Future<v1::scheduler::Event::Update> runningUpdate;
>   Future<v1::scheduler::Event::Update> finishedUpdate;
>   EXPECT_CALL(*scheduler, update(_, _))
>     .WillOnce(
>         DoAll(
>             FutureArg<1>(&startingUpdate),
>             v1::scheduler::SendAcknowledge(frameworkId, agentId)))
>     .WillOnce(
>         DoAll(
>             FutureArg<1>(&runningUpdate),
>             v1::scheduler::SendAcknowledge(frameworkId, agentId)))
>     .WillOnce(
>         DoAll(
>             FutureArg<1>(&finishedUpdate),
>             v1::scheduler::SendAcknowledge(frameworkId, agentId)));
>   mesos.send(
>       v1::createCallAccept(
>           frameworkId,
>           offer,
>           {v1::LAUNCH_GROUP(
>               executorInfo, v1::createTaskGroupInfo({taskInfo}))}));
>   AWAIT_READY(startingUpdate);
>   ASSERT_EQ(v1::TASK_STARTING, startingUpdate->status().state());
>   ASSERT_EQ(taskInfo.task_id(), startingUpdate->status().task_id());
>   AWAIT_READY(runningUpdate);
>   ASSERT_EQ(v1::TASK_RUNNING, runningUpdate->status().state());
>   ASSERT_EQ(taskInfo.task_id(), runningUpdate->status().task_id());
>   AWAIT_READY(finishedUpdate);
>   ASSERT_EQ(v1::TASK_FINISHED, finishedUpdate->status().state());
>   ASSERT_EQ(taskInfo.task_id(), finishedUpdate->status().task_id());
> }
> {code}
> Sending the acknowledgment for the last task status update (TASK_FINISHED) can race with the test teardown. Specifically, the `EXPECT_CALL` on `update()` captures a pointer, arg0, which points to `mesos`. However, `mesos` can be destroyed during the test teardown, leaving arg0 a nullptr, so the test segfaults when it tries to call arg0->...send().
> One quick fix is to remove the last acknowledgment. A sounder fix is to make the `mesos` pointer a shared one, but that would entail a lot of interface changes. Since our only concern is the capture in the `EXPECT_CALL`, maybe we only need to change that interface to take a shared pointer.
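The lifetime problem described above can be illustrated with a minimal, self-contained sketch that does not use gmock or the Mesos test helpers. Everything in it is an assumption made for illustration: `FakeMesos`, `makeAcknowledger`, and `demoTeardownRace` are hypothetical names, not part of the Mesos code base. It also shows a variation on the shared-pointer idea rather than the exact change proposed: the callback holds a `std::weak_ptr`, so a late acknowledgment after teardown is dropped instead of dereferencing a destroyed object.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the scheduler library object (`mesos` in the test).
struct FakeMesos {
  std::vector<std::string> sent;

  void send(const std::string& call) { sent.push_back(call); }
};

// Hypothetical helper: builds an acknowledgment callback that holds a weak
// reference to the client instead of a raw pointer. Returns true if the
// acknowledgment was sent, false if the client was already destroyed.
std::function<bool(const std::string&)> makeAcknowledger(
    const std::shared_ptr<FakeMesos>& mesos) {
  std::weak_ptr<FakeMesos> weak = mesos;

  return [weak](const std::string& taskId) {
    // lock() yields an empty shared_ptr once the client has been destroyed.
    if (std::shared_ptr<FakeMesos> strong = weak.lock()) {
      strong->send("ACKNOWLEDGE " + taskId);
      return true;
    }
    return false;  // Update arrived after teardown: drop it safely.
  };
}

// Usage: an update handled before teardown is acknowledged; one handled
// after the client is destroyed is ignored rather than crashing.
void demoTeardownRace() {
  auto mesos = std::make_shared<FakeMesos>();
  auto acknowledge = makeAcknowledger(mesos);

  assert(acknowledge("task-1"));   // Client alive: acknowledgment is sent.
  mesos.reset();                   // Simulates the test teardown.
  assert(!acknowledge("task-1"));  // Client gone: nothing is dereferenced.
}
```

Capturing the `std::shared_ptr` itself in the lambda, instead of a `std::weak_ptr`, would keep the client alive for as long as the callback exists. That is closer to the shared-pointer fix suggested in the issue, at the cost of extending the object's lifetime past the test body.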
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)