Posted to issues@mesos.apache.org by "Meng Zhu (JIRA)" <ji...@apache.org> on 2018/07/24 05:42:00 UTC
[jira] [Created] (MESOS-9108) Test
`ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI`
is flaky.
Meng Zhu created MESOS-9108:
-------------------------------
Summary: Test `ROOT_DOCKER_DockerAndMesosContainerizers/DefaultExecutorTest.TaskWithFileURI` is flaky.
Key: MESOS-9108
URL: https://issues.apache.org/jira/browse/MESOS-9108
Project: Mesos
Issue Type: Bug
Reporter: Meng Zhu
Assignee: Meng Zhu
Attachments: DefaultExecutorTest_TaskWithFileURI_badrun.txt
The test is flaky and segfaults on the ubuntu-16.04-SSL CI configuration; log attached.
This looks like a race condition during the test destruction sequence. The end of the test body:
{noformat}
  Future<v1::scheduler::Event::Update> startingUpdate;
  Future<v1::scheduler::Event::Update> runningUpdate;
  Future<v1::scheduler::Event::Update> finishedUpdate;
  EXPECT_CALL(*scheduler, update(_, _))
    .WillOnce(
        DoAll(
            FutureArg<1>(&startingUpdate),
            v1::scheduler::SendAcknowledge(frameworkId, agentId)))
    .WillOnce(
        DoAll(
            FutureArg<1>(&runningUpdate),
            v1::scheduler::SendAcknowledge(frameworkId, agentId)))
    .WillOnce(
        DoAll(
            FutureArg<1>(&finishedUpdate),
            v1::scheduler::SendAcknowledge(frameworkId, agentId)));

  mesos.send(
      v1::createCallAccept(
          frameworkId,
          offer,
          {v1::LAUNCH_GROUP(
              executorInfo, v1::createTaskGroupInfo({taskInfo}))}));

  AWAIT_READY(startingUpdate);
  ASSERT_EQ(v1::TASK_STARTING, startingUpdate->status().state());
  ASSERT_EQ(taskInfo.task_id(), startingUpdate->status().task_id());

  AWAIT_READY(runningUpdate);
  ASSERT_EQ(v1::TASK_RUNNING, runningUpdate->status().state());
  ASSERT_EQ(taskInfo.task_id(), runningUpdate->status().task_id());

  AWAIT_READY(finishedUpdate);
  ASSERT_EQ(v1::TASK_FINISHED, finishedUpdate->status().state());
  ASSERT_EQ(taskInfo.task_id(), finishedUpdate->status().task_id());
}
{noformat}
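Sending the acknowledgment for the last task status update (TASK_FINISHED) can race with the destruction of the scheduler: the test body returns as soon as finishedUpdate is ready, while that acknowledgment may still be in flight. Removing the last acknowledgment should fix the test.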
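
To illustrate the suspected ordering, here is a rough standalone model. It is not Mesos code: `FakeScheduler`, `pendingAcksAtTeardown`, and the single-threaded queue are hypothetical stand-ins, and the model assumes that acknowledgments for earlier updates have completed by the time the next update is delivered. It uses `std::weak_ptr` so the sketch stays well defined instead of touching a destroyed object; it only counts how many acknowledgments are still queued when the scheduler goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the scheduler object owned by the test.
struct FakeScheduler
{
  std::vector<std::string> acked;

  void acknowledge(const std::string& state) { acked.push_back(state); }
};

// Models the end of the test body. Returns the number of acknowledgments
// that are still queued (not yet executed) when the scheduler is destroyed.
std::size_t pendingAcksAtTeardown(bool ackFinalUpdate)
{
  // Stand-in for asynchronous dispatch: work is queued and run later.
  std::deque<std::function<void()>> queue;

  auto scheduler = std::make_shared<FakeScheduler>();
  std::weak_ptr<FakeScheduler> weak = scheduler;

  const std::vector<std::string> updates = {
    "TASK_STARTING", "TASK_RUNNING", "TASK_FINISHED"};

  for (std::size_t i = 0; i < updates.size(); ++i) {
    // Assumption: acks for earlier updates finish before the next update
    // is delivered, so the queue is drained first.
    while (!queue.empty()) {
      queue.front()();
      queue.pop_front();
    }

    const bool isFinal = (i + 1 == updates.size());
    if (isFinal && !ackFinalUpdate) {
      continue; // Proposed fix: do not acknowledge the last update.
    }

    const std::string state = updates[i];
    queue.push_back([weak, state]() {
      // A raw pointer here would be a use-after-free once the scheduler
      // is gone; the weak_ptr keeps this model safe.
      if (auto live = weak.lock()) {
        live->acknowledge(state);
      }
    });
  }

  // The test body returns here; anything still queued races with teardown.
  const std::size_t pending = queue.size();
  scheduler.reset();
  assert(weak.expired());

  return pending;
}
```

In this model, `pendingAcksAtTeardown(true)` leaves one acknowledgment (the one for TASK_FINISHED) outstanding at teardown, while `pendingAcksAtTeardown(false)` leaves none, which is the effect the proposed fix is after.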