Posted to issues@mesos.apache.org by "Gilbert Song (JIRA)" <ji...@apache.org> on 2018/01/13 01:26:00 UTC
[jira] [Commented] (MESOS-7506) Multiple tests leave orphan containers.
[ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324861#comment-16324861 ]
Gilbert Song commented on MESOS-7506:
-------------------------------------
{noformat}
commit 364e78af7598289f6724c2ee037e0cb1de377902
Author: Gilbert Song <so...@gmail.com>
Date: Thu Jan 11 18:05:54 2018 -0800
Fixed the default executor flaky tests in tests/cluster.cpp.
This patch fixes some flaky tests listed below:
1. DefaultExecutorTest.KillTask/0
2. DefaultExecutorTest.TaskWithFileURI/0
3. DefaultExecutorTest.ResourceLimitation/0
4. DefaultExecutorTest.KillMultipleTasks/0
The root cause is that in both the docker containerizer and the
mesos containerizer, wait() and destroy() rely on the same
`ContainerTermination` future, which means the two methods
become ready simultaneously. This is not true for the
composing containerizer: wait() may finish before destroy(),
in which case the `containers_` hashmap has not yet been
cleaned up in destroy()'s `.onAny` callback.
Review: https://reviews.apache.org/r/65141
{noformat}
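The ordering problem described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code from the patch: it does not use libprocess, and `EventLoop`, `ToyComposingContainerizer`, `waitReady()` and `cleanupCheckPasses()` are hypothetical names made up for illustration. The queued callback stands in for destroy()'s `.onAny` cleanup of `containers_`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for asynchronous callback dispatch (hypothetical; not libprocess).
class EventLoop {
public:
  void post(std::function<void()> f) { queue_.push_back(std::move(f)); }

  // Runs every queued callback in FIFO order.
  void runAll() {
    while (!queue_.empty()) {
      std::function<void()> f = std::move(queue_.front());
      queue_.pop_front();
      f();
    }
  }

private:
  std::deque<std::function<void()>> queue_;
};

// Toy model of a composing containerizer (all names are assumptions).
class ToyComposingContainerizer {
public:
  explicit ToyComposingContainerizer(EventLoop* loop) : loop_(loop) {}

  void launch(const std::string& id) { containers_.insert(id); }

  // The underlying termination becomes ready right away, but the
  // bookkeeping cleanup only runs later, as a queued callback.
  void destroy(const std::string& id) {
    terminated_.insert(id);
    loop_->post([this, id]() { containers_.erase(id); });
  }

  // Models "the future returned by wait() is ready".
  bool waitReady(const std::string& id) const {
    return terminated_.count(id) > 0;
  }

  // Models the test harness check `containers->empty()`.
  bool empty() const { return containers_.empty(); }

private:
  EventLoop* loop_;
  std::set<std::string> containers_;
  std::set<std::string> terminated_;
};

// Returns true if the cleanup check sees no leftover containers.
// With awaitDestroy == false the check runs as soon as wait() is ready;
// with awaitDestroy == true it runs only after destroy()'s cleanup callback.
bool cleanupCheckPasses(bool awaitDestroy) {
  EventLoop loop;
  ToyComposingContainerizer containerizer(&loop);

  containerizer.launch("c1");
  containerizer.destroy("c1");

  // wait() alone is already satisfied at this point.
  assert(containerizer.waitReady("c1"));

  if (awaitDestroy) {
    loop.runAll();  // Drain the queue so the cleanup callback has run.
  }

  return containerizer.empty();
}
```

In this model, `cleanupCheckPasses(false)` reports a leftover container even though wait() is ready, which mirrors the `containers->empty()` failure, while `cleanupCheckPasses(true)` sees an empty map because it checks only after the cleanup callback has run.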
> Multiple tests leave orphan containers.
> ---------------------------------------
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
> Issue Type: Bug
> Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
> Reporter: Alexander Rukletsov
> Assignee: Andrei Budnik
> Labels: containerizer, flaky-test, mesosphere
> Fix For: 1.6.0
>
> Attachments: KillMultipleTasks-badrun.txt, ROOT_IsolatorFlags-badrun.txt, ROOT_IsolatorFlags-badrun2.txt, ReconcileTasksMissingFromSlave-badrun.txt, ResourceLimitation-badrun.txt, ResourceLimitation-badrun2.txt, RestartSlaveRequireExecutorAuthentication-badrun.txt, TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
> Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> SlaveTest.RestartSlaveRequireExecutorAuthentication // cannot reproduce any more
> {noformat}