Posted to issues@mesos.apache.org by "Gilbert Song (JIRA)" <ji...@apache.org> on 2018/01/13 01:26:00 UTC

[jira] [Commented] (MESOS-7506) Multiple tests leave orphan containers.

    [ https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324861#comment-16324861 ] 

Gilbert Song commented on MESOS-7506:
-------------------------------------

{noformat}
commit 364e78af7598289f6724c2ee037e0cb1de377902
Author: Gilbert Song <so...@gmail.com>
Date:   Thu Jan 11 18:05:54 2018 -0800

    Fixed the flaky default executor tests in tests/cluster.cpp.
    
    This patch fixes some flaky tests listed below:
    1. DefaultExecutorTest.KillTask/0
    2. DefaultExecutorTest.TaskWithFileURI/0
    3. DefaultExecutorTest.ResourceLimitation/0
    4. DefaultExecutorTest.KillMultipleTasks/0
    
    The root cause is that in both the docker containerizer and the
    mesos containerizer, wait() and destroy() rely on the same
    `ContainerTermination` future, so the two methods become ready
    simultaneously. This does not hold for the composing
    containerizer: wait() may finish before destroy(), in which
    case the `containers_` hashmap has not yet been cleaned up in
    destroy()'s `.onAny` callback.
    
    Review: https://reviews.apache.org/r/65141
{noformat}
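
The ordering problem described in the commit message can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions and not Mesos or libprocess code: `Termination`, `ComposingModel`, `mailbox`, `drain()` and `observeRace()` are hypothetical names, and the deferred cleanup is modelled with a plain queue instead of a real actor dispatch. It shows only the symptom (a waiter seeing a non-empty `containers_`), not the patch itself.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a future: callbacks run inline when set() is called.
class Termination {
public:
  void onAny(std::function<void()> callback) {
    if (ready_) {
      callback();
    } else {
      callbacks_.push_back(std::move(callback));
    }
  }

  void set() {
    ready_ = true;
    for (auto& callback : callbacks_) {
      callback();
    }
    callbacks_.clear();
  }

private:
  bool ready_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Hypothetical model of a composing containerizer: it keeps its own
// `containers_` set and forwards wait()/destroy() to one shared termination.
struct ComposingModel {
  std::set<std::string> containers_;
  Termination termination;                    // Shared by wait() and destroy().
  std::deque<std::function<void()>> mailbox;  // Deferred work, run by drain().

  Termination& wait() { return termination; }

  void destroy(const std::string& id) {
    // The cleanup is queued rather than run inline (assumption: this mimics
    // a callback that is dispatched to another actor and runs later).
    termination.onAny([this, id]() {
      mailbox.push_back([this, id]() { containers_.erase(id); });
    });
    termination.set();  // The underlying containerizer reports termination.
  }

  void drain() {
    while (!mailbox.empty()) {
      std::function<void()> work = std::move(mailbox.front());
      mailbox.pop_front();
      work();
    }
  }
};

struct Observation {
  bool emptyWhenWaitReady;
  bool emptyAfterCleanup;
};

// Registers a waiter, destroys the container, then runs the deferred cleanup.
Observation observeRace() {
  ComposingModel composing;
  composing.containers_.insert("c1");

  Observation result{true, false};
  composing.wait().onAny([&composing, &result]() {
    // This is what a test checking `containers->empty()` would see.
    result.emptyWhenWaitReady = composing.containers_.empty();
  });

  composing.destroy("c1");
  composing.drain();
  result.emptyAfterCleanup = composing.containers_.empty();
  return result;
}
```

Calling `observeRace()` from a `main()` gives `emptyWhenWaitReady == false` and `emptyAfterCleanup == true` in this model: the waiter is notified before the queued cleanup has erased the entry, which matches the `containers->empty()` failure quoted in the issue description.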

> Multiple tests leave orphan containers.
> ---------------------------------------
>
>                 Key: MESOS-7506
>                 URL: https://issues.apache.org/jira/browse/MESOS-7506
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>         Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>            Reporter: Alexander Rukletsov
>            Assignee: Andrei Budnik
>              Labels: containerizer, flaky-test, mesosphere
>             Fix For: 1.6.0
>
>         Attachments: KillMultipleTasks-badrun.txt, ROOT_IsolatorFlags-badrun.txt, ROOT_IsolatorFlags-badrun2.txt, ReconcileTasksMissingFromSlave-badrun.txt, ResourceLimitation-badrun.txt, ResourceLimitation-badrun2.txt, RestartSlaveRequireExecutorAuthentication-badrun.txt, TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> SlaveTest.RestartSlaveRequireExecutorAuthentication // cannot reproduce any more
> {noformat}


