Posted to issues@mesos.apache.org by "Meng Zhu (JIRA)" <ji...@apache.org> on 2019/02/14 01:34:00 UTC

[jira] [Commented] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky

    [ https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767776#comment-16767776 ] 

Meng Zhu commented on MESOS-5048:
---------------------------------

commit 1875b5380de926ce1759715227883418e2fb9717
Author: Meng Zhu <mz...@mesosphere.io>
Date:   Wed Jan 2 20:51:25 2019 -0800

    Fixed test `MesosContainerizerSlaveRecoveryTest.ResourceStatistics`.

    `MesosContainerizerSlaveRecoveryTest.ResourceStatistics` is flaky
    due to a race between executor shutdown (due to never getting any
    tasks) and the test querying resource statistics. If the executor
    is shut down before the statistics query, the test will fail.

    This patch fixes the test by explicitly waiting for the task to
    be delivered and for the task status to transition to
    `TASK_RUNNING` before restarting the agent. This way, the executor
    will not be shut down after the agent restarts, so there is no race.

    Review: https://reviews.apache.org/r/69656
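
The ordering described in the commit message can be pictured with a small model. This is only a sketch in Python, not the code from `slave_recovery_tests.cpp`: `FakeExecutor`, `containers_after_restart`, and the `release` event are hypothetical names, and the rule that an executor without a task is shut down on agent restart is taken from the commit message rather than from the Mesos sources.

```python
import threading


class FakeExecutor:
    """Hypothetical stand-in for an executor; not a Mesos class."""

    def __init__(self) -> None:
        self.task_running = threading.Event()
        self.alive = True

    def deliver_task(self) -> None:
        # Models the task reaching the executor and reporting TASK_RUNNING.
        self.task_running.set()

    def on_agent_restart(self) -> None:
        # Assumption taken from the commit message: an executor that never
        # got a task is shut down after the agent restarts.
        if not self.task_running.is_set():
            self.alive = False


def containers_after_restart(wait_for_task_running: bool) -> int:
    """Return how many containers a statistics query would see (0 or 1)."""
    executor = FakeExecutor()
    release = threading.Event()  # holds back the asynchronous delivery

    def deliver() -> None:
        release.wait()
        executor.deliver_task()

    worker = threading.Thread(target=deliver)
    worker.start()

    if wait_for_task_running:
        # The fix: let delivery proceed and block until TASK_RUNNING is seen.
        release.set()
        if not executor.task_running.wait(timeout=5):
            raise TimeoutError("task never reached TASK_RUNNING")
    # Without the wait, delivery is still held back here, which forces the
    # losing interleaving so that the failure is reproducible.
    executor.on_agent_restart()

    release.set()
    worker.join()
    return 1 if executor.alive else 0
```

In this model `containers_after_restart(False)` yields 0, mirroring the `Actual: 0` failure quoted in the issue description, while `containers_after_restart(True)` yields 1 because the restart cannot happen before the task is running.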

> MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
> ---------------------------------------------------------------
>
>                 Key: MESOS-5048
>                 URL: https://issues.apache.org/jira/browse/MESOS-5048
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.28.0
>         Environment: Ubuntu 15.04, Ubuntu 16.04
>            Reporter: Jian Qiu
>            Assignee: Meng Zhu
>            Priority: Major
>              Labels: flaky-test
>             Fix For: 1.8.0
>
>         Attachments: ResourceStatistics-badrun2.txt, ResourceStatistics-badrun3.txt, ResourceStatistics-badrun4.txt
>
>
> ./mesos-tests.sh --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics --gtest_repeat=100 --gtest_break_on_failure
> This was found in rb and reproduced on my local machine. There are two types of failures. However, the failures do not appear when enabling verbose...
> {code}
> ../../src/tests/environment.cpp:790: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests 
>  \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor 
>    \--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor 
> {code}
> And
> {code}
> I0328 15:42:36.982471  5687 exec.cpp:150] Version: 0.29.0
> I0328 15:42:37.008765  5708 exec.cpp:225] Executor registered on slave 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0
> Registered executor on mesos
> ../../src/tests/slave_recovery_tests.cpp:3506: Failure
> Value of: containers.get().size()
>   Actual: 0
> Expected: 1u
> Which is: 1
> {code}


