Posted to issues@mesos.apache.org by "Meng Zhu (JIRA)" <ji...@apache.org> on 2019/01/03 05:07:00 UTC

[jira] [Commented] (MESOS-5048) MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky

    [ https://issues.apache.org/jira/browse/MESOS-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732670#comment-16732670 ] 

Meng Zhu commented on MESOS-5048:
---------------------------------

Observed this test being flaky on our CI, so I am re-opening the ticket. Uploaded `ResourceStatistics-badrun4.txt`.
This looks like a different race from last time.

There is a race between the executor shutting down (because it never receives any tasks) and the test querying resource statistics. If the executor
is shut down before the statistics query, the test fails.

{noformat}
W0102 12:41:15.709300 27356 slave.cpp:5182] Shutting down reregistering executor '853030e8-dbf8-4493-8f02-557e061ad79a' of framework a4181b2d-03c1-4e32-ad0d-b2dd91245ca1-0000 at executor(1)@172.16.10.156:38324 because it has no tasks to run and has never been sent a task
{noformat}
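
To make the ordering dependence concrete, here is a minimal sketch. It is not Mesos code: `Event`, `containersSeenByQuery`, and `checkBothInterleavings` are hypothetical names, and the agent's container bookkeeping is reduced to a single set. It only shows that the query sees one container when it runs before the shutdown and zero when it runs after, which would match the `Actual: 0` / `Expected: 1u` failure quoted in the issue description.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the two racing events; not part of Mesos.
enum class Event { ExecutorShutdown, StatisticsQuery };

// Replays the events in the given order and returns how many containers
// the statistics query observes, or -1 if no query happens.
int containersSeenByQuery(const std::vector<Event>& order)
{
  // Assumption: exactly one executor container exists after recovery.
  std::set<std::string> containers = {"executor-container"};
  int seen = -1;

  for (Event event : order) {
    switch (event) {
      case Event::ExecutorShutdown:
        // The executor was never sent a task, so it is shut down
        // and its container goes away.
        containers.clear();
        break;
      case Event::StatisticsQuery:
        seen = static_cast<int>(containers.size());
        break;
    }
  }

  return seen;
}

// Checks both possible interleavings of the two events.
void checkBothInterleavings()
{
  // Query first: the test's expectation of one container holds.
  assert(containersSeenByQuery(
      {Event::StatisticsQuery, Event::ExecutorShutdown}) == 1);

  // Shutdown first: the query sees no containers and the test fails.
  assert(containersSeenByQuery(
      {Event::ExecutorShutdown, Event::StatisticsQuery}) == 0);
}
```

In the real test the two events are not ordered by the test itself, so either interleaving can occur from run to run.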


> MesosContainerizerSlaveRecoveryTest.ResourceStatistics is flaky
> ---------------------------------------------------------------
>
>                 Key: MESOS-5048
>                 URL: https://issues.apache.org/jira/browse/MESOS-5048
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.28.0
>         Environment: Ubuntu 15.04, Ubuntu 16.04
>            Reporter: Jian Qiu
>            Assignee: Meng Zhu
>            Priority: Major
>              Labels: flaky-test
>             Fix For: 1.8.0
>
>         Attachments: ResourceStatistics-badrun2.txt, ResourceStatistics-badrun3.txt, ResourceStatistics-badrun4.txt
>
>
> ./mesos-tests.sh --gtest_filter=MesosContainerizerSlaveRecoveryTest.ResourceStatistics --gtest_repeat=100 --gtest_break_on_failure
> This was found in rb and reproduced on my local machine. There are two types of failures. However, the failures do not appear when enabling verbose...
> {code}
> ../../src/tests/environment.cpp:790: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 1446 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-tests 
>  \-+- 9171 sh -c /mesos/mesos-0.29.0/_build/src/mesos-executor 
>    \--- 9185 /mesos/mesos-0.29.0/_build/src/.libs/lt-mesos-executor 
> {code}
> And
> {code}
> I0328 15:42:36.982471  5687 exec.cpp:150] Version: 0.29.0
> I0328 15:42:37.008765  5708 exec.cpp:225] Executor registered on slave 731fb93b-26fe-4c7c-a543-fc76f106a62e-S0
> Registered executor on mesos
> ../../src/tests/slave_recovery_tests.cpp:3506: Failure
> Value of: containers.get().size()
>   Actual: 0
> Expected: 1u
> Which is: 1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)