Posted to issues@mesos.apache.org by "Joseph Wu (JIRA)" <ji...@apache.org> on 2016/02/09 00:13:39 UTC
[jira] [Commented] (MESOS-4604) ROOT_DOCKER_DockerHealthyTask is flaky.

    [ https://issues.apache.org/jira/browse/MESOS-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137967#comment-15137967 ] 

Joseph Wu commented on MESOS-4604:
----------------------------------

I'm convinced this is a [problem in Docker|https://github.com/docker/docker/issues/12738]. The tests fail when [{{docker stop}}|https://github.com/apache/mesos/blob/7aafb8e44d347a03cbef83d3f7ee4705b9d23c09/src/slave/containerizer/docker.cpp#L1547] hangs indefinitely.
Note: Docker [no longer supports Ubuntu 15.04|https://github.com/docker/docker/pull/18809].

To get rid of the {{__cxa_pure_virtual}} crash, we'll need to do some refactoring (see [MESOS-2017|https://issues.apache.org/jira/browse/MESOS-2017]). At the moment, most of our containerizer tests must call {{Shutdown()}} before they exit the scope of the test. Otherwise, {{MesosTest::TearDown()}} calls {{Shutdown()}} after the test's stack-allocated containerizers have already been destroyed, and dereferences them.

I propose: 
* Change {{MesosTest::StartSlave}} to take a {{Shared<Containerizer>}}, and change all tests to dynamically allocate their containerizers.
* Remove all manual {{Shutdown()}} calls that occur at the end of a test.
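
A minimal sketch of the proposed ownership model, in plain C++ rather than the real test harness. Every name in it ({{Containerizer}}, {{FakeDockerContainerizer}}, {{MesosTestSketch}}, {{destroyAll()}}, {{runTestBody()}}) is a hypothetical stand-in, and {{std::shared_ptr}} stands in for the {{Shared<Containerizer>}} mentioned above. What it shows: once the fixture co-owns the containerizer, the test body can leave its scope without a manual {{Shutdown()}}, and {{TearDown()}} still reaches a live object instead of a destroyed one.

{code:cpp}
#include <cassert>
#include <iostream>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the containerizer interface; the real one has
// many more pure virtual methods.
class Containerizer {
public:
  virtual ~Containerizer() = default;
  virtual void destroyAll() = 0;  // hypothetical cleanup hook used on shutdown
};

// Hypothetical concrete containerizer that counts cleanup requests.
class FakeDockerContainerizer : public Containerizer {
public:
  void destroyAll() override { ++destroyCalls; }
  int destroyCalls = 0;
};

// Hypothetical fixture: StartSlave() takes shared ownership, so every
// containerizer outlives the scope of the test body that created it.
class MesosTestSketch {
public:
  void StartSlave(std::shared_ptr<Containerizer> containerizer) {
    containerizers_.push_back(std::move(containerizer));
  }

  // Runs after the test body, like MesosTest::TearDown(). The objects are
  // still alive here because the fixture co-owns them.
  void TearDown() {
    for (const auto& containerizer : containerizers_) {
      containerizer->destroyAll();
    }
    containerizers_.clear();  // drop the fixture's references
  }

private:
  std::vector<std::shared_ptr<Containerizer>> containerizers_;
};

// A test body that heap-allocates its containerizer and ends without a
// manual Shutdown() call. Its local shared_ptr is released on return.
std::weak_ptr<FakeDockerContainerizer> runTestBody(MesosTestSketch& fixture) {
  auto containerizer = std::make_shared<FakeDockerContainerizer>();
  fixture.StartSlave(containerizer);
  return containerizer;
}

struct Outcome {
  bool aliveAfterTestBody;
  int destroyCalls;
  bool releasedAfterTearDown;
};

Outcome runScenario() {
  MesosTestSketch fixture;
  std::weak_ptr<FakeDockerContainerizer> observer = runTestBody(fixture);

  Outcome out{};
  out.aliveAfterTestBody = !observer.expired();

  // Hold a temporary handle only to read the counter after TearDown().
  std::shared_ptr<FakeDockerContainerizer> handle = observer.lock();
  fixture.TearDown();
  out.destroyCalls = handle->destroyCalls;
  handle.reset();  // last owner gone: the containerizer is destroyed now

  out.releasedAfterTearDown = observer.expired();
  return out;
}

int main() {
  const Outcome out = runScenario();
  assert(out.destroyCalls == 1);
  std::cout << std::boolalpha
            << "alive after test body:   " << out.aliveAfterTestBody << '\n'
            << "destroyAll() calls:      " << out.destroyCalls << '\n'
            << "released after TearDown: " << out.releasedAfterTearDown << '\n';
  return 0;
}
{code}

The fixture clears its references in {{TearDown()}}, so shared ownership keeps containerizers alive exactly as long as the test needs them without leaking them into the next test.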

> ROOT_DOCKER_DockerHealthyTask is flaky.
> ---------------------------------------
>
>                 Key: MESOS-4604
>                 URL: https://issues.apache.org/jira/browse/MESOS-4604
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>         Environment: CentOS 6/7, Ubuntu 15.04 on AWS.
>            Reporter: Jan Schlicht
>            Assignee: Joseph Wu
>              Labels: flaky-test, mesosphere, test
>
> Log from TeamCity, which runs {{sudo ./bin/mesos-tests.sh}} on AWS EC2 instances:
> {noformat}
> [18:27:14][Step 8/8] [----------] 8 tests from HealthCheckTest
> [18:27:14][Step 8/8] [ RUN      ] HealthCheckTest.HealthyTask
> [18:27:17][Step 8/8] [       OK ] HealthCheckTest.HealthyTask (2222 ms)
> [18:27:17][Step 8/8] [ RUN      ] HealthCheckTest.ROOT_DOCKER_DockerHealthyTask
> [18:27:36][Step 8/8] ../../src/tests/health_check_tests.cpp:388: Failure
> [18:27:36][Step 8/8] Failed to wait 15secs for termination
> [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called
> [18:27:36][Step 8/8]     @     0x7f7077055e1c  google::LogMessage::Fail()
> [18:27:36][Step 8/8]     @     0x7f707705ba6f  google::RawLog__()
> [18:27:36][Step 8/8]     @     0x7f70760f76c9  __cxa_pure_virtual
> [18:27:36][Step 8/8]     @           0xa9423c  mesos::internal::tests::Cluster::Slaves::shutdown()
> [18:27:36][Step 8/8]     @          0x1074e45  mesos::internal::tests::MesosTest::ShutdownSlaves()
> [18:27:36][Step 8/8]     @          0x1074de4  mesos::internal::tests::MesosTest::Shutdown()
> [18:27:36][Step 8/8]     @          0x1070ec7  mesos::internal::tests::MesosTest::TearDown()
> [18:27:36][Step 8/8]     @          0x16eb7b2  testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8]     @          0x16e61a9  testing::internal::HandleExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8]     @          0x16c56aa  testing::Test::Run()
> [18:27:36][Step 8/8]     @          0x16c5e89  testing::TestInfo::Run()
> [18:27:36][Step 8/8]     @          0x16c650a  testing::TestCase::Run()
> [18:27:36][Step 8/8]     @          0x16cd1f6  testing::internal::UnitTestImpl::RunAllTests()
> [18:27:36][Step 8/8]     @          0x16ec513  testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8]     @          0x16e6df1  testing::internal::HandleExceptionsInMethodIfSupported<>()
> [18:27:36][Step 8/8]     @          0x16cbe26  testing::UnitTest::Run()
> [18:27:36][Step 8/8]     @           0xe54c84  RUN_ALL_TESTS()
> [18:27:36][Step 8/8]     @           0xe54867  main
> [18:27:36][Step 8/8]     @     0x7f7071560a40  (unknown)
> [18:27:36][Step 8/8]     @           0x9b52d9  _start
> [18:27:36][Step 8/8] Aborted (core dumped)
> [18:27:36][Step 8/8] Process exited with code 134
> {noformat}
> This happens _quite_ often on Ubuntu 15.04, CentOS 6, and CentOS 7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)