You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Chun-Hung Hsiao (JIRA)" <ji...@apache.org> on 2019/04/10 03:35:00 UTC

[jira] [Created] (MESOS-9719) Test `AgentFailoverHTTPExecutorUsingResourceProviderResources` is flaky.

Chun-Hung Hsiao created MESOS-9719:
--------------------------------------

             Summary: Test `AgentFailoverHTTPExecutorUsingResourceProviderResources` is flaky.
                 Key: MESOS-9719
                 URL: https://issues.apache.org/jira/browse/MESOS-9719
             Project: Mesos
          Issue Type: Bug
            Reporter: Chun-Hung Hsiao
            Assignee: Chun-Hung Hsiao


The test is flaky because:
 # It assumes the mock RP never reregisters, which might not be true.
 # It does not wait for the task and executor to be reaped, which would lead to a race between containerizer destroy and test teardown and cause cgroups cleanup to fail.
 # It fast-forwards the clock, which might lead to containerizer destroy failures.
 # It assumes that the framework only receives two status updates, which might not be true.

Example failure log:
{noformat}
E0410 00:18:23.526867  1251 slave.cpp:3118] Failed to update resources for container f941cb68-9f13-418c-be1b-702e5927b1eb of executor 'default' of framework ca96f624-9590-4776-9e83-39714cebd25f-0000, destroying container: Collect failed: Failed to publish resources 'disk(allocated: foo)[RAW]:200' for container f941cb68-9f13-418c-be1b-702e5927b1eb: Resource provider 616834b9-4dbb-45a7-b762-831ce5e8534a is not subscribed
I0410 00:18:23.526957  1251 containerizer.cpp:2576] Destroying container f941cb68-9f13-418c-be1b-702e5927b1eb in RUNNING state
I0410 00:18:23.526979  1251 containerizer.cpp:3278] Transitioning the state of container f941cb68-9f13-418c-be1b-702e5927b1eb from RUNNING to DESTROYING
I0410 00:18:23.526989  1251 containerizer.cpp:2576] Destroying container f941cb68-9f13-418c-be1b-702e5927b1eb.523acde5-8c21-4f3f-af71-7cb84b54803e in RUNNING state
I0410 00:18:23.526996  1251 containerizer.cpp:3278] Transitioning the state of container f941cb68-9f13-418c-be1b-702e5927b1eb.523acde5-8c21-4f3f-af71-7cb84b54803e from RUNNING to DESTROYING
I0410 00:18:23.527102  1251 linux_launcher.cpp:576] Asked to destroy container f941cb68-9f13-418c-be1b-702e5927b1eb.523acde5-8c21-4f3f-af71-7cb84b54803e
...
E0410 00:18:23.535424  1246 slave.cpp:6572] Termination of executor 'default' of framework ca96f624-9590-4776-9e83-39714cebd25f-0000 failed: Failed to destroy nested containers: Failed to kill all processes in the container: Timed out after 1mins
...
I0410 00:18:23.535817  1252 master.cpp:8983] Executor 'default' of framework ca96f624-9590-4776-9e83-39714cebd25f-0000 on agent ca96f624-9590-4776-9e83-39714cebd25f-S0 at slave(699)@172.16.10.211:33823 (ip-172-16-10-211.ec2.internal): wait status -1
...
../../src/tests/mesos.cpp:926: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_b1965800-016c-494b-8d6d-c70437c9405f/f941cb68-9f13-418c-be1b-702e5927b1eb': Device or resource busy
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)