You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Ian Downes (JIRA)" <ji...@apache.org> on 2014/06/17 23:37:08 UTC

[jira] [Updated] (MESOS-1154) Flaky SlaveRecoveryTest test: ReconcileKillTask and RecoverStatusUpdateManager.

     [ https://issues.apache.org/jira/browse/MESOS-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Downes updated MESOS-1154:
------------------------------

    Sprint: Q2 Sprint 4

> Flaky SlaveRecoveryTest test: ReconcileKillTask and RecoverStatusUpdateManager.
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-1154
>                 URL: https://issues.apache.org/jira/browse/MESOS-1154
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Mahler
>            Assignee: Ian Downes
>
> Looks like the test tear down is failing to remove a cgroup in both cases:
> {noformat: title=SlaveRecoveryTest/0.ReconcileKillTask}
> [ RUN      ] SlaveRecoveryTest/0.ReconcileKillTask
> 2014-03-27 22:32:49,330:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0327 22:32:50.850909 53927 exec.cpp:131] Version: 0.19.0
> I0327 22:32:50.853888 53953 exec.cpp:205] Executor registered on slave 20140327-223247-1740121354-49087-44864-0
> Registered executor on smfd-bkq-03-sr4.devel.twitter.com
> Starting task bc4f5f79-088e-4188-b9b5-3585ba5e6a98
> sh -c 'sleep 1000'
> Forked command at 53967
> 2014-03-27 22:32:52,667:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-03-27 22:32:56,004:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0327 22:32:56.032625 53957 exec.cpp:251] Received reconnect request from slave 20140327-223247-1740121354-49087-44864-0
> I0327 22:32:56.033253 53945 exec.cpp:228] Executor re-registered on slave 20140327-223247-1740121354-49087-44864-0
> Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
> Shutting down
> Sending SIGTERM to process tree at pid 53967
> Killing the following process trees:
> [
> --- 53967 sleep 1000
> ]
> Command terminated with signal Terminated (pid: 53967)
> 2014-03-27 22:32:59,341:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> ../../src/tests/mesos.cpp:387: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): 'mesos_test_cd6a76e9-9961-40ba-b0ed-b1190954975f/0b1f90b8-b5a2-43e9-851e-93ca56cb37d0' is not a valid cgroup
> [  FAILED  ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam = mesos::internal::slave::MesosContainerizer (13600 ms)
> {noformat}
> {noformat: title=SlaveRecoveryTest/0.RecoverStatusUpdateManager}
> [ RUN      ] SlaveRecoveryTest/0.RecoverStatusUpdateManager
> 2014-03-27 22:30:12,509:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-03-27 22:30:15,845:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0327 22:30:18.038733 52597 exec.cpp:131] Version: 0.19.0
> I0327 22:30:18.043248 52620 exec.cpp:205] Executor registered on slave 20140327-223012-1740121354-49087-44864-0
> Registered executor on smfd-bkq-03-sr4.devel.twitter.com
> Starting task 4c8dda64-17e8-4390-8f31-f878cdde8228
> sh -c 'sleep 1000'
> Forked command at 52637
> 2014-03-27 22:30:19,182:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> 2014-03-27 22:30:22,519:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0327 22:30:24.087932 52634 exec.cpp:251] Received reconnect request from slave 20140327-223012-1740121354-49087-44864-0
> I0327 22:30:24.088851 52635 exec.cpp:228] Executor re-registered on slave 20140327-223012-1740121354-49087-44864-0
> Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
> 2014-03-27 22:30:25,855:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
> I0327 22:30:26.091655 52614 exec.cpp:378] Executor asked to shutdown
> Shutting down
> Sending SIGTERM to process tree at pid 52637
> Killing the following process trees:
> [
> --- 52637 sleep 1000
> ]
> Command terminated with signal Terminated (pid: 52637)
> ../../src/tests/mesos.cpp:387: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to kill tasks in nested cgroups: Collect failed: 'mesos_test_87c28b05-8055-407a-8ef5-eda7febd1f1c/302e57d9-e837-46cd-bd81-0e39c5b19564' is not a valid cgroup
> [  FAILED  ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam = mesos::internal::slave::MesosContainerizer (16304 ms)
> {noformat}
> Seems to be flaky and only occurring sometimes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)