Posted to dev@mesos.apache.org by "Jonathan Boulle (JIRA)" <ji...@apache.org> on 2013/04/10 00:16:16 UTC

[jira] [Created] (MESOS-430) send better messages on executor failures

Jonathan Boulle created MESOS-430:
-------------------------------------

             Summary: send better messages on executor failures
                 Key: MESOS-430
                 URL: https://issues.apache.org/jira/browse/MESOS-430
             Project: Mesos
          Issue Type: Improvement
            Reporter: Jonathan Boulle
            Priority: Minor


When an executor fails during launch, the slave marks the task as LOST, but the status update gives no indication of why. It would be far more helpful if the slave included in that status update an indication that the executor failed (e.g. something like the "has terminated" message in the slave log).
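
As a rough illustration of the request, here is a minimal sketch of a status update that carries the termination reason alongside the state. The `StatusUpdate` class, `describe_termination` and `lost_update` are hypothetical names invented for this example, not Mesos APIs, and the message wording is only modelled on the "has terminated" log line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatusUpdate:
    """Hypothetical stand-in for the status update sent to the framework."""
    task_id: str
    state: str
    message: str = ""  # human-readable reason; empty is what happens today


def describe_termination(executor_id: str,
                         exit_code: Optional[int] = None,
                         signal_name: Optional[str] = None) -> str:
    """Build a message similar to the slave's 'has terminated' log line."""
    if signal_name is not None:
        return f"Executor '{executor_id}' has terminated with signal {signal_name}"
    # Assumed wording for the non-signal case.
    return f"Executor '{executor_id}' has exited with status {exit_code}"


def lost_update(task_id: str, executor_id: str,
                exit_code: Optional[int] = None,
                signal_name: Optional[str] = None) -> StatusUpdate:
    """Mark the task LOST and attach the reason the executor went away."""
    return StatusUpdate(
        task_id=task_id,
        state="TASK_LOST",
        message=describe_termination(executor_id, exit_code, signal_name),
    )


if __name__ == "__main__":
    print(lost_update("task-1", "thermos-task-1", signal_name="Aborted"))
```

With something along these lines, a framework receiving TASK_LOST could tell an executor crash apart from other causes without access to the slave log.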

In this specific case, libprocess is failing to bind (presumably because of port exhaustion), which causes the executor to abort before the driver is initialised:
{code:title=executor stderr}
F0409 20:53:27.887141 41188 process.cpp:1315] Failed to initialize, bind: Address already in use [98]
*** Check failure stack trace: ***
{code}
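
For reference, the same errno can be provoked outside Mesos. The sketch below is an assumption-laden illustration, not a reproduction of the libprocess code path: it binds two TCP sockets to the same loopback address and port and returns the errno from the second `bind`. It demonstrates an address conflict rather than port exhaustion, and `second_bind_errno` is a hypothetical helper name. On Linux, `EADDRINUSE` is 98, which matches the `[98]` in the stderr above.

```python
import errno
import socket
from typing import Optional


def second_bind_errno(host: str = "127.0.0.1") -> Optional[int]:
    """Bind two sockets to one address; return the errno of the second bind."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 lets the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port: should fail
        except OSError as exc:
            return exc.errno
        return None  # the second bind unexpectedly succeeded
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(code, errno.errorcode.get(code))
```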

The slave log shows:
{code}
I0409 20:53:26.558866 21107 slave.cpp:475] Got assigned task 1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 for framework 201104070004-0000002563-0000
I0409 20:53:26.560374 21107 paths.hpp:235] Created executor directory '/var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1'
I0409 20:53:26.561908 21089 cgroups_isolation_module.cpp:440] Launching thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 (./thermos_executor) in /var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1 with resources cpus=0.25; mem=128 for framework 201104070004-0000002563-0000 in cgroup mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
I0409 20:53:26.562155 21091 slave.cpp:361] Successfully attached file '/var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1'
I0409 20:53:26.563926 21089 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000 with resources cpus=0.25; mem=128
I0409 20:53:26.564266 21089 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 256 for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
I0409 20:53:26.564595 21089 cgroups_isolation_module.cpp:774] Updated 'memory.limit_in_bytes' to 134217728 for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
I0409 20:53:26.565055 21089 cgroups_isolation_module.cpp:800] Started listening for OOM events for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
I0409 20:53:26.567622 21089 cgroups_isolation_module.cpp:469] Forked executor at = 41188
Fetching resources into /var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1
Fetching resource /usr/local/bin/thermos_executor
Copying resource from /usr/local/bin/thermos_executor to .
2013-04-09 20:53:26,696:21086(0x4d6b6940):ZOO_DEBUG@zookeeper_process@1983: Got ping response in 0 ms
I0409 20:53:28.246511 21104 cgroups_isolation_module.cpp:633] Telling slave of terminated executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
I0409 20:53:28.246727 21104 cgroups_isolation_module.cpp:534] Killing executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
I0409 20:53:28.246750 21108 slave.cpp:1053] Executor 'thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02' of framework 201104070004-0000002563-0000 has terminated with signal Aborted
I0409 20:53:28.255456 21104 cgroups_isolation_module.cpp:819] OOM notifier is triggered for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000 with tag b7d846d0-eabc-4f02-8614-601bbd18ef5f
I0409 20:53:28.255533 21104 cgroups_isolation_module.cpp:824] Discarded OOM notifier for executor thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000 with tag b7d846d0-eabc-4f02-8614-601bbd18ef5f
I0409 20:53:28.256477 21101 cgroups.cpp:1146] Trying to freeze cgroup /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
I0409 20:53:28.256595 21101 cgroups.cpp:1185] Successfully froze cgroup /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f after 1 attempts
I0409 20:53:28.261756 21097 cgroups.cpp:1161] Trying to thaw cgroup /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
I0409 20:53:28.261864 21097 cgroups.cpp:1268] Successfully thawed /cgroup/mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
I0409 20:53:28.285230 21108 slave.cpp:830] Status update: task 1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000 is now in state TASK_LOST
I0409 20:53:28.285598 21104 cgroups_isolation_module.cpp:567] Asked to update resources for an unknown/killed executor
I0409 20:53:28.285810 21096 gc.cpp:97] Scheduling /var/lib/mesos/slaves/201303271650-1944527370-5050-24955-2587/frameworks/201104070004-0000002563-0000/executors/thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02/runs/cca3c221-999b-444e-b628-4b6354754ad1 for removal
I0409 20:53:28.296897 21094 cgroups_isolation_module.cpp:903] Successfully destroyed the cgroup mesos/framework_201104070004-0000002563-0000_executor_thermos-1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02_tag_b7d846d0-eabc-4f02-8614-601bbd18ef5f
I0409 20:53:28.298924 21091 slave.cpp:727] Got acknowledgement of status update for task 1365540805543-sathya-service-proxy-0-31e97933-3e9f-46ef-a1e7-30cd480e6b02 of framework 201104070004-0000002563-0000
{code}
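
The only place the cause currently surfaces is the "has terminated" line in the slave log. As a stopgap illustration of the information that could go into the status update, the sketch below pulls the executor ID, framework ID and signal out of such a line with a regular expression. The pattern is written against the log excerpt above only, and `find_termination` is a hypothetical helper, not part of Mesos.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches e.g.: Executor '<id>' of framework <id> has terminated with signal Aborted
TERMINATED = re.compile(
    r"Executor '(?P<executor>[^']+)' of framework (?P<framework>\S+) "
    r"has terminated with signal (?P<signal>.+)$"
)


def find_termination(log_lines: Iterable[str]) -> Optional[Tuple[str, str, str]]:
    """Return (executor_id, framework_id, signal) from the first matching line."""
    for line in log_lines:
        match = TERMINATED.search(line.strip())
        if match:
            return (match.group("executor"),
                    match.group("framework"),
                    match.group("signal"))
    return None  # no termination line found


if __name__ == "__main__":
    import sys
    print(find_termination(sys.stdin))
```

Run against the slave log on stdin, this would report the executor that was aborted; the same three fields are what a more informative TASK_LOST update could carry.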
