Posted to issues@mesos.apache.org by "Sergei Hanus (Jira)" <ji...@apache.org> on 2020/11/04 12:51:00 UTC
[jira] [Created] (MESOS-10197) One of processes gets incorrect status after stopping and starting mesos-master and mesos-agent simultaneously
Sergei Hanus created MESOS-10197:
------------------------------------
Summary: One of processes gets incorrect status after stopping and starting mesos-master and mesos-agent simultaneously
Key: MESOS-10197
URL: https://issues.apache.org/jira/browse/MESOS-10197
Project: Mesos
Issue Type: Bug
Reporter: Sergei Hanus
We are using Mesos 1.8.0 together with Marathon 1.7.50.
We run several child services under Marathon. When we stop and start all services (including mesos-master and mesos-agent), or simply reboot the server, everything usually returns to a working state.
Sometimes, however, one of the child services is reported as healthy although no such process exists on the server. When we restart mesos-agent once more, the child service appears as a process and actually starts working.
At the same time, we observe the following messages in the agent log:
{code:java}
I1103 01:48:08.291822 6542 slave.cpp:5491] Killing un-reregistered executor 'ia-cloud_nexus.f09fb47b-1d66-11eb-ad1d-12962e9c065b' of framework a99f25dd-d176-4ffd-9351-e70a357c1872-0000 at executor(1)@10.100.5.141:36452
I1103 01:48:08.291896 6542 slave.cpp:7848] Finished recovery
{code}
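To check whether other services were affected in the same way, the agent log can be scanned for this message. The following is only a minimal sketch: the function name `find_killed_executors` is hypothetical, and the pattern assumes the exact line format quoted above, which may differ in other Mesos versions.

```python
import re

# Assumption: the message format matches the agent log line quoted above.
PATTERN = re.compile(
    r"Killing un-reregistered executor '(?P<executor>[^']+)' "
    r"of framework (?P<framework>\S+) at (?P<endpoint>\S+)"
)


def find_killed_executors(log_lines):
    """Return (executor_id, framework_id, endpoint) for each matching line."""
    results = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            results.append(
                (
                    match.group("executor"),
                    match.group("framework"),
                    match.group("endpoint"),
                )
            )
    return results


if __name__ == "__main__":
    # The two log lines from this report, used as sample input.
    sample = [
        "I1103 01:48:08.291822 6542 slave.cpp:5491] Killing un-reregistered "
        "executor 'ia-cloud_nexus.f09fb47b-1d66-11eb-ad1d-12962e9c065b' of "
        "framework a99f25dd-d176-4ffd-9351-e70a357c1872-0000 at "
        "executor(1)@10.100.5.141:36452",
        "I1103 01:48:08.291896 6542 slave.cpp:7848] Finished recovery",
    ]
    for executor, framework, endpoint in find_killed_executors(sample):
        print(executor, framework, endpoint)
```

In practice the lines would be read from the agent log file; its location depends on how the agent is configured, so it is not hard-coded here.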
What could cause this behavior, and how can we avoid it? If this service's state is stuck somewhere in the agent's internal structures (a metadata file on disk, or something similar), how can we clean up this state?