
Posted to user@mesos.apache.org by Anish Mashankar <an...@systeminsights.com> on 2016/05/12 07:24:22 UTC

Marathon Application fails randomly and restarts

Hello,
I have implemented a Kafka ES Connector using the Kafka Connect Java
API. The Java application is deployed as a Docker container using the
Marathon framework. No health check is set up at the moment.

The Marathon app fails every 2-3 minutes. The Mesos slave log shows the
following message each time the task fails:

E0509 11:40:11.918833  1231 slave.cpp:3252] Failed to update resources for
  container de5c1573-3fa3-4b3e-a560-de6491cf3c28 of executor
  'infra_connect-kafka-es.0d1b37c4-15da-11e6-9215-02424ea4e672' running task
  infra_connect-kafka-es.0d1b37c4-15da-11e6-9215-02424ea4e672 on status
  update for terminal task, destroying container: Failed to determine cgroup
  for the 'cpu' subsystem: Failed to read /proc/20764/cgroup: Failed to open
  file '/proc/20764/cgroup': No such file or directory
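
For anyone who needs to scan many such lines, here is a minimal Python
sketch that pulls the container ID, executor, task and pid out of the
message. The regular expression is an assumption based only on the single
line quoted above, not on any documented Mesos log format, and
parse_failure is a hypothetical helper name.

```python
import re

# The slave log line quoted above, joined into a single string.
LOG_LINE = (
    "E0509 11:40:11.918833  1231 slave.cpp:3252] Failed to update resources for "
    "container de5c1573-3fa3-4b3e-a560-de6491cf3c28 of executor "
    "'infra_connect-kafka-es.0d1b37c4-15da-11e6-9215-02424ea4e672' running task "
    "infra_connect-kafka-es.0d1b37c4-15da-11e6-9215-02424ea4e672 on status "
    "update for terminal task, destroying container: Failed to determine cgroup "
    "for the 'cpu' subsystem: Failed to read /proc/20764/cgroup: Failed to open "
    "file '/proc/20764/cgroup': No such file or directory"
)

# Assumed shape of the message, inferred from this one example only.
PATTERN = re.compile(
    r"container (?P<container>[0-9a-f-]{36}) of executor '(?P<executor>[^']+)' "
    r"running task (?P<task>\S+) .*?/proc/(?P<pid>\d+)/cgroup"
)


def parse_failure(line):
    """Return the identifiers found in the line as a dict, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_failure(LOG_LINE))
```

The pid in the message (20764) no longer has a /proc entry, which suggests
the process had already exited by the time the slave tried to update its
resources. That would fit the "status update for terminal task" wording,
so this error may be a symptom of the task ending rather than its cause.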


I ran the Docker container directly on each slave, without Marathon, and
it did not fail.
I have looked at MESOS-1837 (
https://issues.apache.org/jira/browse/MESOS-1837), but the comments do not
make clear what specific steps would fix this.

Has anyone else had a similar issue? If so, please suggest a workaround.

Thank you,

-- 
Anish Samir Mashankar

Re: Marathon Application fails randomly and restarts

Posted by Radoslaw Gruchalski <ra...@gruchalski.com>.

What is the exit code of the process? The stdout/stderr in the sandbox will tell you.
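
As a rough aid for reading that exit code, here is a minimal Python sketch.
It assumes the common shell/Docker convention that a code above 128 means
the process was terminated by signal number (code - 128). The function name
describe_exit_code is hypothetical and not part of Mesos or Marathon.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a rough, human-readable reading of a process exit code."""
    if code == 0:
        return "exited cleanly"
    if code > 128:
        # Assumed convention: 128 + N means "killed by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            return f"exit code {code} (no known signal numbered {signum})"
        hint = ""
        if name == "SIGKILL":
            # SIGKILL is often, though not always, the kernel OOM killer
            # acting on a container that exceeded its memory limit.
            hint = " (possibly an out-of-memory kill)"
        return f"terminated by {name}{hint}"
    return f"application error, exit code {code}"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, "->", describe_exit_code(code))
```

If the code turns out to be 137, comparing the memory given to the Marathon
app with what the JVM actually uses would be a reasonable next step.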
–
Best regards,
Radek Gruchalski
radek@gruchalski.com
de.linkedin.com/in/radgruchalski

