Posted to user@mesos.apache.org by Marc Roos <M....@f1-outsourcing.eu> on 2020/06/07 19:56:27 UTC

problems running marathon >=1.8 on mesos

I am cross-posting this to mesos-users, hoping someone has come across 
this issue and can help me resolve it. There are several open JIRA 
issues with similar symptoms.


All of a sudden I am having problems with the marathon ui getting stuck 
at 'loading', and end points like http://m01.local:8081/v2/info and 
http://m01.local:8081/ping are not responding. I have now reduced the 
test cluster to one node, running only mesos-master, zookeeper and 
marathon. Between tests I clean the /var/lib/zookeeper and the 
/var/lib/mesos directories. I have also removed many of the 
configuration options I had, like ssl etc.
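
To tell "connection refused" apart from "connected but never answered", 
the end points can be probed with an explicit timeout. The following is 
only a minimal sketch using the Python standard library; the function 
name probe and the state labels are made up for illustration and are not 
part of marathon or mesos.

```python
import socket
import urllib.error
import urllib.request

# Opener that ignores proxy environment variables, so the request goes
# straight to the host being tested.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url, timeout=3.0):
    """GET url once and return (state, detail).

    state is one of: 'ok', 'http_error', 'timeout', 'unreachable'.
    The labels are illustrative only.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return "ok", str(resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        return "http_error", str(exc.code)
    except urllib.error.URLError as exc:
        # Raised while connecting or sending the request.
        if isinstance(exc.reason, socket.timeout):
            return "timeout", str(exc.reason)
        return "unreachable", str(exc.reason)
    except socket.timeout as exc:
        # Connected and sent the request, but no response arrived in time.
        return "timeout", str(exc)
    except OSError as exc:
        # Any other socket-level failure, e.g. a reset connection.
        return "unreachable", str(exc)
```

For the host in this post one would call probe("http://m01.local:8081/ping") 
and probe("http://m01.local:8081/v2/info"). A 'timeout' state means no 
answer arrived within the timeout, while 'unreachable' means the 
connection could not be established at all (for example refused, or the 
name did not resolve).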

The only version I am able to run is marathon-1.7.216-9e2a9b579. 
marathon-1.8.222-86475ddac and marathon-1.10.17-c427ce965 both show the 
above-mentioned problem.

I have been comparing the marathon 1.7 and marathon 1.8 logs, and this 
is what I have noticed: quite a few log statements that the 1.7 log has 
between 'All services up and running. 
(mesosphere.marathon.MarathonApp:main' and 'akka://marathon/deadLetters' 
are missing from the 1.8 log.
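
A comparison like this can be made repeatable by collecting the logger 
class that each marathon log entry ends with, "(class:thread)", and 
listing the classes that show up in one log but not in the other. The 
following is a minimal sketch, assuming each log entry sits on a single 
line in the log file; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the trailing "(fully.qualified.Class:thread-name)" of a
# marathon log entry and captures the class name.
LOGGER_RE = re.compile(r"\(([\w.$]+):[^()]*\)\s*$")


def logger_counts(lines):
    """Count log entries per logger class; lines without the suffix are skipped."""
    counts = Counter()
    for line in lines:
        match = LOGGER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def missing_loggers(good_lines, bad_lines):
    """Return logger classes present in good_lines but absent from bad_lines."""
    good = logger_counts(good_lines)
    bad = logger_counts(bad_lines)
    return sorted(set(good) - set(bad))
```

Passing the lines of the 1.7 log as good_lines and the lines of the 1.8 
log as bad_lines gives the classes that logged under 1.7 only. 
mesos-master lines carry no such suffix and are ignored by the pattern.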

Has anyone seen something similar?


[@mesos-master]# rpm -qa  | grep java
python-javapackages-3.4.1-11.el7.noarch
tzdata-java-2020a-1.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.252.b09-2.el7_8.x86_64
javapackages-tools-3.4.1-11.el7.noarch

[@mesos-master]# uname -a
Linux m01.local 3.10.0-1127.10.1.el7.x86_64 #1 SMP Wed Jun 3 14:28:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[@mesos-master]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)





marathon 1.8 (unresponsive)
===========================
Jun  7 17:40:59 m01 marathon: [2020-06-07 17:40:59,696] INFO  All services up and running. (mesosphere.marathon.MarathonApp:main)
Jun  7 17:41:13 m01 marathon: [2020-06-07 17:41:13,833] INFO  initiate task reconciliation (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-9)
Jun  7 17:41:13 m01 marathon: [2020-06-07 17:41:13,854] INFO  Requesting task reconciliation with the Mesos master (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
Jun  7 17:41:13 m01 mesos-master[11203]: I0607 17:41:13.858621 11227 master.cpp:8846] Performing implicit task state reconciliation for framework f5d67e06-6600-4fb9-94dc-a878be2563be-0000 (marathon) at scheduler-6d98d1e0-a7d2-4517-a0ce-5819a36414c9@192.168.10.151:36941
Jun  7 17:41:13 m01 marathon: [2020-06-07 17:41:13,864] INFO  task reconciliation has finished (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-4)

Jun  7 17:41:13 m01 marathon: [2020-06-07 17:41:13,879] INFO  Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#1746491390] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-7)
Jun  7 17:41:13 m01 marathon: [2020-06-07 17:41:13,910] INFO  Prompting Mesos for a heartbeat via explicit task reconciliation (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marathon-akka.actor.default-dispatcher-7)
Jun  7 17:41:13 m01 mesos-master[11203]: I0607 17:41:13.914615 11228 master.cpp:8889] Performing explicit task state reconciliation for 1 tasks of framework f5d67e06-6600-4fb9-94dc-a878be2563be-0000 (marathon) at scheduler-6d98d1e0-a7d2-4517-a0ce-5819a36414c9@192.168.10.151:36941
Jun  7 17:41:13 m01 marathon: [2020-06-07 17:41:13,924] INFO  Received fake heartbeat task-status update (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-13)
Jun  7 17:41:28 m01 marathon: [2020-06-07 17:41:28,939] INFO  Prompting Mesos for a heartbeat via explicit task reconciliation (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marathon-akka.actor.default-dispatcher-4)
Jun  7 17:41:28 m01 mesos-master[11203]: I0607 17:41:28.946494 11229 master.cpp:8889] Performing explicit task state reconciliation for 1 tasks of framework f5d67e06-6600-4fb9-94dc-a878be2563be-0000 (marathon) at scheduler-6d98d1e0-a7d2-4517-a0ce-5819a36414c9@192.168.10.151:36941
Jun  7 17:41:28 m01 marathon: [2020-06-07 17:41:28,950] INFO  Received fake heartbeat task-status update (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-14)


marathon 1.7 (ok)
=================
Jun  7 17:37:02 m01 marathon: [2020-06-07 17:37:02,681] INFO  All services up and running. (mesosphere.marathon.MarathonApp:main)
Jun  7 17:37:06 m01 marathon: [2020-06-07 17:37:06,222] INFO  Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
Jun  7 17:37:06 m01 marathon: [2020-06-07 17:37:06,228] INFO  => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
Jun  7 17:37:06 m01 mesos-master[10661]: I0607 17:37:06.232568 10690 master.cpp:5521] Processing REVIVE call for framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun  7 17:37:06 m01 mesos-master[10661]: I0607 17:37:06.232730 10690 hierarchical.cpp:1788] Unsuppressed offers and cleared filters for roles { * } of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000
Jun  7 17:37:06 m01 marathon: [2020-06-07 17:37:06,235] INFO  2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
Jun  7 17:37:06 m01 marathon: [2020-06-07 17:37:06,238] INFO  => Schedule next revive at 2020-06-07T15:37:11.228Z in 4990 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
Jun  7 17:37:11 m01 marathon: [2020-06-07 17:37:11,240] INFO  Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
Jun  7 17:37:11 m01 mesos-master[10661]: I0607 17:37:11.246363 10685 master.cpp:5521] Processing REVIVE call for framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun  7 17:37:11 m01 mesos-master[10661]: I0607 17:37:11.246500 10685 hierarchical.cpp:1788] Unsuppressed offers and cleared filters for roles { * } of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000
Jun  7 17:37:11 m01 marathon: [2020-06-07 17:37:11,240] INFO  => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
Jun  7 17:37:11 m01 marathon: [2020-06-07 17:37:11,241] INFO  1 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
Jun  7 17:37:11 m01 marathon: [2020-06-07 17:37:11,241] INFO  => Schedule next revive at 2020-06-07T15:37:16.240Z in 4999 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-5)
Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,261] INFO  Received TimedCheck (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
Jun  7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.265516 10689 master.cpp:5521] Processing REVIVE call for framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun  7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.265655 10689 hierarchical.cpp:1788] Unsuppressed offers and cleared filters for roles { * } of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000
Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,261] INFO  => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-8)
Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,409] INFO  initiate task reconciliation (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-5)
Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,437] INFO  Requesting task reconciliation with the Mesos master (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
Jun  7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.441344 10686 master.cpp:8846] Performing implicit task state reconciliation for framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,444] INFO  task reconciliation has finished (mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-dispatcher-2)

Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,459] INFO  Message [mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from Actor[akka://marathon/user/MarathonScheduler/$a#-463341905] to Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior, then [Actor[akka://marathon/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'. (akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-8)
Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,502] INFO  Prompting Mesos for a heartbeat via explicit task reconciliation (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marathon-akka.actor.default-dispatcher-5)
Jun  7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.506299 10687 master.cpp:8889] Performing explicit task state reconciliation for 1 tasks of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun  7 17:37:16 m01 marathon: [2020-06-07 17:37:16,513] INFO  Received fake heartbeat task-status update (mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-14)
Jun  7 17:37:31 m01 marathon: [2020-06-07 17:37:31,012] INFO  Killing overdue instances:  (mesosphere.marathon.core.task.jobs.impl.OverdueInstancesActor$Support:scala-execution-context-global-54)
Jun  7 17:37:31 m01 marathon: [2020-06-07 17:37:31,018] INFO  Kill and forget following instances for reason Overdue:  (mesosphere.marathon.core.task.termination.impl.KillServ