Posted to user@mesos.apache.org by Marc Roos <M....@f1-outsourcing.eu> on 2020/06/07 19:56:27 UTC
problems running marathon >=1.8 on mesos
I am cross-posting this to mesos-users, hoping someone has come across
this issue and can help me resolve it. There are several JIRA issues
open with similar symptoms.
All of a sudden I am having problems with the marathon UI getting stuck
at 'loading', and endpoints like http://m01.local:8081/v2/info are not
responding (http://m01.local:8081/ping). I have now reduced the test
cluster to a single node, running only mesos-master, zookeeper and
marathon. Between tests I clean the /var/lib/zookeeper and
/var/lib/mesos directories. I have also removed many of the
configuration options I had, such as SSL.
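To tell a hung endpoint apart from one that is refused or returns an
error, each URL can be probed with a hard timeout. This is only a
minimal sketch: the helper name probe and the 5 second default are
assumptions for illustration, not part of marathon or mesos.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Fetch url once and classify the outcome.

    Returns ("ok", status), ("http-error", status) or ("failed", reason).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return ("ok", response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return ("http-error", err.code)
    except OSError as err:
        # Covers URLError (connection refused, DNS failure) and socket
        # timeouts, so a hung endpoint is reported instead of blocking.
        return ("failed", str(err))
```

Calling probe("http://m01.local:8081/ping") and
probe("http://m01.local:8081/v2/info") after each marathon start would
show whether the HTTP layer answers at all, or only the UI is stuck.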
The only version I am able to run is marathon-1.7.216-9e2a9b579.
marathon-1.8.222-86475ddac and marathon-1.10.17-c427ce965 both show the
problems described above.
I have been comparing the marathon 1.7 and marathon 1.8 logs, and this
is what I have noticed: quite a few log statements are missing between
'All services up and running.
(mesosphere.marathon.MarathonApp:main)' and 'akka://marathon/deadLetters'
in the 1.8 log.
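The comparison can be made less manual by listing which loggers write
to the working log but never to the unresponsive one. The sketch below
assumes unwrapped log lines (one entry per line, as in the syslog file
rather than as wrapped in this mail) that end in a
"(logger.Class:thread-name)" suffix. The function names are
hypothetical.

```python
import re

# Trailing "(logger.Class:thread-name)" suffix of a marathon log line.
LOGGER_SUFFIX = re.compile(r"\(([\w.$]+):[^()\s]+\)\s*$")


def loggers_in(log_text):
    """Return the set of logger names found in unwrapped log lines."""
    found = set()
    for line in log_text.splitlines():
        match = LOGGER_SUFFIX.search(line)
        if match:
            found.add(match.group(1))
    return found


def missing_loggers(working_log, broken_log):
    """Loggers that appear in working_log but never in broken_log."""
    return sorted(loggers_in(working_log) - loggers_in(broken_log))
```

Fed the full 1.7 and 1.8 logs as strings, missing_loggers(log_17,
log_18) gives a short list of components to look at first. In the
excerpts below, for instance, ReviveOffersActor only shows up in the
1.7 log.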
Has anyone seen something similar?
[@mesos-master]# rpm -qa | grep java
python-javapackages-3.4.1-11.el7.noarch
tzdata-java-2020a-1.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.252.b09-2.el7_8.x86_64
javapackages-tools-3.4.1-11.el7.noarch
[@mesos-master]# uname -a
Linux m01.local 3.10.0-1127.10.1.el7.x86_64 #1 SMP Wed Jun 3 14:28:03
UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[@mesos-master]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
marathon 1.8 (unresponsive)
===========================
Jun 7 17:40:59 m01 marathon: [2020-06-07 17:40:59,696] INFO All
services up and running. (mesosphere.marathon.MarathonApp:main)
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,833] INFO initiate
task reconciliation
(mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-
dispatcher-9)
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,854] INFO Requesting
task reconciliation with the Mesos master
(mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
Jun 7 17:41:13 m01 mesos-master[11203]: I0607 17:41:13.858621 11227
master.cpp:8846] Performing implicit task state reconciliation for
framework f5d67e06-6600-4fb9-94dc-a878be2563be-0000 (marathon) at
scheduler-6d98d1e0-a7d2-4517-a0ce-5819a36414c9@192.168.10.151:36941
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,864] INFO task
reconciliation has finished
(mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-
dispatcher-4)
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,879] INFO Message
[mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from
Actor[akka://marathon/user/MarathonScheduler/$a#1746491390] to
Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters
encountered. If this is not an expected behavior, then
[Actor[akka://marathon/deadLetters]] may have terminated unexpectedly,
This logging can be turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
(akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-7)
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,910] INFO Prompting
Mesos for a heartbeat via explicit task reconciliation
(mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marath
on-akka.actor.default-dispatcher-7)
Jun 7 17:41:13 m01 mesos-master[11203]: I0607 17:41:13.914615 11228
master.cpp:8889] Performing explicit task state reconciliation for 1
tasks of framework f5d67e06-6600-4fb9-94dc-a878be2563be-0000 (marathon)
at scheduler-6d98d1e0-a7d2-4517-a0ce-5819a36414c9@192.168.10.151:36941
Jun 7 17:41:13 m01 marathon: [2020-06-07 17:41:13,924] INFO Received
fake heartbeat task-status update
(mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-13)
Jun 7 17:41:28 m01 marathon: [2020-06-07 17:41:28,939] INFO Prompting
Mesos for a heartbeat via explicit task reconciliation
(mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marath
on-akka.actor.default-dispatcher-4)
Jun 7 17:41:28 m01 mesos-master[11203]: I0607 17:41:28.946494 11229
master.cpp:8889] Performing explicit task state reconciliation for 1
tasks of framework f5d67e06-6600-4fb9-94dc-a878be2563be-0000 (marathon)
at scheduler-6d98d1e0-a7d2-4517-a0ce-5819a36414c9@192.168.10.151:36941
Jun 7 17:41:28 m01 marathon: [2020-06-07 17:41:28,950] INFO Received
fake heartbeat task-status update
(mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-14)
marathon 1.7 (ok)
=================
Jun 7 17:37:02 m01 marathon: [2020-06-07 17:37:02,681] INFO All
services up and running. (mesosphere.marathon.MarathonApp:main)
Jun 7 17:37:06 m01 marathon: [2020-06-07 17:37:06,222] INFO Received
TimedCheck
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-8)
Jun 7 17:37:06 m01 marathon: [2020-06-07 17:37:06,228] INFO => revive
offers NOW, canceling any scheduled revives
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-8)
Jun 7 17:37:06 m01 mesos-master[10661]: I0607 17:37:06.232568 10690
master.cpp:5521] Processing REVIVE call for framework
f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at
scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun 7 17:37:06 m01 mesos-master[10661]: I0607 17:37:06.232730 10690
hierarchical.cpp:1788] Unsuppressed offers and cleared filters for roles
{ * } of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000
Jun 7 17:37:06 m01 marathon: [2020-06-07 17:37:06,235] INFO 2 further
revives still needed. Repeating reviveOffers according to
--revive_offers_repetitions 3
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-8)
Jun 7 17:37:06 m01 marathon: [2020-06-07 17:37:06,238] INFO =>
Schedule next revive at 2020-06-07T15:37:11.228Z in 4990 milliseconds,
adhering to --min_revive_offers_interval 5000 (ms)
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-8)
Jun 7 17:37:11 m01 marathon: [2020-06-07 17:37:11,240] INFO Received
TimedCheck
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-5)
Jun 7 17:37:11 m01 mesos-master[10661]: I0607 17:37:11.246363 10685
master.cpp:5521] Processing REVIVE call for framework
f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at
scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun 7 17:37:11 m01 mesos-master[10661]: I0607 17:37:11.246500 10685
hierarchical.cpp:1788] Unsuppressed offers and cleared filters for roles
{ * } of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000
Jun 7 17:37:11 m01 marathon: [2020-06-07 17:37:11,240] INFO => revive
offers NOW, canceling any scheduled revives
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-5)
Jun 7 17:37:11 m01 marathon: [2020-06-07 17:37:11,241] INFO 1 further
revives still needed. Repeating reviveOffers according to
--revive_offers_repetitions 3
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-5)
Jun 7 17:37:11 m01 marathon: [2020-06-07 17:37:11,241] INFO =>
Schedule next revive at 2020-06-07T15:37:16.240Z in 4999 milliseconds,
adhering to --min_revive_offers_interval 5000 (ms)
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-5)
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,261] INFO Received
TimedCheck
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-8)
Jun 7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.265516 10689
master.cpp:5521] Processing REVIVE call for framework
f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at
scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun 7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.265655 10689
hierarchical.cpp:1788] Unsuppressed offers and cleared filters for roles
{ * } of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,261] INFO => revive
offers NOW, canceling any scheduled revives
(mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.acto
r.default-dispatcher-8)
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,409] INFO initiate
task reconciliation
(mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-
dispatcher-5)
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,437] INFO Requesting
task reconciliation with the Mesos master
(mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
Jun 7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.441344 10686
master.cpp:8846] Performing implicit task state reconciliation for
framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon) at
scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,444] INFO task
reconciliation has finished
(mesosphere.marathon.MarathonSchedulerActor:marathon-akka.actor.default-
dispatcher-2)
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,459] INFO Message
[mesosphere.marathon.MarathonSchedulerActor$TasksReconciled$] from
Actor[akka://marathon/user/MarathonScheduler/$a#-463341905] to
Actor[akka://marathon/deadLetters] was not delivered. [1] dead letters
encountered. If this is not an expected behavior, then
[Actor[akka://marathon/deadLetters]] may have terminated unexpectedly,
This logging can be turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
(akka.actor.DeadLetterActorRef:marathon-akka.actor.default-dispatcher-8)
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,502] INFO Prompting
Mesos for a heartbeat via explicit task reconciliation
(mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor$$anon$1:marath
on-akka.actor.default-dispatcher-5)
Jun 7 17:37:16 m01 mesos-master[10661]: I0607 17:37:16.506299 10687
master.cpp:8889] Performing explicit task state reconciliation for 1
tasks of framework f2318310-8c7b-438c-9a9d-48fdf1cd0406-0000 (marathon)
at scheduler-44c3a1b5-3c08-4fdd-ae79-6a1fd172e3b5@192.168.10.151:40447
Jun 7 17:37:16 m01 marathon: [2020-06-07 17:37:16,513] INFO Received
fake heartbeat task-status update
(mesosphere.marathon.core.heartbeat.MesosHeartbeatMonitor:Thread-14)
Jun 7 17:37:31 m01 marathon: [2020-06-07 17:37:31,012] INFO Killing
overdue instances:
(mesosphere.marathon.core.task.jobs.impl.OverdueInstancesActor$Support:s
cala-execution-context-global-54)
Jun 7 17:37:31 m01 marathon: [2020-06-07 17:37:31,018] INFO Kill and
forget following instances for reason Overdue:
(mesosphere.marathon.core.task.termination.impl.KillServ