Posted to user@mesos.apache.org by Jean-Baptiste <jb...@gmail.com> on 2017/10/17 16:16:54 UTC

Marathon HA issue

Hi there,

This morning we rolled out changes to the Marathon *local_port_[min|max]*
options. We are now facing a problem with our cluster of 3 *master* /
*marathon* nodes: H/A is broken, and the only way to make *Marathon* work
is to put its leader on the same host as the *Mesos* leader.

If the *Marathon* leader is on a different host than the *Mesos* leader,
*Marathon* enters a re-registration loop. Has anyone already faced
this kind of behavior? Any idea where to look? Could it be linked to this
issue <https://jira.mesosphere.com/browse/MARATHON-7436>?

We've checked the network security; there is no specific restriction
between the hosts.

*Versions:*

   - *Debian*: 8.7
   - *Mesos*: 1.3.0
   - *Marathon*: 1.4.5


*Topology:*

Thanks!

-- 

Jean-Baptiste FAREZ

jbfarez@gmail.com

Re: Marathon 1.5 necessary for testing with cni

Posted by Avinash Sridharan <av...@mesosphere.io>.

Hi Marc,
The Marathon app def:
"networks": [ { "mode": "container", "name": "mynet" } ],

is the new API that got introduced in 1.5.

Prior to 1.5 you would do:
"ipAddress": { "networkName": "mynet" }

@jdef ^^?
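
To make the difference between the two API generations concrete, here is a minimal Python sketch that builds an app definition for either one. The helper name `app_definition`, the app id, the image, and the resource values are all hypothetical. It also assumes that the pre-1.5 `ipAddress` field is a single object with a `networkName` key, which is how I remember the Marathon 1.4 IP-per-task docs; please check that against your version.

```python
import json


def app_definition(app_id, image, network_name, marathon_version):
    """Build a minimal app definition that joins a named CNI network.

    marathon_version is a (major, minor) tuple. The "networks" field is
    used from 1.5 on; older versions get the "ipAddress" field instead
    (object form is an assumption, see the text above).
    """
    app = {
        "id": app_id,
        "cmd": "sleep 3600",
        "cpus": 0.1,
        "mem": 64,
        "instances": 1,
        # Docker image run by the Mesos containerizer (placeholder image).
        "container": {"type": "MESOS", "docker": {"image": image}},
    }
    if marathon_version >= (1, 5):
        app["networks"] = [{"mode": "container", "name": network_name}]
    else:
        app["ipAddress"] = {"networkName": network_name}
    return app


if __name__ == "__main__":
    for version in [(1, 4), (1, 5)]:
        app = app_definition("/cni-test", "alpine", "mynet", version)
        print(version, json.dumps(app, indent=2))
```

The printed JSON is what you would POST to the Marathon apps endpoint; the network name has to match the `name` in the CNI config on the agents.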


On Tue, Oct 17, 2017 at 1:24 PM, Marc Roos <M....@f1-outsourcing.eu> wrote:

>
> I want to experiment a bit with Mesos: Docker images on the Mesos
> containerizer, with CNI. The goal is simply to give a container an IP.
>
> Is this not possible with Marathon < 1.5? I am using the Mesosphere repo
> for el7; is there a repo that has 1.5?
>
>
> I tried adding this to the container configuration:
>
> "networks": [ { "mode": "container", "name": "mynet" } ],
>
>
> [@m03 ~]# cat /etc/mesos-cni/10-mynet.conf
> {
>      "cniVersion": "0.2.0",
>      "name": "mynet",
>      "type": "bridge",
>      "bridge": "cni0",
>      "isGateway": true,
>      "ipMasq": true,
>      "ipam": {
>            "type": "host-local",
>            "subnet": "10.22.0.0/16",
>            "routes": [
>                  { "dst": "0.0.0.0/0" }
>            ]
>      }
> }
>
>
> The Marathon networking docs on GitHub
> (https://github.com/mesosphere/marathon/blob/master/docs/docs/networking.md)
> seem to say that 1.5 is necessary for this. Is that right?
>
> I got more info from this demo: https://youtu.be/0UMCoojACOs?t=1411
>
>
> CentOS7 3.10.0-693.2.2.el7.x86_64
> mesos-1.4.0-2.0.1.x86_64
> marathon-1.4.8-1.0.660.el7.x86_64
> containernetworking-cni-0.5.1-1.el7.x86_64
> mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_6
>
>


-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245

Re: Marathon HA issue

Posted by Jean-Baptiste <jb...@gmail.com>.

Thanks Greg, I've attached the logs. For info, the options we
are using are the following:

   - *default_accepted_resource_roles* = *
   - *hostname* = 10.0.x.x
   - *local_port_max* = 19999
   - *local_port_min* = 17000
   - *mesos_authentication_principal* = username
   - *mesos_authentication_secret_file* = /a/secure/place/secret
   - *mesos_role* = marathon-global
   - *zk* = zk://10.0.13.49:2181,10.0.25.213:2181,10.0.41.8:2181/marathon
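
For anyone trying to reproduce this setup, the sketch below renders the option list above as `--name value` command-line arguments and sanity-checks the local port range. It assumes each option maps to a Marathon flag of the same name; the helper `marathon_flags` is hypothetical, and the values are the ones quoted above, including the redacted `10.0.x.x` hostname.

```python
import shlex

# Values copied from the list above; hostname is redacted in the original.
OPTIONS = {
    "default_accepted_resource_roles": "*",
    "hostname": "10.0.x.x",
    "local_port_max": 19999,
    "local_port_min": 17000,
    "mesos_authentication_principal": "username",
    "mesos_authentication_secret_file": "/a/secure/place/secret",
    "mesos_role": "marathon-global",
    "zk": "zk://10.0.13.49:2181,10.0.25.213:2181,10.0.41.8:2181/marathon",
}


def marathon_flags(options):
    """Render an option mapping as a flat list of command-line arguments."""
    low = int(options["local_port_min"])
    high = int(options["local_port_max"])
    # The range must be non-empty and fit in the valid TCP port space.
    if not 0 < low < high <= 65535:
        raise ValueError(f"invalid local port range: {low}-{high}")
    args = []
    for name in sorted(options):
        args += [f"--{name}", str(options[name])]
    return args


if __name__ == "__main__":
    # shlex.join quotes values such as "*" so the line is safe to paste.
    print(shlex.join(marathon_flags(OPTIONS)))
```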

I already tried the Mesos Slack (#marathon channel), but it doesn't seem to be
my lucky day :). Nonetheless, I'll try the Google group, thanks!

Cheers.


2017-10-17 19:01 GMT+02:00 Greg Mann <gr...@mesosphere.io>:

> Hi Jean-Baptiste,
> It would be helpful if you could include some Marathon and Mesos master
> logs to aid in troubleshooting. The fact that you're only experiencing the
> issue when Mesos/Marathon leaders are not co-located makes me suspect a
> network configuration issue, but it's hard to say without more evidence.
>
> Since this is a Marathon-specific issue, you may also have some luck
> reaching out on the Marathon Google group [1] or the #marathon channel on
> Mesos Slack [2].
>
> Cheers,
> Greg
>
> [1] https://groups.google.com/forum/#!forum/marathon-framework
> [2] http://mesos.apache.org/community/
>
>



Marathon 1.5 necessary for testing with cni

Posted by Marc Roos <M....@f1-outsourcing.eu>.

 
I want to experiment a bit with Mesos: Docker images on the Mesos
containerizer, with CNI. The goal is simply to give a container an IP.

Is this not possible with Marathon < 1.5? I am using the Mesosphere repo
for el7; is there a repo that has 1.5?


I tried adding this to the container configuration:

"networks": [ { "mode": "container", "name": "mynet" } ],


[@m03 ~]# cat /etc/mesos-cni/10-mynet.conf
{
     "cniVersion": "0.2.0",
     "name": "mynet",
     "type": "bridge",
     "bridge": "cni0",
     "isGateway": true,
     "ipMasq": true,
     "ipam": {
           "type": "host-local",
           "subnet": "10.22.0.0/16",
           "routes": [
                 { "dst": "0.0.0.0/0" }
           ]
     }
}
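
A quick way to rule out a malformed network config is to parse it before the agent does. The sketch below is a hypothetical checker: the function name `check_cni_conf` and the list of required keys are my own assumptions, not part of Mesos or the CNI tooling. It only verifies that the text is valid JSON, that the top-level keys used above are present, and that the subnet and route destinations parse as networks.

```python
import ipaddress
import json

# Same content as /etc/mesos-cni/10-mynet.conf shown above.
CONF_TEXT = """
{
    "cniVersion": "0.2.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}
"""


def check_cni_conf(text):
    """Parse a CNI network config and return (network name, subnet)."""
    conf = json.loads(text)
    for key in ("cniVersion", "name", "type", "ipam"):
        if key not in conf:
            raise ValueError(f"missing key: {key}")
    ipam = conf["ipam"]
    # ip_network raises ValueError on a malformed or non-aligned subnet.
    subnet = ipaddress.ip_network(ipam["subnet"])
    for route in ipam.get("routes", []):
        ipaddress.ip_network(route["dst"])
    return conf["name"], subnet


if __name__ == "__main__":
    name, subnet = check_cni_conf(CONF_TEXT)
    print(name, subnet, subnet.num_addresses)
```

The name returned here is the one the app definition's network name has to match.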


The Marathon networking docs on GitHub
(https://github.com/mesosphere/marathon/blob/master/docs/docs/networking.md)
seem to say that 1.5 is necessary for this. Is that right?

I got more info from this demo: https://youtu.be/0UMCoojACOs?t=1411


CentOS7 3.10.0-693.2.2.el7.x86_64
mesos-1.4.0-2.0.1.x86_64
marathon-1.4.8-1.0.660.el7.x86_64
containernetworking-cni-0.5.1-1.el7.x86_64
mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_6

Re: Marathon HA issue

Posted by Greg Mann <gr...@mesosphere.io>.

Hi Jean-Baptiste,
It would be helpful if you could include some Marathon and Mesos master
logs to aid in troubleshooting. The fact that you're only experiencing the
issue when Mesos/Marathon leaders are not co-located makes me suspect a
network configuration issue, but it's hard to say without more evidence.

Since this is a Marathon-specific issue, you may also have some luck
reaching out on the Marathon Google group [1] or the #marathon channel on
Mesos Slack [2].

Cheers,
Greg

[1] https://groups.google.com/forum/#!forum/marathon-framework
[2] http://mesos.apache.org/community/
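
One inexpensive way to test the network hypothesis is to check, from each master host, whether a plain TCP connection to the other hosts succeeds. The sketch below is a generic reachability probe, not a Mesos or Marathon tool. The function name `can_connect` and the example target are placeholders: replace them with the real leader addresses and the ports your Mesos master and Marathon scheduler actually listen on (5050 is only the Mesos master default).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder target list; use the real host/port pairs of your cluster.
    targets = [("127.0.0.1", 5050)]
    for host, port in targets:
        state = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} -> {state}")
```

As far as I know, with the libprocess-based scheduler driver the master and the framework each open connections to the other, so it is worth running the probe in both directions.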
