Posted to user@mesos.apache.org by Jean-Baptiste <jb...@gmail.com> on 2017/10/17 16:16:54 UTC
Marathon HA issue
Hi there,
This morning we rolled out changes to the Marathon *local_port_[min|max]* options.
We are now facing a problem with our cluster of 3 *master* / *marathon* nodes.
HA is broken on the cluster: the only way to make *Marathon* work
is to put its leader on the same host as the *Mesos* leader.
If the *Marathon* leader is on a different host than the *Mesos* leader,
*Marathon* enters a re-registration loop. Has anyone already faced
this kind of behavior? Any idea where to look? Could it be linked to this
issue <https://jira.mesosphere.com/browse/MARATHON-7436>?
We've checked network security; there are no specific restrictions
between the hosts.
*Versions:*
- *Debian*: 8.7
- *Mesos*: 1.3.0
- *Marathon*: 1.4.5
*Topology:*
Thanks!
--
Jean-Baptiste FAREZ
jbfarez@gmail.com
Re: Marathon 1.5 necessary for testing with cni
Posted by Avinash Sridharan <av...@mesosphere.io>.
Hi Marc,
The Marathon app definition fragment:
"networks": [ { "mode": "container", "name": "mynet" } ],
is the new API that was introduced in 1.5.
Prior to 1.5 you would use:
"ipAddress": { "networkName": "mynet" }
@jdef ^^?
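To make the difference between the two forms concrete, here is a minimal Python sketch that builds an app definition for either API. The app id, command, image, and resource values are hypothetical placeholders, and the pre-1.5 `ipAddress` field is written as a JSON object, which is my understanding of the 1.4 schema; please check it against the docs for your Marathon version.

```python
import json


def cni_app_definition(network_name, marathon_1_5_or_later):
    """Build a minimal Marathon app definition attached to a CNI network.

    All values except the network fields are illustrative placeholders.
    """
    app = {
        "id": "/cni-test",      # hypothetical app id
        "cmd": "sleep 3600",    # hypothetical command
        "cpus": 0.1,
        "mem": 32,
        "instances": 1,
        # Docker image launched by the Mesos containerizer
        "container": {"type": "MESOS", "docker": {"image": "alpine"}},
    }
    if marathon_1_5_or_later:
        # Networking API introduced in Marathon 1.5
        app["networks"] = [{"mode": "container", "name": network_name}]
    else:
        # Pre-1.5 form (assumed to be a JSON object, see note above)
        app["ipAddress"] = {"networkName": network_name}
    return app


# Render the pre-1.5 variant as JSON, ready to POST to /v2/apps
print(json.dumps(cni_app_definition("mynet", False), indent=2))
```

The network name must match the `name` field of a CNI configuration installed on the agents.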
--
Avinash Sridharan, Mesosphere
+1 (323) 702 5245
Re: Marathon HA issue
Posted by Jean-Baptiste <jb...@gmail.com>.
Thanks Greg, I've attached the logs. For reference, the options we
are using are the following:
- *default_accepted_resource_roles* = *
- *hostname* = 10.0.x.x
- *local_port_max* = 19999
- *local_port_min* = 17000
- *mesos_authentication_principal* = username
- *mesos_authentication_secret_file* = /a/secure/place/secret
- *mesos_role* = marathon-global
- *zk* = zk://10.0.13.49:2181,10.0.25.213:2181,10.0.41.8:2181/marathon
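As a quick sanity check on options like these, the following Python sketch splits the `zk` URL into its ensemble members and verifies that the local port range is well formed. The helper names are hypothetical, and the parser deliberately ignores optional `user:password@` credentials; it is an illustration, not how Marathon itself parses its options.

```python
def parse_zk_url(zk_url):
    """Split zk://h1:p1,h2:p2/path into ([(host, port), ...], path).

    Assumes no user:password@ prefix; 2181 is used when a port is omitted.
    """
    prefix = "zk://"
    if not zk_url.startswith(prefix):
        raise ValueError("expected a zk:// URL")
    hosts_part, _, path = zk_url[len(prefix):].partition("/")
    members = []
    for entry in hosts_part.split(","):
        host, _, port = entry.partition(":")
        members.append((host, int(port) if port else 2181))
    return members, "/" + path


def port_range_is_valid(port_min, port_max):
    """Return True when both ports are legal and min does not exceed max."""
    return 0 < port_min <= port_max <= 65535


members, path = parse_zk_url(
    "zk://10.0.13.49:2181,10.0.25.213:2181,10.0.41.8:2181/marathon"
)
print(members, path)
print(port_range_is_valid(17000, 19999))
```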
I already tried the Mesos Slack (#marathon channel), but it doesn't seem to be
my lucky day :). Nonetheless, I'll try the Google group, thanks!
Cheers.
--
Jean-Baptiste FAREZ
jbfarez@gmail.com
Marathon 1.5 necessary for testing with cni
Posted by Marc Roos <M....@f1-outsourcing.eu>.
I want to do some testing with Mesos: Docker images on the Mesos
containerizer with CNI. I just want to give a container an IP.
Is this not possible with Marathon <1.5? I am using the Mesosphere repo
for el7; is there a repo that has 1.5?
I tried adding this to the container configuration:
"networks": [ { "mode": "container", "name": "mynet" } ],
[@m03 ~]# cat /etc/mesos-cni/10-mynet.conf
{
  "cniVersion": "0.2.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
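Before involving Marathon at all, it can help to confirm that the CNI file parses and that its `name` matches the network name used in the app definition. The Python sketch below does only that; the function name `check_cni_conf` is hypothetical, and it does not verify that the `bridge` and `host-local` plugins are actually installed on the agent.

```python
import ipaddress
import json

# Same content as /etc/mesos-cni/10-mynet.conf above
CNI_CONF = """
{
  "cniVersion": "0.2.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [ { "dst": "0.0.0.0/0" } ]
  }
}
"""


def check_cni_conf(conf_text, expected_name):
    """Return a list of problems found in a CNI network configuration."""
    conf = json.loads(conf_text)  # raises ValueError on malformed JSON
    problems = []
    if conf.get("name") != expected_name:
        problems.append("network name %r does not match %r"
                        % (conf.get("name"), expected_name))
    if "type" not in conf:
        problems.append("missing plugin 'type'")
    subnet = conf.get("ipam", {}).get("subnet")
    if subnet is None:
        problems.append("missing ipam.subnet")
    else:
        try:
            ipaddress.ip_network(subnet)
        except ValueError:
            problems.append("invalid subnet %r" % subnet)
    return problems


print(check_cni_conf(CNI_CONF, "mynet"))
```

An empty list means the basic checks passed; the agent still has to be started with its CNI configuration and plugin directories pointing at the right places.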
The Marathon GitHub page
(https://github.com/mesosphere/marathon/blob/master/docs/docs/networking.md)
seems to say that 1.5 is necessary for this?
I got more info from this demo: https://youtu.be/0UMCoojACOs?t=1411
CentOS7 3.10.0-693.2.2.el7.x86_64
mesos-1.4.0-2.0.1.x86_64
marathon-1.4.8-1.0.660.el7.x86_64
containernetworking-cni-0.5.1-1.el7.x86_64
mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_6
Re: Marathon HA issue
Posted by Greg Mann <gr...@mesosphere.io>.
Hi Jean-Baptiste,
It would be helpful if you could include some Marathon and Mesos master
logs to aid in troubleshooting. The fact that you're only experiencing the
issue when Mesos/Marathon leaders are not co-located makes me suspect a
network configuration issue, but it's hard to say without more evidence.
Since this is a Marathon-specific issue, you may also have some luck
reaching out on the Marathon Google group [1] or the #marathon channel on
Mesos Slack [2].
Cheers,
Greg
[1] https://groups.google.com/forum/#!forum/marathon-framework
[2] http://mesos.apache.org/community/
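One way to gather evidence for or against a network configuration issue is a plain TCP reachability check, run from each master host against the others. The Python sketch below is a generic probe, not a Mesos or Marathon tool; the ports in the usage comment are the usual defaults (5050 for the Mesos master, 8080 for the Marathon HTTP API, 2181 for ZooKeeper) and are assumptions about this cluster. It also does not cover the port the Marathon scheduler itself listens on for messages from the Mesos master, which would have to be looked up in the logs.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


def report(hosts, ports, timeout=3.0):
    """Print one line per host/port pair with the probe result."""
    for host in hosts:
        for port in ports:
            state = "ok" if can_connect(host, port, timeout) else "FAILED"
            print("%s:%d %s" % (host, port, state))


# Example usage with placeholder addresses (replace with the real masters):
# report(["10.0.0.1", "10.0.0.2", "10.0.0.3"], [5050, 8080, 2181])
```

A failure that shows up only in one direction, or only between specific hosts, would point toward firewall rules or address and hostname resolution rather than Marathon itself.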