Posted to user@mesos.apache.org by Tomas Barton <ba...@gmail.com> on 2014/07/22 11:40:25 UTC

Mesos 0.19 registrar upgrade

Hi,

What is the best way to upgrade a Mesos cluster from 0.18 to 0.19? I've tried
to read all the documentation before doing the actual upgrade, but I still
don't understand a few things.

What should be the quorum size?

The --help says that "It is imperative to set this value to be a majority
of masters i.e., quorum > (number of masters)/2"

I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3, right?

The recover.cpp says that: "we allow a replica in EMPTY status to become
VOTING immediately if it finds ALL (i.e., 2 * quorum - 1) replicas are in
EMPTY status"
So, with quorum = 3 I would need 5 Mesos masters (that's just not clear
from the mesos-master --help).

quorum=1, mesos-masters=1
quorum=2, mesos-masters=3
quorum=3, mesos-masters=5
quorum=4, mesos-masters=7
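
A small sketch of the arithmetic behind the two rules quoted above: the majority rule from --help and the 2 * quorum - 1 assumption from recover.cpp. The function names are made up for illustration and are not part of Mesos.

```python
def majority_quorum(masters: int) -> int:
    """Smallest quorum satisfying quorum > masters / 2 (the --help rule)."""
    return masters // 2 + 1


def assumed_masters(quorum: int) -> int:
    """Replica count the log assumes when bootstrapping: 2 * quorum - 1."""
    return 2 * quorum - 1


# Compare the real master count with the count implied by the quorum.
for masters in range(1, 8):
    quorum = majority_quorum(masters)
    assumed = assumed_masters(quorum)
    note = "consistent" if assumed == masters else "mismatch"
    print(f"masters={masters} quorum={quorum} assumed_masters={assumed} ({note})")
```

Only odd master counts come out consistent under both rules; with 4 masters the majority rule gives quorum=3, which implies 5 replicas.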

Is it possible to have an even number of Mesos masters? Or is it just a
bad idea?

With 4 masters I got into a situation when:

master 1:
I0722 11:35:40.708562 12689 replica.cpp:638] Replica in VOTING status
received a broadcasted recover request

master 2:
I0722 11:36:37.593647  7754 replica.cpp:638] Replica in EMPTY status
received a broadcasted recover request

master 3:
I0722 11:35:14.102762 26701 recover.cpp:188] Received a recover response
from a replica in STARTING status

master 4:
I0722 11:35:54.284169 32056 replica.cpp:638] Replica in STARTING status
received a broadcasted recover request
I0722 11:35:54.284425 32050 recover.cpp:188] Received a recover response
from a replica in STARTING status
I0722 11:35:54.284788 32057 recover.cpp:188] Received a recover response
from a replica in VOTING status
I0722 11:35:54.285127 32050 recover.cpp:188] Received a recover response
from a replica in EMPTY status

And the election algorithm ends up in an endless loop. How can I recover
from this? Delete all replica logs from master disk? Start with quorum=1
and increment number of masters?

Thanks,
Tomas

Re: Mesos 0.19 registrar upgrade

Posted by Tomas Barton <ba...@gmail.com>.
Ok, thanks Ben! It would be nice to update the documentation accordingly.

So, in 0.20 there might be a flag specifying the total number of masters?



Re: Mesos 0.19 registrar upgrade

Posted by Benjamin Mahler <be...@gmail.com>.
At the current time, you need an odd number of masters, as there is an
assumption built into the replicated log that the number of masters =
2*quorum - 1. This assumption is present when bootstrapping the log from no
data.

To recover from this, you need to run an odd number of masters, and set
your quorum correctly. For example, 3 masters with quorum 2, or 5 masters
with quorum 3. It is safe to wipe the replica logs before doing this.
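
As a rough pre-flight check before restarting the masters, the constraints described here could be tested like this. It is only a sketch: check_master_config is a hypothetical helper, not a Mesos tool, and it encodes nothing beyond the odd-count, majority, and 2*quorum - 1 rules from this thread.

```python
def check_master_config(masters: int, quorum: int) -> list[str]:
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    if masters % 2 == 0:
        problems.append(f"{masters} masters is an even count; use an odd number")
    if quorum <= masters / 2:
        problems.append(f"quorum {quorum} is not a majority of {masters} masters")
    if masters != 2 * quorum - 1:
        problems.append(
            f"quorum {quorum} implies {2 * quorum - 1} masters, not {masters}"
        )
    return problems


# The two recommended setups, followed by the 4-master setup from the question.
for masters, quorum in [(3, 2), (5, 3), (4, 3)]:
    problems = check_master_config(masters, quorum)
    print(masters, quorum, "OK" if not problems else "; ".join(problems))
```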

There are some outstanding tickets to clean this up:
https://issues.apache.org/jira/browse/MESOS-1465
https://issues.apache.org/jira/browse/MESOS-1546

We'd like to have the configuration be explicit about the total number of
masters, so that the assumption need not be made.



Re: Mesos 0.19 registrar upgrade

Posted by Dick Davies <di...@hellooperator.net>.
On 22 July 2014 10:40, Tomas Barton <ba...@gmail.com> wrote:

> I have 4 Mesos masters, which would mean that quorum > 2 -> quorum=3, right?

Yes, that's right. 2 won't be enough.


> quorum=1, mesos-masters=1
> quorum=2, mesos-masters=3
> quorum=3, mesos-masters=5
> quorum=4, mesos-masters=7
>
> Is it possible to have an even number of Mesos masters? Or is it just a bad
> idea?

Yes, it's a bad idea since this change. It's always been a bad idea to run an
even number of ZooKeeper nodes, and now that extends to the Mesos masters.

4 masters give you no extra redundancy over 3, and your likelihood of node loss
increases slightly (as you now have an extra server to potentially break).
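
To put rough numbers on the redundancy point, here is a sketch that assumes a majority quorum and independent node failures with the same probability p on every master. Both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
from math import comb


def tolerated_failures(masters: int) -> int:
    """Masters that can fail while a majority quorum is still reachable."""
    quorum = masters // 2 + 1
    return masters - quorum


def quorum_loss_probability(masters: int, p: float) -> float:
    """Probability that more masters fail than the quorum can tolerate."""
    min_failures = tolerated_failures(masters) + 1
    return sum(
        comb(masters, k) * p**k * (1 - p) ** (masters - k)
        for k in range(min_failures, masters + 1)
    )


for masters in (3, 4, 5):
    print(
        f"masters={masters} tolerated={tolerated_failures(masters)} "
        f"p_loss={quorum_loss_probability(masters, 0.01):.6f}"
    )
```

Under these assumptions, 3 and 4 masters both tolerate a single failure, while the 4-master cluster has more ways for two nodes to be down at once.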