Posted to dev@mesos.apache.org by Shuai Lin <li...@gmail.com> on 2016/01/20 07:42:26 UTC

About MESOS-1806 (Etcd as an alternative to zookeeper)

Hi Benjamin and all,

I'd like to talk about MESOS-1806. Since I took over this ticket halfway
through and there was no design doc for it, I have created one based on
the current implementation.

https://docs.google.com/document/d/1ccY0XJoOODpIiGPllSVvl7t-YRrIEE_NavfbZHKPWBs/edit?usp=sharing

Besides, there are some details I'd like to discuss:


1. Etcd servers won't accept requests from clients during the leader
election phase. So when there is a leader re-election among the etcd
servers, the request from the current master to renew the TTL of the
v2/keys/mesos node would fail, and the current code would immediately
retry with the next server, which would refuse the request as well. Thus
the master would exit because all servers failed its requests. The same
happens with slaves: the detector would fail after requests to all the
etcd servers are refused. To solve this, we should add logic to wait for
a while before trying the next server, as in the sketch below.
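
To illustrate, here is a rough Python sketch of what I mean (not the
actual Mesos code, which is C++): it talks to the etcd v2 HTTP API with
the requests library, and the server names, timeout and delay values are
just placeholders:

    import time
    import requests

    # Placeholder server list; in Mesos this would come from the etcd flag.
    ETCD_SERVERS = ["http://etcd1:2379", "http://etcd2:2379", "http://etcd3:2379"]
    RETRY_DELAY_SECS = 1.0  # wait a bit before moving on to the next server

    def refresh_leader_key(value, ttl):
        """Try each etcd server in turn, backing off briefly between attempts."""
        for server in ETCD_SERVERS:
            try:
                resp = requests.put(
                    server + "/v2/keys/mesos",
                    data={"value": value, "ttl": ttl, "prevExist": "true"},
                    timeout=2)
                if resp.status_code == 200:
                    return True
            except requests.RequestException:
                pass  # server unreachable or mid-election; try the next one
            time.sleep(RETRY_DELAY_SECS)  # don't hammer the next server right away
        return False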

2. If the current master somehow fails to update the v2/keys/mesos node
in time, the node would expire, the detector would detect this, and the
master would commit suicide due to loss of leadership. This is correct
behavior, but the current TTL is rather small: only 5 seconds, and the
current master is set to update the node at 80% of the TTL, i.e. at the
4th second, so the chance of hitting this problem is not that low, e.g.
when there is a transient network problem. This can be mitigated by
increasing the TTL to 10 seconds and letting the current master try to
update the etcd node at 60% of the TTL.
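
Concretely, the schedule I'm proposing would look roughly like this
(again only a Python sketch against the etcd v2 API, with a placeholder
endpoint; the real code should also retry across servers as in point 1):

    import time
    import requests

    ETCD_URL = "http://etcd1:2379/v2/keys/mesos"  # placeholder endpoint
    TTL_SECS = 10          # proposed TTL, up from the current 5 seconds
    RENEW_FRACTION = 0.6   # renew at the 6th second, leaving ~4 seconds of slack

    def renewal_loop(value):
        while True:
            resp = requests.put(
                ETCD_URL,
                data={"value": value, "ttl": TTL_SECS, "prevExist": "true"},
                timeout=2)
            resp.raise_for_status()  # a sketch; real code must handle failures
            time.sleep(TTL_SECS * RENEW_FRACTION)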

3. The current implementation requires the list of masters to be
specified in the "--masters=..." flag (used for the replicated log
quorum), which makes it inconvenient to add new masters to the cluster:
every existing master must be restarted with an updated "--masters="
flag. What about creating a directory in the etcd key space and letting
each master create a child node in that directory?
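
For example (a hypothetical sketch; the /v2/keys/mesos/masters directory
name is made up), each master could POST itself into a directory as an
etcd "in-order" key with a TTL, and anyone could read the directory to
discover the current set of masters:

    import requests

    ETCD_MASTERS_DIR = "http://etcd1:2379/v2/keys/mesos/masters"  # made-up path

    def register_master(master_info, ttl=30):
        # POSTing to a directory creates an in-order child key whose name is
        # the etcd create index; the TTL lets dead masters age out (assuming
        # each master keeps refreshing its own node).
        resp = requests.post(ETCD_MASTERS_DIR,
                             data={"value": master_info, "ttl": ttl},
                             timeout=2)
        resp.raise_for_status()
        return resp.json()["node"]["key"]  # e.g. .../masters/00000000000000000042

    def list_masters():
        resp = requests.get(ETCD_MASTERS_DIR, params={"sorted": "true"}, timeout=2)
        resp.raise_for_status()
        nodes = resp.json().get("node", {}).get("nodes", [])
        return [n["value"] for n in nodes]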


Regards,
Shuai

Re: About MESOS-1806 (Etcd as an alternative to zookeeper)

Posted by tommy xiao <xi...@gmail.com>.
I am curious whether ZooKeeper has the same behavior and issues. Could we
set up metrics to compare etcd vs. ZooKeeper on these issues? That would
help us define the correct scope.

-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com