Posted to user@zookeeper.apache.org by ms209495 <je...@gmail.com> on 2013/11/26 09:54:44 UTC

Ensure there is one master

Hi,

ZooKeeper is an excellent system, but the problem of ensuring only one
master among clients bothers me.

Let's look at the situation when a network partition happens: there is
part A (the majority) and part B (the minority).
Assume that before the network partition happened, the master was connected
to part B.
After the network partition, part A will elect a new ZooKeeper leader, and
a new master will be elected among the clients connected to part A.
At this point there are two masters - the old one in part B and the new one
in part A.
The only solution to this problem I can think of is to ensure that the new
master stays inactive for some time - long enough for the old master to
detect that it is no longer connected to the ZooKeeper quorum and to
deactivate itself as master.
This solution assumes that the timers on these machines work correctly.
Is it possible to ensure only one master using ZooKeeper without timing
assumptions?

Thanks,
Maciej



--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/Ensure-there-is-one-master-tp7579367.html
Sent from the zookeeper-user mailing list archive at Nabble.com.

Re: Ensure there is one master

Posted by Cameron McKenzie <mc...@gmail.com>.
I'm not sure of the exact details of how the connection loss is detected,
but in my experience it is detected very quickly. Regardless, there will
still be an ephemeral node for the old 'master' connected to part B, and it
will hang around until the old 'master's ZooKeeper session times out. No new
leader will be elected until this ephemeral node has disappeared, and the
old 'master' will definitely know that its connection is gone by that point.
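
For reference, the 'wait for the ephemeral node to disappear' election Cameron describes is usually built on ephemeral sequential znodes: the candidate with the lowest sequence number is the leader, and everyone else watches the node just ahead of it. A minimal sketch of that decision step in plain Python (no real ZooKeeper client; the znode naming convention is assumed):

```python
def election_decision(my_node, children):
    """Given my znode name and the children of the election parent node,
    decide whether I am leader and, if not, which znode to watch.

    Znode names are assumed to end in a zero-padded sequence number,
    e.g. 'n_0000000003', as produced by the ephemeral+sequential flags.
    """
    ordered = sorted(children, key=lambda n: int(n.rsplit("_", 1)[1]))
    idx = ordered.index(my_node)
    if idx == 0:
        return True, None          # lowest sequence number: I am leader
    # Watch only the immediate predecessor to avoid a thundering herd;
    # when that node disappears, re-run this decision.
    return False, ordered[idx - 1]
```

When the watched predecessor's ephemeral node vanishes (session expiry or clean exit), the candidate re-reads the children and runs the decision again.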



Re: Ensure there is one master

Posted by Bryan Thompson <br...@systap.com>.
Only read from EXC if you are the leader. This is a low-throughput solution, since access to that region is globally single-threaded. I suggest you stage updates from EXC to a region that is highly available for readers. Of course, the staging must be atomic.

Bryan
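
Bryan's requirement that the staging be atomic can be illustrated with the classic write-then-rename pattern on a POSIX filesystem: the writer prepares a complete copy off to the side and publishes it in one atomic step, so readers never observe a half-written region. A sketch under those assumptions (illustrative only; whether rename is actually atomic on a given SAN/NFS setup must be verified):

```python
import os
import tempfile

def publish_snapshot(data: bytes, published_path: str) -> None:
    """Stage an update atomically: write a complete temporary file,
    then rename it over the published path. On POSIX, rename within
    one filesystem is atomic, so readers see either the old snapshot
    or the new one, never a partial write."""
    dir_name = os.path.dirname(os.path.abspath(published_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ensure the bytes hit disk first
        os.replace(tmp_path, published_path)  # atomic publish
    except BaseException:
        os.unlink(tmp_path)
        raise
```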


Re: Ensure there is one master

Posted by Maciej <je...@gmail.com>.
The old master in part B will notice that the connection is broken after some time.
A new master can be elected in part A earlier.

Let's say that the system uses a shared disk (SAN).
There is some area on the disk that must be used exclusively (let's
name it EXC).

During this short period of time:

the new master will receive a user request, modify the EXC area, and reply
to the user;
the old master will also receive another user request, read the EXC data
(while it is being modified by the new master) - reading inconsistent data -
and reply to the user with that inconsistent data.

I wonder how to avoid such inconsistency.



Re: Ensure there is one master

Posted by Alexander Shraer <sh...@gmail.com>.
There could be some period of time between the lease expiring at the
leader and the old master finding out.
This may happen, for example, because the clock of the old master is
slower than, or behind, the clock of the leader, so the old master may
still think its lease is valid.

By the way, there are other ways to achieve even stronger guarantees;
it's just that they are more complex.


Re: Ensure there is one master

Posted by Ted Dunning <te...@gmail.com>.
Generally, things like heartbeating or estimating the expiration of
ephemeral nodes require reference to a clock. As such, detection of a
network partition, or estimation of when an ephemeral node vanishes, can
go arbitrarily wrong.


Re: Ensure there is one master

Posted by Cameron McKenzie <mc...@gmail.com>.
Excuse my ignorance (I'm relatively new to ZK), but how does the accuracy
of the clock affect this situation?


Re: Ensure there is one master

Posted by Ted Dunning <te...@gmail.com>.
This is not necessarily true.  The old master may not have an accurate
clock.

The ascending id idea that Alex mentions is a very nice way to put more
guarantees on the process.


Re: Ensure there is one master

Posted by Alexander Shraer <sh...@gmail.com>.
Cameron's solution basically relies on additional timing assumptions,
as Maciej mentions in his question.

One more thing you could do is implement increasing generation ids for
masters, and have the clients in your system reject commands from a
master if they already know that a master with a higher generation id
was elected (either because they saw a command from the new master or
because they got a notification from ZK). This way each client can only
have a single master at a time, and only moves forward in time.

Alex
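
The generation-id scheme Alex describes amounts to a fencing token: every client remembers the highest master generation it has seen and drops any command stamped with something lower. A minimal sketch (the class and method names are made up for illustration; the real command format would be application-specific):

```python
class MasterCommandFilter:
    """Client-side guard implementing the increasing-generation-id idea:
    remember the highest master generation seen so far and reject any
    command stamped with a lower one."""

    def __init__(self):
        self.highest_seen = 0

    def note_new_master(self, generation: int) -> None:
        # Called e.g. on a ZK notification that a new master was elected.
        self.highest_seen = max(self.highest_seen, generation)

    def accept(self, generation: int) -> bool:
        if generation < self.highest_seen:
            return False           # stale master: reject the command
        self.highest_seen = generation  # commands also advance the bound
        return True
```

Once a client has seen generation N, a master partitioned away with generation N-1 can never get a command past that client again, with no timing assumptions involved.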


Re: Ensure there is one master

Posted by Cameron McKenzie <mc...@gmail.com>.
If I'm understanding your question correctly, you're worried that when the
current 'master' loses its connection to ZooKeeper, a new 'master' will be
elected and you will have 2 'master' nodes at the same time. As soon as you
lose a connection to ZooKeeper there are no guarantees about any of the
state that you're determining from it. When you lose the ZooKeeper
connection, your 'master' must assume that it is no longer a 'master' node
until it reconnects to ZooKeeper, at which point it will be able to work
out what's going on.

If you look at Apache Curator, its implementation of the Leader latch
recipe handles this loss of connection and reestablishment.

cheers
Cam


Re: Ensure there is one master

Posted by ms209495 <je...@gmail.com>.
Thanks for the reply. I want to clarify one thing.
I am thinking about a System of 20 nodes that uses a ZooKeeper ensemble of
3 nodes.
I am thinking about master election among these 20 nodes; they do not run
consensus themselves, but use the ZooKeeper service for master election.
I used the term 'leader' for the leader within ZooKeeper (among the 3
nodes), and 'master' for the master in the System (the 20 nodes).
The solution is described here:
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection (I
would name it 'master' election, not 'leader' election), but I doubt that it
works reliably without additional timing assumptions, as I described in my
previous post.
Please consider my previous post in the context of the System that uses
ZooKeeper (not ZooKeeper itself).


Re: Ensure there is one master

Posted by Camille Fournier <ca...@apache.org>.
The master in part B is only a master so long as it has a quorum of
followers. It loses the quorum when the network partitions, and so will no
longer act as master from that moment.
Additionally, the master's actions all require a consensus to proceed. So
even if the master in part B doesn't know it is no longer the master, no
state-changing actions will be able to get a quorum of votes to proceed;
there is no time in which it can improperly lead, because it won't actually
have the votes to make changes.

C
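
Camille's point - that a leader cut off in a minority partition can never commit anything because it lacks a quorum of votes - reduces to a simple majority check. A sketch (illustrative only, not ZooKeeper's actual code):

```python
def has_quorum(acks: int, ensemble_size: int) -> bool:
    """A proposal commits only if more than half of the ensemble
    acknowledged it. A leader stranded in a minority partition can
    therefore never commit a state change, whatever its clock says."""
    return acks > ensemble_size // 2
```

For example, in a 5-node ensemble a leader partitioned off with one follower can gather at most 2 acks, which is not a majority, so none of its proposals commit.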

Re: Ensure there is one master

Posted by Ivan Kelly <iv...@apache.org>.
There is always a period between checking that you are master and
performing an action as master, so you can't guarantee that another node
hasn't taken mastership before you perform the action. However, if you
are using some shared storage for the state of the system, you can
block other masters from writing to it while you are master,
preventing split brain from occurring.

The simplest way to do this is to store all your state in
ZooKeeper. With this, if B is partitioned away, it will not be able to
update the state. This won't scale too far, though, as ZK holds
everything in memory. It may work if the system isn't too big.

Another solution is to use
BookKeeper (http://zookeeper.apache.org/bookkeeper), or another shared
storage system with fencing, to sequence the updates to your
state. BookKeeper is a distributed write-ahead log, which writes
entries to a quorum before responding to the client. It has a fencing
mechanism which sends a 'fence' message to at least one node in each
quorum, blocking all further writes to that log. In your case, if a
node in B is master and is putting all state updates into a BookKeeper
log before applying them, and there is a partition and a node in A
becomes master, A will fence B's log before applying any state updates
of its own.

Yet another solution, though I don't know how well it would work, is
to use locks in NFS. If B is logging to a file on a SAN, it gets
an exclusive lock on the file handle. This will block anyone else from
logging to it until B's NFS session goes away. I'm not sure how long
it takes for sessions to time out, though, or how widely implemented
and reliable this part of the NFS spec is.

Hope this helps,

Ivan
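
The fencing behaviour Ivan describes can be modelled in a few lines: the log remembers the highest epoch that has fenced it and refuses appends from any writer with a lower epoch. A toy in-memory model (this is not the BookKeeper API, just the idea):

```python
class FencedLog:
    """Toy model of a fenced write-ahead log: fence(epoch) locks out
    every writer whose epoch is below the fencing epoch, so an old
    master partitioned away can no longer append once the new master
    has fenced the log and recovered its state."""

    def __init__(self):
        self.fence_epoch = 0
        self.entries = []

    def fence(self, epoch: int) -> None:
        # Called by the new master before it applies any state updates.
        self.fence_epoch = max(self.fence_epoch, epoch)

    def append(self, epoch: int, entry) -> bool:
        if epoch < self.fence_epoch:
            return False           # stale master's write is rejected
        self.entries.append(entry)
        return True
```

Because every state update must go through the log before being applied, the old master's writes start failing at the storage layer the moment the new master fences, with no reliance on the old master's clock.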
