Posted to user@zookeeper.apache.org by Josh Stone <pa...@gmail.com> on 2012/01/05 05:09:27 UTC

Use cases for ZooKeeper

I have a few use cases that I'm wondering if ZooKeeper would be suitable
for and would appreciate some feedback.

First use case: Distributing work to a cluster of nodes using consistent
hashing to ensure that messages of some type are consistently handled by
the same node. I haven't been able to find any info about ZooKeeper +
consistent hashing. Is anyone using it for this? A concern here would be
how to redistribute work as nodes come and go from the cluster.

Second use case: Distributed locking. I noticed that there's a recipe for
this on the ZooKeeper wiki. Is anyone doing this? Any issues? One concern
would be how to handle orphaned locks if a node that obtained a lock goes
down.

Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
messages to workers, can it be made to handle a node going down by
re-distributing the work to another node (perhaps messages that are not
ack'ed within a timeout are resent)?

Cheers,
Josh

Re: Use cases for ZooKeeper

Posted by Ted Dunning <te...@gmail.com>.
This severely limits the throughput of this kind of approach.

The pro is that you get quite a bit of fine-grained resiliency.

The micro-sharding of traffic approach gives you very high throughput
(easily high enough, for instance, to handle all of Twitter's traffic).

The con for micro-sharding is that there isn't any reliable delivery baked
in, so the sender basically has to wait for an ACK and try again if no
delivery happens.  If you depend on a single sink for each key range shard,
then you will not be able to send until a new recipient for that shard is
designated.  This could take session-expiration-time plus epsilon so you
have to be able to handle that much back pressure in the message queue.
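
A minimal sketch of that send-and-retry loop in Java might look like the
following; Send and AckWaiter here are hypothetical application-side hooks,
not part of ZooKeeper or Curator:

import java.util.concurrent.TimeUnit;

// Sketch of the send-until-ACKed loop: there is no reliable delivery baked
// in, so the sender keeps re-sending until the shard owner acknowledges.
public class ReliableSend {
    public interface Send { void send() throws Exception; }
    public interface AckWaiter { boolean awaitAck(long timeout, TimeUnit unit) throws Exception; }

    public static void sendUntilAcked(Send send, AckWaiter acks) throws Exception {
        while (true) {
            send.send();                              // may be lost if the shard owner just died
            if (acks.awaitAck(5, TimeUnit.SECONDS)) {
                return;                               // delivered
            }
            // No ACK yet: keep buffering and retrying. In the worst case this
            // lasts about session-expiration time plus epsilon, until a new
            // owner for the shard is designated.
        }
    }
}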

An alternative would be to designate multiple sinks for each key range
shard.  That loses some of the coherency that I think you were after, but
if you strictly prioritize you can convert the cost of node failure from
significant back-pressure to slightly degraded coherency.  For a clean node
failure, you would get fast cutover, but with a flapping node you might get
some strange effects.  If a node started losing messages, you would also
see bad effects: the backups would get random bits of the traffic rather
than all of it.

CAP applies as always with these things.


On Thu, Jan 5, 2012 at 10:59 AM, Jordan Zimmerman <jz...@netflix.com>wrote:

> They're stored in ZooKeeper, so both. ZooKeeper backs everything to disk
> but keeps the entire DB in memory for performance.
>
> -JZ
>
> On 1/5/12 10:54 AM, "Josh Stone" <pa...@gmail.com> wrote:
>
> >Are the distributed queue and locks written to disk or can they be held in
> >memory?
> >
> >josh
> >
> >On Thu, Jan 5, 2012 at 10:02 AM, Jordan Zimmerman
> ><jz...@netflix.com>wrote:
> >
> >> Curator's queue handles a node going down (when you use setLockPath()).
> >> Curator will hold a lock for each message that is being processed. You
> >>can
> >> see the implementation in the method processWithLockSafety() here:
> >>
> >>
> https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/j
> >>av
> >> a/com/netflix/curator/framework/recipes/queue/DistributedQueue.java
> >>
> >> >Will a node going down still clear any distributed locks?
> >> Yes.
> >>
> >>
> >> -JZ
> >>
> >> On 1/5/12 9:56 AM, "Josh Stone" <pa...@gmail.com> wrote:
> >>
> >> >Yes, something like that with lock safety would satisfy my third use
> >>case.
> >> >
> >> >Some questions: Is the distributed queue effectively located by a
> >>single
> >> >z-node? What happens when that node goes down? Will a node going down
> >> >still
> >> >clear any distributed locks?
> >> >
> >> >Josh
> >> >
> >> >On Thu, Jan 5, 2012 at 9:41 AM, Jordan Zimmerman
> >> ><jz...@netflix.com>wrote:
> >> >
> >> >> FYI - Curator has a resilient message Queue:
> >> >> https://github.com/Netflix/curator/wiki/Distributed-Queue
> >> >>
> >> >> On 1/5/12 5:00 AM, "Inder Pall" <in...@gmail.com> wrote:
> >> >>
> >> >> >Third use case: Fault tolerance. If we utilized ZooKeeper to
> >>distribute
> >> >> >messages to workers, can it be made to handle a node going down by
> >> >> >re-distributing the work to another node (perhaps messages that are
> >>not
> >> >> >ack'ed within a timeout are resent)?
> >> >>
> >> >>
> >>
> >>
>
>

Re: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
They're stored in ZooKeeper, so both. ZooKeeper backs everything to disk
but keeps the entire DB in memory for performance.

-JZ

On 1/5/12 10:54 AM, "Josh Stone" <pa...@gmail.com> wrote:

>Are the distributed queue and locks written to disk or can they be held in
>memory?
>
>josh
>
>On Thu, Jan 5, 2012 at 10:02 AM, Jordan Zimmerman
><jz...@netflix.com>wrote:
>
>> Curator's queue handles a node going down (when you use setLockPath()).
>> Curator will hold a lock for each message that is being processed. You
>>can
>> see the implementation in the method processWithLockSafety() here:
>> 
>>https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/j
>>av
>> a/com/netflix/curator/framework/recipes/queue/DistributedQueue.java
>>
>> >Will a node going down still clear any distributed locks?
>> Yes.
>>
>>
>> -JZ
>>
>> On 1/5/12 9:56 AM, "Josh Stone" <pa...@gmail.com> wrote:
>>
>> >Yes, something like that with lock safety would satisfy my third use
>>case.
>> >
>> >Some questions: Is the distributed queue effectively located by a
>>single
>> >z-node? What happens when that node goes down? Will a node going down
>> >still
>> >clear any distributed locks?
>> >
>> >Josh
>> >
>> >On Thu, Jan 5, 2012 at 9:41 AM, Jordan Zimmerman
>> ><jz...@netflix.com>wrote:
>> >
>> >> FYI - Curator has a resilient message Queue:
>> >> https://github.com/Netflix/curator/wiki/Distributed-Queue
>> >>
>> >> On 1/5/12 5:00 AM, "Inder Pall" <in...@gmail.com> wrote:
>> >>
>> >> >Third use case: Fault tolerance. If we utilized ZooKeeper to
>>distribute
>> >> >messages to workers, can it be made to handle a node going down by
>> >> >re-distributing the work to another node (perhaps messages that are
>>not
>> >> >ack'ed within a timeout are resent)?
>> >>
>> >>
>>
>>


Re: Use cases for ZooKeeper

Posted by Josh Stone <pa...@gmail.com>.
Are the distributed queue and locks written to disk or can they be held in
memory?

josh

On Thu, Jan 5, 2012 at 10:02 AM, Jordan Zimmerman <jz...@netflix.com>wrote:

> Curator's queue handles a node going down (when you use setLockPath()).
> Curator will hold a lock for each message that is being processed. You can
> see the implementation in the method processWithLockSafety() here:
> https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/jav
> a/com/netflix/curator/framework/recipes/queue/DistributedQueue.java
>
> >Will a node going down still clear any distributed locks?
> Yes.
>
>
> -JZ
>
> On 1/5/12 9:56 AM, "Josh Stone" <pa...@gmail.com> wrote:
>
> >Yes, something like that with lock safety would satisfy my third use case.
> >
> >Some questions: Is the distributed queue effectively located by a single
> >z-node? What happens when that node goes down? Will a node going down
> >still
> >clear any distributed locks?
> >
> >Josh
> >
> >On Thu, Jan 5, 2012 at 9:41 AM, Jordan Zimmerman
> ><jz...@netflix.com>wrote:
> >
> >> FYI - Curator has a resilient message Queue:
> >> https://github.com/Netflix/curator/wiki/Distributed-Queue
> >>
> >> On 1/5/12 5:00 AM, "Inder Pall" <in...@gmail.com> wrote:
> >>
> >> >Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
> >> >messages to workers, can it be made to handle a node going down by
> >> >re-distributing the work to another node (perhaps messages that are not
> >> >ack'ed within a timeout are resent)?
> >>
> >>
>
>

Re: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
Curator's queue handles a node going down (when you use setLockPath()).
Curator will hold a lock for each message that is being processed. You can
see the implementation in the method processWithLockSafety() here:
https://github.com/Netflix/curator/blob/master/curator-recipes/src/main/java/com/netflix/curator/framework/recipes/queue/DistributedQueue.java
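
The pattern, greatly simplified (this is a sketch of the idea, not the actual
processWithLockSafety() code; the messageId, process and removeFromQueue
hooks below are hypothetical), is roughly:

import java.util.concurrent.TimeUnit;
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.recipes.locks.InterProcessMutex;

// Simplified sketch: hold a per-message lock while processing so that a
// consumer crash releases the lock (its session ends) and another node can
// safely re-consume the message.
public class LockSafeConsumer {
    private final CuratorFramework client;

    public LockSafeConsumer(CuratorFramework client) {
        this.client = client;
    }

    public void consume(String messageId, Runnable process, Runnable removeFromQueue) throws Exception {
        InterProcessMutex lock = new InterProcessMutex(client, "/queue-locks/" + messageId);
        if (!lock.acquire(5, TimeUnit.SECONDS)) {
            return;                        // another consumer is already working on this message
        }
        try {
            process.run();                 // do the work while holding the lock
            removeFromQueue.run();         // remove the message only after processing succeeds
        } finally {
            lock.release();
        }
    }
}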

>Will a node going down still clear any distributed locks?
Yes.


-JZ

On 1/5/12 9:56 AM, "Josh Stone" <pa...@gmail.com> wrote:

>Yes, something like that with lock safety would satisfy my third use case.
>
>Some questions: Is the distributed queue effectively located by a single
>z-node? What happens when that node goes down? Will a node going down
>still
>clear any distributed locks?
>
>Josh
>
>On Thu, Jan 5, 2012 at 9:41 AM, Jordan Zimmerman
><jz...@netflix.com>wrote:
>
>> FYI - Curator has a resilient message Queue:
>> https://github.com/Netflix/curator/wiki/Distributed-Queue
>>
>> On 1/5/12 5:00 AM, "Inder Pall" <in...@gmail.com> wrote:
>>
>> >Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
>> >messages to workers, can it be made to handle a node going down by
>> >re-distributing the work to another node (perhaps messages that are not
>> >ack'ed within a timeout are resent)?
>>
>>


Re: Use cases for ZooKeeper

Posted by Josh Stone <pa...@gmail.com>.
Yes, something like that with lock safety would satisfy my third use case.

Some questions: Is the distributed queue effectively located by a single
z-node? What happens when that node goes down? Will a node going down still
clear any distributed locks?

Josh

On Thu, Jan 5, 2012 at 9:41 AM, Jordan Zimmerman <jz...@netflix.com>wrote:

> FYI - Curator has a resilient message Queue:
> https://github.com/Netflix/curator/wiki/Distributed-Queue
>
> On 1/5/12 5:00 AM, "Inder Pall" <in...@gmail.com> wrote:
>
> >Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
> >messages to workers, can it be made to handle a node going down by
> >re-distributing the work to another node (perhaps messages that are not
> >ack'ed within a timeout are resent)?
>
>

Re: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
FYI - Curator has a resilient message Queue:
https://github.com/Netflix/curator/wiki/Distributed-Queue

On 1/5/12 5:00 AM, "Inder Pall" <in...@gmail.com> wrote:

>Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
>messages to workers, can it be made to handle a node going down by
>re-distributing the work to another node (perhaps messages that are not
>ack'ed within a timeout are resent)?


Re: Use cases for ZooKeeper

Posted by Inder Pall <in...@gmail.com>.
Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
messages to workers, can it be made to handle a node going down by
re-distributing the work to another node (perhaps messages that are not
ack'ed within a timeout are resent)?

>> The third use case is essentially what Kafka does with its ZK-based
consumers: when consumers are added to or removed from a group, the existing
consumers are notified, release all their work, and redistribute it among
themselves.
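
In ZooKeeper terms that amounts to each consumer registering an ephemeral
member znode and watching the group's children. A bare-bones sketch (the
/consumers/mygroup-style path is just an assumed convention, and the actual
redistribution is application-specific):

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class GroupMember implements Watcher {
    private final ZooKeeper zk;
    private final String groupPath;   // e.g. "/consumers/mygroup" (must already exist)
    private final String memberId;

    public GroupMember(ZooKeeper zk, String groupPath, String memberId) {
        this.zk = zk;
        this.groupPath = groupPath;
        this.memberId = memberId;
    }

    public void join() throws Exception {
        // Ephemeral member znode: removed automatically if this consumer's session dies.
        zk.create(groupPath + "/" + memberId, new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        rebalance();
    }

    // Watcher callback: fires when the group's children change.
    public void process(WatchedEvent event) {
        try {
            rebalance();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void rebalance() throws Exception {
        // Re-register the watch on every call; ZooKeeper watches are one-shot.
        List<String> members = zk.getChildren(groupPath, this);
        // Release current work, then claim the share of partitions/shards that
        // falls to this member given the new member list (application-specific).
    }
}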

To me, the second use case is a leader election/lock use case. Curator looks
good for it; I tried the ZK lock recipe and it has some issues. I am
planning to use it for leader election, and some sample code is at -
https://github.com/InMobi/data-bus/blob/master/src/main/java/com/inmobi/databus/zookeeper/CuratorLeaderManager.java

I am not sure about the first use case.




On Thu, Jan 5, 2012 at 2:25 PM, Jordan Zimmerman <jz...@netflix.com>wrote:

> Care to work on it?
>
> On 1/5/12 12:50 AM, "Ted Dunning" <te...@gmail.com> wrote:
>
> >This pattern would make a nice addition to Curator, actually.  It comes up
> >repeatedly in different contexts.
>
>


-- 
Thanks,
- Inder
  Tech Platforms @Inmobi
  Linkedin - http://goo.gl/eR4Ub

Re: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
Sure - give me what you have and I'll port it to Curator.

On 1/12/12 6:18 PM, "Ted Dunning" <te...@gmail.com> wrote:

>I think I have a bit of it written already.
>
>It doesn't use Curator and I think you could simplify it substantially if
>you were to use it.  Would that help?
>
>On Thu, Jan 12, 2012 at 12:52 PM, Jordan Zimmerman
><jz...@netflix.com>wrote:
>
>> Ted - are you interested in writing this on top of Curator? If not, I'll
>> give it a whack.
>>
>> -JZ
>>
>> On 1/5/12 12:50 AM, "Ted Dunning" <te...@gmail.com> wrote:
>>
>> >Jordan, I don't think that leader election does what Josh wants.
>> >
>> >I don't think that consistent hashing is particularly good for that
>>either
>> >because the loss of one node causes the sequential state for lots of
>> >entities to move even among nodes that did not fail.
>> >
>> >What I would recommend is a variant of micro-sharding.  The key space
>>is
>> >divided into many micro-shards.  Then nodes that are alive claim the
>> >micro-shards using ephemerals and proceed as Josh described.  On loss
>>of a
>> >node, the shards that node was handling should be claimed by the
>>remaining
>> >nodes.  When a new node appears or new work appears, it is helpful to
>> >direct nodes to effect a hand-off of traffic.
>> >
>> >In my experience, the best way to implement shard balancing is with and
>> >external master instance much in the style of hbase or katta.  This
>> >external master can be exceedingly simple and only needs to wake up on
>> >various events like loss of a node or change in the set of live shards.
>> >It
>> >can also wake up at intervals if desired to backstop the normal
>> >notifications or to allow small changes for certain kinds of balancing.
>> > Typically, this only requires a few hundred lines of code.
>> >
>> >This external master can, of course, be run on multiple nodes and which
>> >master is in current control can be adjudicated with yet another leader
>> >election.
>> >
>> >You can view this as a package of many leader elections.  Or as
>> >discretized
>> >consistent hashing.  The distinctions are a bit subtle but are very
>> >important.  These include,
>> >
>> >- there is a clean division of control between the master which
>>determines
>> >who serves what and the nodes that do the serving
>> >
>> >- there is no herd effect because the master drives the assignments
>> >
>> >- node loss causes the minimum amount of change of assignments since no
>> >assignments to surviving nodes are disturbed.  This is a major win.
>> >
>> >- balancing is pretty good because there are many shards compared to
>>the
>> >number of nodes.
>> >
>> >- the balancing strategy is highly pluggable.
>> >
>> >This pattern would make a nice addition to Curator, actually.  It
>>comes up
>> >repeatedly in different contexts.
>> >
>> >On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman
>> ><jz...@netflix.com>wrote:
>> >
>> >> OK - so this is two options for doing the same thing. You use a
>>Leader
>> >> Election algorithm to make sure that only one node in the cluster is
>> >> operating on a work unit. Curator has an implementation (it's really
>> >>just
>> >> a distributed lock with a slightly different API).
>> >>
>> >> -JZ
>> >>
>> >> On 1/5/12 12:04 AM, "Josh Stone" <pa...@gmail.com> wrote:
>> >>
>> >> >Thanks for the response. Comments below:
>> >> >
>> >> >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
>> >> ><jz...@netflix.com>wrote:
>> >> >
>> >> >> Hi Josh,
>> >> >>
>> >> >> >Second use case: Distributed locking
>> >> >> This is one of the most common uses of ZooKeeper. There are many
>> >> >> implementations - one included with the ZK distro. Also, there is
>> >> >>Curator:
>> >> >> https://github.com/Netflix/curator
>> >> >>
>> >> >> >First use case: Distributing work to a cluster of nodes
>> >> >> This sounds feasible. If you give more details I and others on
>>this
>> >>list
>> >> >> can help more.
>> >> >>
>> >> >
>> >> >Sure. I basically want to handle race conditions where two commands
>> >>that
>> >> >operate on the same data are received by my cluster of znodes,
>> >> >concurrently. One approach is to lock on the data that is effected
>>by
>> >>the
>> >> >command (distributed lock). Another approach is make sure that all
>>of
>> >>the
>> >> >commands that operate on any set of data are routed to the same
>>node,
>> >> >where
>> >> >they can be processed serially using local synchronization.
>>Consistent
>> >> >hashing is an algorithm that can be used to select a node to handle
>>a
>> >> >message (where the inputs are the key to hash and the number of
>>nodes
>> >>in
>> >> >the cluster).
>> >> >
>> >> >There are various implementations for this floating around. I'm just
>> >> >interesting to know how this is working for anyone else.
>> >> >
>> >> >Josh
>> >> >
>> >> >
>> >> >>
>> >> >> -JZ
>> >> >>
>> >> >> ________________________________________
>> >> >> From: Josh Stone [pacesysjosh@gmail.com]
>> >> >> Sent: Wednesday, January 04, 2012 8:09 PM
>> >> >> To: user@zookeeper.apache.org
>> >> >> Subject: Use cases for ZooKeeper
>> >> >>
>> >> >> I have a few use cases that I'm wondering if ZooKeeper would be
>> >>suitable
>> >> >> for and would appreciate some feedback.
>> >> >>
>> >> >> First use case: Distributing work to a cluster of nodes using
>> >>consistent
>> >> >> hashing to ensure that messages of some type are consistently
>> >>handled by
>> >> >> the same node. I haven't been able to find any info about
>>ZooKeeper +
>> >> >> consistent hashing. Is anyone using it for this? A concern here
>> >>would be
>> >> >> how to redistribute work as nodes come and go from the cluster.
>> >> >>
>> >> >> Second use case: Distributed locking. I noticed that there's a
>>recipe
>> >> >>for
>> >> >> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
>> >> >>concern
>> >> >> would be how to handle orphaned locks if a node that obtained a
>>lock
>> >> >>goes
>> >> >> down.
>> >> >>
>> >> >> Third use case: Fault tolerance. If we utilized ZooKeeper to
>> >>distribute
>> >> >> messages to workers, can it be made to handle a node going down by
>> >> >> re-distributing the work to another node (perhaps messages that
>>are
>> >>not
>> >> >> ack'ed within a timeout are resent)?
>> >> >>
>> >> >> Cheers,
>> >> >> Josh
>> >> >>
>> >>
>> >>
>>
>>


Re: Use cases for ZooKeeper

Posted by Ted Dunning <te...@gmail.com>.
I think I have a bit of it written already.

It doesn't use Curator and I think you could simplify it substantially if
you were to use it.  Would that help?

On Thu, Jan 12, 2012 at 12:52 PM, Jordan Zimmerman
<jz...@netflix.com>wrote:

> Ted - are you interested in writing this on top of Curator? If not, I'll
> give it a whack.
>
> -JZ
>
> On 1/5/12 12:50 AM, "Ted Dunning" <te...@gmail.com> wrote:
>
> >Jordan, I don't think that leader election does what Josh wants.
> >
> >I don't think that consistent hashing is particularly good for that either
> >because the loss of one node causes the sequential state for lots of
> >entities to move even among nodes that did not fail.
> >
> >What I would recommend is a variant of micro-sharding.  The key space is
> >divided into many micro-shards.  Then nodes that are alive claim the
> >micro-shards using ephemerals and proceed as Josh described.  On loss of a
> >node, the shards that node was handling should be claimed by the remaining
> >nodes.  When a new node appears or new work appears, it is helpful to
> >direct nodes to effect a hand-off of traffic.
> >
> >In my experience, the best way to implement shard balancing is with and
> >external master instance much in the style of hbase or katta.  This
> >external master can be exceedingly simple and only needs to wake up on
> >various events like loss of a node or change in the set of live shards.
> >It
> >can also wake up at intervals if desired to backstop the normal
> >notifications or to allow small changes for certain kinds of balancing.
> > Typically, this only requires a few hundred lines of code.
> >
> >This external master can, of course, be run on multiple nodes and which
> >master is in current control can be adjudicated with yet another leader
> >election.
> >
> >You can view this as a package of many leader elections.  Or as
> >discretized
> >consistent hashing.  The distinctions are a bit subtle but are very
> >important.  These include,
> >
> >- there is a clean division of control between the master which determines
> >who serves what and the nodes that do the serving
> >
> >- there is no herd effect because the master drives the assignments
> >
> >- node loss causes the minimum amount of change of assignments since no
> >assignments to surviving nodes are disturbed.  This is a major win.
> >
> >- balancing is pretty good because there are many shards compared to the
> >number of nodes.
> >
> >- the balancing strategy is highly pluggable.
> >
> >This pattern would make a nice addition to Curator, actually.  It comes up
> >repeatedly in different contexts.
> >
> >On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman
> ><jz...@netflix.com>wrote:
> >
> >> OK - so this is two options for doing the same thing. You use a Leader
> >> Election algorithm to make sure that only one node in the cluster is
> >> operating on a work unit. Curator has an implementation (it's really
> >>just
> >> a distributed lock with a slightly different API).
> >>
> >> -JZ
> >>
> >> On 1/5/12 12:04 AM, "Josh Stone" <pa...@gmail.com> wrote:
> >>
> >> >Thanks for the response. Comments below:
> >> >
> >> >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
> >> ><jz...@netflix.com>wrote:
> >> >
> >> >> Hi Josh,
> >> >>
> >> >> >Second use case: Distributed locking
> >> >> This is one of the most common uses of ZooKeeper. There are many
> >> >> implementations - one included with the ZK distro. Also, there is
> >> >>Curator:
> >> >> https://github.com/Netflix/curator
> >> >>
> >> >> >First use case: Distributing work to a cluster of nodes
> >> >> This sounds feasible. If you give more details I and others on this
> >>list
> >> >> can help more.
> >> >>
> >> >
> >> >Sure. I basically want to handle race conditions where two commands
> >>that
> >> >operate on the same data are received by my cluster of znodes,
> >> >concurrently. One approach is to lock on the data that is effected by
> >>the
> >> >command (distributed lock). Another approach is make sure that all of
> >>the
> >> >commands that operate on any set of data are routed to the same node,
> >> >where
> >> >they can be processed serially using local synchronization. Consistent
> >> >hashing is an algorithm that can be used to select a node to handle a
> >> >message (where the inputs are the key to hash and the number of nodes
> >>in
> >> >the cluster).
> >> >
> >> >There are various implementations for this floating around. I'm just
> >> >interesting to know how this is working for anyone else.
> >> >
> >> >Josh
> >> >
> >> >
> >> >>
> >> >> -JZ
> >> >>
> >> >> ________________________________________
> >> >> From: Josh Stone [pacesysjosh@gmail.com]
> >> >> Sent: Wednesday, January 04, 2012 8:09 PM
> >> >> To: user@zookeeper.apache.org
> >> >> Subject: Use cases for ZooKeeper
> >> >>
> >> >> I have a few use cases that I'm wondering if ZooKeeper would be
> >>suitable
> >> >> for and would appreciate some feedback.
> >> >>
> >> >> First use case: Distributing work to a cluster of nodes using
> >>consistent
> >> >> hashing to ensure that messages of some type are consistently
> >>handled by
> >> >> the same node. I haven't been able to find any info about ZooKeeper +
> >> >> consistent hashing. Is anyone using it for this? A concern here
> >>would be
> >> >> how to redistribute work as nodes come and go from the cluster.
> >> >>
> >> >> Second use case: Distributed locking. I noticed that there's a recipe
> >> >>for
> >> >> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
> >> >>concern
> >> >> would be how to handle orphaned locks if a node that obtained a lock
> >> >>goes
> >> >> down.
> >> >>
> >> >> Third use case: Fault tolerance. If we utilized ZooKeeper to
> >>distribute
> >> >> messages to workers, can it be made to handle a node going down by
> >> >> re-distributing the work to another node (perhaps messages that are
> >>not
> >> >> ack'ed within a timeout are resent)?
> >> >>
> >> >> Cheers,
> >> >> Josh
> >> >>
> >>
> >>
>
>

Re: Use cases for ZooKeeper

Posted by Josh Stone <pa...@gmail.com>.
We're thinking along the same lines. Specifically, I was thinking of using
a hash ring to minimize disruptions to the key space when nodes come and
go. Either that or micro-sharding would be nice, and I'm curious how this
has gone for anyone else using ZooKeeper. I should mention that this is
basically an alternative to distributed locks. Both achieve the same thing
- protecting against race conditions.

Josh

On Thu, Jan 5, 2012 at 12:50 AM, Ted Dunning <te...@gmail.com> wrote:

> Jordan, I don't think that leader election does what Josh wants.
>
> I don't think that consistent hashing is particularly good for that either
> because the loss of one node causes the sequential state for lots of
> entities to move even among nodes that did not fail.
>
> What I would recommend is a variant of micro-sharding.  The key space is
> divided into many micro-shards.  Then nodes that are alive claim the
> micro-shards using ephemerals and proceed as Josh described.  On loss of a
> node, the shards that node was handling should be claimed by the remaining
> nodes.  When a new node appears or new work appears, it is helpful to
> direct nodes to effect a hand-off of traffic.
>
> In my experience, the best way to implement shard balancing is with and
> external master instance much in the style of hbase or katta.  This
> external master can be exceedingly simple and only needs to wake up on
> various events like loss of a node or change in the set of live shards.  It
> can also wake up at intervals if desired to backstop the normal
> notifications or to allow small changes for certain kinds of balancing.
>  Typically, this only requires a few hundred lines of code.
>
> This external master can, of course, be run on multiple nodes and which
> master is in current control can be adjudicated with yet another leader
> election.
>
> You can view this as a package of many leader elections.  Or as discretized
> consistent hashing.  The distinctions are a bit subtle but are very
> important.  These include,
>
> - there is a clean division of control between the master which determines
> who serves what and the nodes that do the serving
>
> - there is no herd effect because the master drives the assignments
>
> - node loss causes the minimum amount of change of assignments since no
> assignments to surviving nodes are disturbed.  This is a major win.
>
> - balancing is pretty good because there are many shards compared to the
> number of nodes.
>
> - the balancing strategy is highly pluggable.
>
> This pattern would make a nice addition to Curator, actually.  It comes up
> repeatedly in different contexts.
>
> On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman <jzimmerman@netflix.com
> >wrote:
>
> > OK - so this is two options for doing the same thing. You use a Leader
> > Election algorithm to make sure that only one node in the cluster is
> > operating on a work unit. Curator has an implementation (it's really just
> > a distributed lock with a slightly different API).
> >
> > -JZ
> >
> > On 1/5/12 12:04 AM, "Josh Stone" <pa...@gmail.com> wrote:
> >
> > >Thanks for the response. Comments below:
> > >
> > >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
> > ><jz...@netflix.com>wrote:
> > >
> > >> Hi Josh,
> > >>
> > >> >Second use case: Distributed locking
> > >> This is one of the most common uses of ZooKeeper. There are many
> > >> implementations - one included with the ZK distro. Also, there is
> > >>Curator:
> > >> https://github.com/Netflix/curator
> > >>
> > >> >First use case: Distributing work to a cluster of nodes
> > >> This sounds feasible. If you give more details I and others on this
> list
> > >> can help more.
> > >>
> > >
> > >Sure. I basically want to handle race conditions where two commands that
> > >operate on the same data are received by my cluster of znodes,
> > >concurrently. One approach is to lock on the data that is effected by
> the
> > >command (distributed lock). Another approach is make sure that all of
> the
> > >commands that operate on any set of data are routed to the same node,
> > >where
> > >they can be processed serially using local synchronization. Consistent
> > >hashing is an algorithm that can be used to select a node to handle a
> > >message (where the inputs are the key to hash and the number of nodes in
> > >the cluster).
> > >
> > >There are various implementations for this floating around. I'm just
> > >interesting to know how this is working for anyone else.
> > >
> > >Josh
> > >
> > >
> > >>
> > >> -JZ
> > >>
> > >> ________________________________________
> > >> From: Josh Stone [pacesysjosh@gmail.com]
> > >> Sent: Wednesday, January 04, 2012 8:09 PM
> > >> To: user@zookeeper.apache.org
> > >> Subject: Use cases for ZooKeeper
> > >>
> > >> I have a few use cases that I'm wondering if ZooKeeper would be
> suitable
> > >> for and would appreciate some feedback.
> > >>
> > >> First use case: Distributing work to a cluster of nodes using
> consistent
> > >> hashing to ensure that messages of some type are consistently handled
> by
> > >> the same node. I haven't been able to find any info about ZooKeeper +
> > >> consistent hashing. Is anyone using it for this? A concern here would
> be
> > >> how to redistribute work as nodes come and go from the cluster.
> > >>
> > >> Second use case: Distributed locking. I noticed that there's a recipe
> > >>for
> > >> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
> > >>concern
> > >> would be how to handle orphaned locks if a node that obtained a lock
> > >>goes
> > >> down.
> > >>
> > >> Third use case: Fault tolerance. If we utilized ZooKeeper to
> distribute
> > >> messages to workers, can it be made to handle a node going down by
> > >> re-distributing the work to another node (perhaps messages that are
> not
> > >> ack'ed within a timeout are resent)?
> > >>
> > >> Cheers,
> > >> Josh
> > >>
> >
> >
>

Re: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
Care to work on it?

On 1/5/12 12:50 AM, "Ted Dunning" <te...@gmail.com> wrote:

>This pattern would make a nice addition to Curator, actually.  It comes up
>repeatedly in different contexts.


Re: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
Ted - are you interested in writing this on top of Curator? If not, I'll
give it a whack.

-JZ

On 1/5/12 12:50 AM, "Ted Dunning" <te...@gmail.com> wrote:

>Jordan, I don't think that leader election does what Josh wants.
>
>I don't think that consistent hashing is particularly good for that either
>because the loss of one node causes the sequential state for lots of
>entities to move even among nodes that did not fail.
>
>What I would recommend is a variant of micro-sharding.  The key space is
>divided into many micro-shards.  Then nodes that are alive claim the
>micro-shards using ephemerals and proceed as Josh described.  On loss of a
>node, the shards that node was handling should be claimed by the remaining
>nodes.  When a new node appears or new work appears, it is helpful to
>direct nodes to effect a hand-off of traffic.
>
>In my experience, the best way to implement shard balancing is with and
>external master instance much in the style of hbase or katta.  This
>external master can be exceedingly simple and only needs to wake up on
>various events like loss of a node or change in the set of live shards.
>It
>can also wake up at intervals if desired to backstop the normal
>notifications or to allow small changes for certain kinds of balancing.
> Typically, this only requires a few hundred lines of code.
>
>This external master can, of course, be run on multiple nodes and which
>master is in current control can be adjudicated with yet another leader
>election.
>
>You can view this as a package of many leader elections.  Or as
>discretized
>consistent hashing.  The distinctions are a bit subtle but are very
>important.  These include,
>
>- there is a clean division of control between the master which determines
>who serves what and the nodes that do the serving
>
>- there is no herd effect because the master drives the assignments
>
>- node loss causes the minimum amount of change of assignments since no
>assignments to surviving nodes are disturbed.  This is a major win.
>
>- balancing is pretty good because there are many shards compared to the
>number of nodes.
>
>- the balancing strategy is highly pluggable.
>
>This pattern would make a nice addition to Curator, actually.  It comes up
>repeatedly in different contexts.
>
>On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman
><jz...@netflix.com>wrote:
>
>> OK - so this is two options for doing the same thing. You use a Leader
>> Election algorithm to make sure that only one node in the cluster is
>> operating on a work unit. Curator has an implementation (it's really
>>just
>> a distributed lock with a slightly different API).
>>
>> -JZ
>>
>> On 1/5/12 12:04 AM, "Josh Stone" <pa...@gmail.com> wrote:
>>
>> >Thanks for the response. Comments below:
>> >
>> >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
>> ><jz...@netflix.com>wrote:
>> >
>> >> Hi Josh,
>> >>
>> >> >Second use case: Distributed locking
>> >> This is one of the most common uses of ZooKeeper. There are many
>> >> implementations - one included with the ZK distro. Also, there is
>> >>Curator:
>> >> https://github.com/Netflix/curator
>> >>
>> >> >First use case: Distributing work to a cluster of nodes
>> >> This sounds feasible. If you give more details I and others on this
>>list
>> >> can help more.
>> >>
>> >
>> >Sure. I basically want to handle race conditions where two commands
>>that
>> >operate on the same data are received by my cluster of znodes,
>> >concurrently. One approach is to lock on the data that is effected by
>>the
>> >command (distributed lock). Another approach is make sure that all of
>>the
>> >commands that operate on any set of data are routed to the same node,
>> >where
>> >they can be processed serially using local synchronization. Consistent
>> >hashing is an algorithm that can be used to select a node to handle a
>> >message (where the inputs are the key to hash and the number of nodes
>>in
>> >the cluster).
>> >
>> >There are various implementations for this floating around. I'm just
>> >interesting to know how this is working for anyone else.
>> >
>> >Josh
>> >
>> >
>> >>
>> >> -JZ
>> >>
>> >> ________________________________________
>> >> From: Josh Stone [pacesysjosh@gmail.com]
>> >> Sent: Wednesday, January 04, 2012 8:09 PM
>> >> To: user@zookeeper.apache.org
>> >> Subject: Use cases for ZooKeeper
>> >>
>> >> I have a few use cases that I'm wondering if ZooKeeper would be
>>suitable
>> >> for and would appreciate some feedback.
>> >>
>> >> First use case: Distributing work to a cluster of nodes using
>>consistent
>> >> hashing to ensure that messages of some type are consistently
>>handled by
>> >> the same node. I haven't been able to find any info about ZooKeeper +
>> >> consistent hashing. Is anyone using it for this? A concern here
>>would be
>> >> how to redistribute work as nodes come and go from the cluster.
>> >>
>> >> Second use case: Distributed locking. I noticed that there's a recipe
>> >>for
>> >> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
>> >>concern
>> >> would be how to handle orphaned locks if a node that obtained a lock
>> >>goes
>> >> down.
>> >>
>> >> Third use case: Fault tolerance. If we utilized ZooKeeper to
>>distribute
>> >> messages to workers, can it be made to handle a node going down by
>> >> re-distributing the work to another node (perhaps messages that are
>>not
>> >> ack'ed within a timeout are resent)?
>> >>
>> >> Cheers,
>> >> Josh
>> >>
>>
>>


Re: Use cases for ZooKeeper

Posted by Ted Dunning <te...@gmail.com>.
Jordan, I don't think that leader election does what Josh wants.

I don't think that consistent hashing is particularly good for that either
because the loss of one node causes the sequential state for lots of
entities to move even among nodes that did not fail.

What I would recommend is a variant of micro-sharding.  The key space is
divided into many micro-shards.  Then nodes that are alive claim the
micro-shards using ephemerals and proceed as Josh described.  On loss of a
node, the shards that node was handling should be claimed by the remaining
nodes.  When a new node appears or new work appears, it is helpful to
direct nodes to effect a hand-off of traffic.
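
The claim step itself is tiny with the plain ZooKeeper API. A rough sketch,
assuming a conventional layout like /shards/assign/<shardId> (nothing here is
prescribed by ZooKeeper itself):

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ShardClaimer {
    private final ZooKeeper zk;
    private final String nodeId;

    public ShardClaimer(ZooKeeper zk, String nodeId) {
        this.zk = zk;
        this.nodeId = nodeId;
    }

    // Try to claim one micro-shard; returns true if this node now owns it.
    // The claim znode is ephemeral, so it vanishes if our session expires and
    // the shard becomes claimable by the surviving nodes.
    public boolean tryClaim(int shardId) throws KeeperException, InterruptedException {
        String path = "/shards/assign/" + shardId;   // parent path assumed to exist
        try {
            zk.create(path, nodeId.getBytes(StandardCharsets.UTF_8),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;
        } catch (KeeperException.NodeExistsException alreadyClaimed) {
            return false;                            // another live node owns this shard
        }
    }
}

In the external-master variant described next, it would be the master writing
the assignments rather than the nodes racing to create these claims.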

In my experience, the best way to implement shard balancing is with an
external master instance much in the style of hbase or katta.  This
external master can be exceedingly simple and only needs to wake up on
various events like loss of a node or change in the set of live shards.  It
can also wake up at intervals if desired to backstop the normal
notifications or to allow small changes for certain kinds of balancing.
 Typically, this only requires a few hundred lines of code.

This external master can, of course, be run on multiple nodes and which
master is currently in control can be adjudicated with yet another leader
election.

You can view this as a package of many leader elections.  Or as discretized
consistent hashing.  The distinctions are a bit subtle but are very
important.  These include,

- there is a clean division of control between the master which determines
who serves what and the nodes that do the serving

- there is no herd effect because the master drives the assignments

- node loss causes the minimum amount of change of assignments since no
assignments to surviving nodes are disturbed.  This is a major win.

- balancing is pretty good because there are many shards compared to the
number of nodes.

- the balancing strategy is highly pluggable.

This pattern would make a nice addition to Curator, actually.  It comes up
repeatedly in different contexts.

On Thu, Jan 5, 2012 at 12:11 AM, Jordan Zimmerman <jz...@netflix.com>wrote:

> OK - so this is two options for doing the same thing. You use a Leader
> Election algorithm to make sure that only one node in the cluster is
> operating on a work unit. Curator has an implementation (it's really just
> a distributed lock with a slightly different API).
>
> -JZ
>
> On 1/5/12 12:04 AM, "Josh Stone" <pa...@gmail.com> wrote:
>
> >Thanks for the response. Comments below:
> >
> >On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
> ><jz...@netflix.com>wrote:
> >
> >> Hi Josh,
> >>
> >> >Second use case: Distributed locking
> >> This is one of the most common uses of ZooKeeper. There are many
> >> implementations - one included with the ZK distro. Also, there is
> >>Curator:
> >> https://github.com/Netflix/curator
> >>
> >> >First use case: Distributing work to a cluster of nodes
> >> This sounds feasible. If you give more details I and others on this list
> >> can help more.
> >>
> >
> >Sure. I basically want to handle race conditions where two commands that
> >operate on the same data are received by my cluster of znodes,
> >concurrently. One approach is to lock on the data that is effected by the
> >command (distributed lock). Another approach is make sure that all of the
> >commands that operate on any set of data are routed to the same node,
> >where
> >they can be processed serially using local synchronization. Consistent
> >hashing is an algorithm that can be used to select a node to handle a
> >message (where the inputs are the key to hash and the number of nodes in
> >the cluster).
> >
> >There are various implementations for this floating around. I'm just
> >interesting to know how this is working for anyone else.
> >
> >Josh
> >
> >
> >>
> >> -JZ
> >>
> >> ________________________________________
> >> From: Josh Stone [pacesysjosh@gmail.com]
> >> Sent: Wednesday, January 04, 2012 8:09 PM
> >> To: user@zookeeper.apache.org
> >> Subject: Use cases for ZooKeeper
> >>
> >> I have a few use cases that I'm wondering if ZooKeeper would be suitable
> >> for and would appreciate some feedback.
> >>
> >> First use case: Distributing work to a cluster of nodes using consistent
> >> hashing to ensure that messages of some type are consistently handled by
> >> the same node. I haven't been able to find any info about ZooKeeper +
> >> consistent hashing. Is anyone using it for this? A concern here would be
> >> how to redistribute work as nodes come and go from the cluster.
> >>
> >> Second use case: Distributed locking. I noticed that there's a recipe
> >>for
> >> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
> >>concern
> >> would be how to handle orphaned locks if a node that obtained a lock
> >>goes
> >> down.
> >>
> >> Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
> >> messages to workers, can it be made to handle a node going down by
> >> re-distributing the work to another node (perhaps messages that are not
> >> ack'ed within a timeout are resent)?
> >>
> >> Cheers,
> >> Josh
> >>
>
>

Re: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
OK - so these are two options for doing the same thing. You use a Leader
Election algorithm to make sure that only one node in the cluster is
operating on a work unit. Curator has an implementation (it's really just
a distributed lock with a slightly different API).

-JZ

On 1/5/12 12:04 AM, "Josh Stone" <pa...@gmail.com> wrote:

>Thanks for the response. Comments below:
>
>On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman
><jz...@netflix.com>wrote:
>
>> Hi Josh,
>>
>> >Second use case: Distributed locking
>> This is one of the most common uses of ZooKeeper. There are many
>> implementations - one included with the ZK distro. Also, there is
>>Curator:
>> https://github.com/Netflix/curator
>>
>> >First use case: Distributing work to a cluster of nodes
>> This sounds feasible. If you give more details I and others on this list
>> can help more.
>>
>
>Sure. I basically want to handle race conditions where two commands that
>operate on the same data are received by my cluster of znodes,
>concurrently. One approach is to lock on the data that is effected by the
>command (distributed lock). Another approach is make sure that all of the
>commands that operate on any set of data are routed to the same node,
>where
>they can be processed serially using local synchronization. Consistent
>hashing is an algorithm that can be used to select a node to handle a
>message (where the inputs are the key to hash and the number of nodes in
>the cluster).
>
>There are various implementations for this floating around. I'm just
>interesting to know how this is working for anyone else.
>
>Josh
>
>
>>
>> -JZ
>>
>> ________________________________________
>> From: Josh Stone [pacesysjosh@gmail.com]
>> Sent: Wednesday, January 04, 2012 8:09 PM
>> To: user@zookeeper.apache.org
>> Subject: Use cases for ZooKeeper
>>
>> I have a few use cases that I'm wondering if ZooKeeper would be suitable
>> for and would appreciate some feedback.
>>
>> First use case: Distributing work to a cluster of nodes using consistent
>> hashing to ensure that messages of some type are consistently handled by
>> the same node. I haven't been able to find any info about ZooKeeper +
>> consistent hashing. Is anyone using it for this? A concern here would be
>> how to redistribute work as nodes come and go from the cluster.
>>
>> Second use case: Distributed locking. I noticed that there's a recipe
>>for
>> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One
>>concern
>> would be how to handle orphaned locks if a node that obtained a lock
>>goes
>> down.
>>
>> Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
>> messages to workers, can it be made to handle a node going down by
>> re-distributing the work to another node (perhaps messages that are not
>> ack'ed within a timeout are resent)?
>>
>> Cheers,
>> Josh
>>


Re: Use cases for ZooKeeper

Posted by Josh Stone <pa...@gmail.com>.
Thanks for the response. Comments below:

On Wed, Jan 4, 2012 at 10:46 PM, Jordan Zimmerman <jz...@netflix.com>wrote:

> Hi Josh,
>
> >Second use case: Distributed locking
> This is one of the most common uses of ZooKeeper. There are many
> implementations - one included with the ZK distro. Also, there is Curator:
> https://github.com/Netflix/curator
>
> >First use case: Distributing work to a cluster of nodes
> This sounds feasible. If you give more details I and others on this list
> can help more.
>

Sure. I basically want to handle race conditions where two commands that
operate on the same data are received by my cluster of nodes concurrently.
One approach is to lock on the data that is affected by the command
(distributed lock). Another approach is to make sure that all of the
commands that operate on any given set of data are routed to the same node,
where they can be processed serially using local synchronization. Consistent
hashing is an algorithm that can be used to select a node to handle a
message (where the inputs are the key to hash and the number of nodes in
the cluster).
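
For reference, the ring itself needs nothing beyond a sorted map. A minimal
sketch in plain Java (the MD5 placement and virtual-node count are just
illustrative choices; node membership would come from somewhere like
ZooKeeper ephemerals):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: each node is placed at several virtual points
// on the ring; a key is routed to the first node clockwise from its hash.
public class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<Long, String>();
    private final int virtualNodes;

    public HashRing(Collection<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            add(node);
        }
    }

    public void add(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void remove(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Route a message key to a node; only keys owned by a removed node move.
    public String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {        // use the first 8 digest bytes as the ring position
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }
}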

There are various implementations of this floating around. I'm just
interested to know how this is working for anyone else.

Josh


>
> -JZ
>
> ________________________________________
> From: Josh Stone [pacesysjosh@gmail.com]
> Sent: Wednesday, January 04, 2012 8:09 PM
> To: user@zookeeper.apache.org
> Subject: Use cases for ZooKeeper
>
> I have a few use cases that I'm wondering if ZooKeeper would be suitable
> for and would appreciate some feedback.
>
> First use case: Distributing work to a cluster of nodes using consistent
> hashing to ensure that messages of some type are consistently handled by
> the same node. I haven't been able to find any info about ZooKeeper +
> consistent hashing. Is anyone using it for this? A concern here would be
> how to redistribute work as nodes come and go from the cluster.
>
> Second use case: Distributed locking. I noticed that there's a recipe for
> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One concern
> would be how to handle orphaned locks if a node that obtained a lock goes
> down.
>
> Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
> messages to workers, can it be made to handle a node going down by
> re-distributing the work to another node (perhaps messages that are not
> ack'ed within a timeout are resent)?
>
> Cheers,
> Josh
>

RE: Use cases for ZooKeeper

Posted by Jordan Zimmerman <jz...@netflix.com>.
Hi Josh,

>Second use case: Distributed locking
This is one of the most common uses of ZooKeeper. There are many implementations - one included with the ZK distro. Also, there is Curator: https://github.com/Netflix/curator
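For example, acquiring one of Curator's locks looks roughly like this (a
sketch only; the package names are from the Netflix-era Curator releases and
the connect string and paths are placeholders):

import java.util.concurrent.TimeUnit;
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.framework.recipes.locks.InterProcessMutex;
import com.netflix.curator.retry.ExponentialBackoffRetry;

public class LockExample {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        InterProcessMutex lock = new InterProcessMutex(client, "/locks/my-resource");
        if (lock.acquire(10, TimeUnit.SECONDS)) {   // bounded wait for the lock
            try {
                // critical section: only one process in the cluster runs this at a time
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}

Because the underlying lock znodes are ephemeral, a holder that dies simply
loses its session and the lock is released - which also addresses the
orphaned-lock concern.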

>First use case: Distributing work to a cluster of nodes
This sounds feasible. If you give more details I and others on this list can help more.

-JZ

________________________________________
From: Josh Stone [pacesysjosh@gmail.com]
Sent: Wednesday, January 04, 2012 8:09 PM
To: user@zookeeper.apache.org
Subject: Use cases for ZooKeeper

I have a few use cases that I'm wondering if ZooKeeper would be suitable
for and would appreciate some feedback.

First use case: Distributing work to a cluster of nodes using consistent
hashing to ensure that messages of some type are consistently handled by
the same node. I haven't been able to find any info about ZooKeeper +
consistent hashing. Is anyone using it for this? A concern here would be
how to redistribute work as nodes come and go from the cluster.

Second use case: Distributed locking. I noticed that there's a recipe for
this on the ZooKeeper wiki. Is anyone doing this? Any issues? One concern
would be how to handle orphaned locks if a node that obtained a lock goes
down.

Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
messages to workers, can it be made to handle a node going down by
re-distributing the work to another node (perhaps messages that are not
ack'ed within a timeout are resent)?

Cheers,
Josh

Re: Use cases for ZooKeeper

Posted by Inder Pall <in...@gmail.com>.
Josh,

I am trying to use the lock recipe for leader election and can share my
findings/sample code in some time.
Regarding your query about orphaned locks - the locks are sequential
ephemeral znodes, which I believe are automatically removed by ZK once the
session breaks.
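
That is how the standard lock recipe behaves: each contender creates an
EPHEMERAL_SEQUENTIAL znode under the lock path, and the contender with the
lowest sequence number holds the lock. A bare sketch of that create call
(the /locks/resource path is just an example and its parent must already
exist):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LockNodeExample {
    // Create our place in line for the lock. ZooKeeper appends a monotonically
    // increasing sequence number to the returned path, e.g.
    // /locks/resource/lock-0000000042.
    public static String enqueueForLock(ZooKeeper zk) throws Exception {
        return zk.create("/locks/resource/lock-", new byte[0],
                         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        // The node is ephemeral: if this client's session expires, ZooKeeper
        // deletes it and the next contender in sequence can acquire the lock,
        // so no orphaned locks are left behind.
    }
}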



On Thu, Jan 5, 2012 at 9:39 AM, Josh Stone <pa...@gmail.com> wrote:

> I have a few use cases that I'm wondering if ZooKeeper would be suitable
> for and would appreciate some feedback.
>
> First use case: Distributing work to a cluster of nodes using consistent
> hashing to ensure that messages of some type are consistently handled by
> the same node. I haven't been able to find any info about ZooKeeper +
> consistent hashing. Is anyone using it for this? A concern here would be
> how to redistribute work as nodes come and go from the cluster.
>
> Second use case: Distributed locking. I noticed that there's a recipe for
> this on the ZooKeeper wiki. Is anyone doing this? Any issues? One concern
> would be how to handle orphaned locks if a node that obtained a lock goes
> down.
>
> Third use case: Fault tolerance. If we utilized ZooKeeper to distribute
> messages to workers, can it be made to handle a node going down by
> re-distributing the work to another node (perhaps messages that are not
> ack'ed within a timeout are resent)?
>
> Cheers,
> Josh
>



-- 
Thanks,
- Inder
  Tech Platforms @Inmobi
  Linkedin - http://goo.gl/eR4Ub