Posted to user@zookeeper.apache.org by Ben Bangert <be...@groovie.org> on 2012/05/17 02:50:03 UTC

The State of Python Zookeeper libraries and collaboration

It would seem that about 6 months ago or so, there wasn't much out
there in terms of higher level Python libs for Zookeeper. There was the
Cloudera article on queues, and txzookeeper (which I'm sure many of us
not using twisted immediately ignored).

In the time since, several people including myself needed solutions
involving Zookeeper with Python and seeing nothing out there all
apparently began writing libraries (judging from the project timelines
in most cases). I've been collaborating with the author of zc.zk (Jim
Fulton) for a while and we decided it'd make more sense to merge our
efforts. In this spirit I began contacting all the other developers to
gauge their interest and most have been interested.

I created a python-zk organization on GitHub to be the home for this
effort and moved over the zc.zk library (which people apparently had a
hard time locating), along with the fairly widely used statically compiled
Python Zookeeper binding.

https://github.com/python-zk

Next up is to create the new merged core which I plan on basing mostly
around the cleanest implementation I have seen so far (which also
happens to be one of the only gevent compatible ones), kazoo. I've
talked with the primary author of Kazoo, and the name may remain with
the new merged package or it may get a new name if that doesn't work.
I'm not terribly tied to names as much as I am to solid, well tested,
well documented working code... but having catchy names does seem to help.

I'm currently working on this full-time, so I expect it to be in a
usable state in a week or so (hopefully not too optimistic). If you're
interested in helping out, the more the better, please feel free to
e-mail me directly or respond here.

This stuff is complex, it needs many eyes on it and lots of code review.

This hopefully explains why I'm so interested in having a single Python
Zookeeper library of similar caliber to Netflix's Curator that has:
- Very thorough unit/integration tests (100% coverage minimum)
- Cleanly handles connection loss
- Works under gevent or threaded/blocking
- Very well documented (API docs and narrative)
- Implements all the Zookeeper recipes
- Service Discovery/Management
- Higher level utility functions for common Zookeeper tasks

In the meantime, here is a summary of my research efforts and code
review (if something isn't accurate, please feel free to correct).

Please don't take this as a critique, I'm just trying to document what
is out there for my own reference on merging and hopefully so other
people coming along don't continue to replicate this. :)


gevent-zookeeper
    - https://github.com/jrydberg/gevent-zookeeper/

    - Works under gevent
    - No tests
    - No documentation

kazoo
    - https://github.com/nimbusproject/kazoo

    - Resilient Client
    - Basic Lock (Uses UUID properly)
    - Some Tests (Integrated)
    - No documentation (doc strings only)
    - Works under gevent

pykeeper
    - https://github.com/nkvoll/pykeeper

    - Higher level client (not resilient to errors)
    - Documentation
    - Some tests (Integrated)

txzookeeper
    - JuJu Team
    - https://launchpad.net/txzookeeper

    - Resilient Client
    - Doesn't handle create node edge-case
    - Basic Lock (open bug filed to handle the UUID bit)
    - Queue, ReliableQueue, SerializedQueue
    - No documentation (doc strings only)
    - Usable only from twisted
    - Well tested (Integrated)

twitter zookeeper lib
    - https://github.com/twitter/commons/tree/master/src/python/twitter/common/zookeeper

    - Resilient Client
    - Handles create node edge-case
    - Service Registration/Discovery
    - Some documentation
    - Well tested (Integrated)
    - Tied to a lot of twitter commons code

zkpython (improvements to a fork of the official bindings)
    - https://github.com/duncf/zkpython/

    - Resilient Client
    - Basic Lock (Using unique id rather than UUID)
    - Handles create node edge-case
    - Some Tests (Integrated)
    - No additional docs

zc.zk
    - https://github.com/python-zk/zc.zk

    - Non-resilient Client (reconnects must be handled)
    - Higher level automatic watch functionality
    - Service Registration/Discovery
    - Well tested (Unit and Integration tests)
    - Documented (on usage, source code is missing doc strings)

zktools
    - https://github.com/mozilla-services/zktools

    - Relies on zc.zk
    - Shared Read/Write Locks
    - AsyncLock
    - Revokable Locks
    - Tests (Integrated)

zoop
    - https://github.com/davidmiller/zoop

    - Doesn't handle create node edge-case
    - Doesn't handle retryable exceptions
    - Revokable Lock (Doesn't handle create node edge-case, uses a permanent
                      node instead of ephemeral)
    - Tested (Unit tests via ZK mocks)
    - Well Documented (doc strings and narrative docs)
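
Several entries above mention a "create node edge-case" and UUID-based
locks. As a rough illustration only, assuming this refers to the
recoverable-error case from the ZooKeeper lock recipe (a sequential node
gets created on the server, but the reply is lost with the connection), the
sketch below embeds a GUID in the node name so a retry can find the node it
already created. FlakyFakeClient, ConnectionLoss, create_sequential and
create_lock_node are hypothetical names for this example, not the API of
any library listed here.

```python
import uuid


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a client's connection-loss error."""


class FlakyFakeClient:
    """In-memory stand-in for a ZooKeeper client (not a real API)."""

    def __init__(self, drop_replies=1):
        self.children = []
        self._counter = 0
        self._drop_replies = drop_replies

    def create_sequential(self, parent, prefix):
        # The "server" always creates the node...
        name = "%s%010d" % (prefix, self._counter)
        self._counter += 1
        self.children.append(name)
        # ...but the reply may be lost before the caller sees it.
        if self._drop_replies > 0:
            self._drop_replies -= 1
            raise ConnectionLoss("reply lost after creating " + name)
        return parent + "/" + name

    def get_children(self, parent):
        return list(self.children)


def create_lock_node(client, parent, max_attempts=3):
    """Create a sequential lock node without leaving a duplicate behind."""
    guid = uuid.uuid4().hex
    prefix = "lock-%s-" % guid
    for _ in range(max_attempts):
        try:
            return client.create_sequential(parent, prefix)
        except ConnectionLoss:
            # The create may have succeeded; look for our GUID before retrying.
            matches = [c for c in client.get_children(parent)
                       if c.startswith(prefix)]
            if matches:
                return parent + "/" + matches[0]
    raise ConnectionLoss("gave up after %d attempts" % max_attempts)
```

Without the GUID, the retry has no way to tell its own orphaned node from
another client's, which is how a lock can end up held by nobody.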


-- 
Ben Bangert
(ben@ || http://) groovie.org


Re: The State of Python Zookeeper libraries and collaboration

Posted by Duncan Findlay <du...@duncf.ca>.
On May 17, 2012, at 9:24 AM, Mark Gius wrote:

> Are you planning on writing a pure-python client (does not call out to the
> C bindings via zkpython) or are you planning on writing a solid wrapper
> around the C bindings. Implementing a pure-python client would go a long
> way towards making various green thread frameworks work without having to
> jump through hoops.  I think we'd have to add support to Jute so that it
> would generate python data classes kind of like it does now with Java and C.

One nice thing about the C bindings is that the communication with ZooKeeper happens in a separate C-thread. We have a number of applications that like to chew through all the available CPU. In these applications it's impossible to ensure that any individual Python thread gets scheduled frequently enough (e.g. to send PINGs to the server). So, personally I'd rather use the C bindings. ;-)

Duncan

Re: The State of Python Zookeeper libraries and collaboration

Posted by Patrick Hunt <ph...@apache.org>.
Ben this is cool -- please keep us posted on your progress!

Given the research you've done, please consider updating the client
binding wiki page, in particular to list your project.
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings

Regards,

Patrick

Re: The State of Python Zookeeper libraries and collaboration

Posted by "Alan D. Cabrera" <li...@toolazydogs.com>.
My Jute changes are kind of hacky since not all of the types in Jute are used; I made the bare minimum of changes to get my requests and responses in Python.  I also had to make a number of hand changes to the generated Python classes, e.g., I added the request header type codes to the packets.

My code, which is a very rough draft, can be found at 

https://github.com/maguro/pookeeper

BTW, I'm happy to take suggestions for a good name.  :)

As you can see I've only tested a small subset of requests.


Regards,
Alan

 
On May 18, 2012, at 10:11 AM, Mark Gius wrote:

> Are your patches to Jute available somewhere?
> 
> Mark
> 


Re: The State of Python Zookeeper libraries and collaboration

Posted by Mark Gius <mg...@gmail.com>.
Are your patches to Jute available somewhere?

Mark

Re: The State of Python Zookeeper libraries and collaboration

Posted by "Alan D. Cabrera" <li...@toolazydogs.com>.
On May 17, 2012, at 2:41 PM, Ben Bangert wrote:

> If someone comes up with a pure Python client to Zookeeper, I'd be happy
> to work on supporting that as well but its a bit beyond the level of
> direct involvement I can provide.

I'm still goofing around with a pure Python client.  I've modified the Jute compiler to also generate the request and response objects in Python:

http://pastie.org/3928662

It seems to be communicating w/ the Zookeeper instances at work perfectly fine but we don't use SASL.
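
To give a feel for what generated request classes have to do, here is a minimal sketch of Jute-style serialization in plain Python. It is not the generated code from the paste above. It assumes the usual ZooKeeper wire layout as I understand it: big-endian 4-byte ints, length-prefixed UTF-8 strings, one-byte booleans, and a 4-byte length in front of each packet. The opcode value and all of the names (write_string, ExistsRequest, frame_request) are assumptions made for this example.

```python
import struct

EXISTS_OPCODE = 3  # assumed value of the "exists" operation code


def write_int(value):
    # 4-byte signed big-endian integer
    return struct.pack(">i", value)


def write_bool(value):
    # single byte, 0 or 1
    return struct.pack(">?", value)


def write_string(value):
    # length-prefixed UTF-8; a length of -1 stands for a null string
    if value is None:
        return write_int(-1)
    data = value.encode("utf-8")
    return write_int(len(data)) + data


def read_int(buf, offset):
    return struct.unpack_from(">i", buf, offset)[0], offset + 4


def read_string(buf, offset):
    length, offset = read_int(buf, offset)
    if length < 0:
        return None, offset
    end = offset + length
    return buf[offset:end].decode("utf-8"), end


class ExistsRequest:
    """Hand-written equivalent of a generated request record."""

    def __init__(self, path, watch):
        self.path = path
        self.watch = watch

    def serialize(self):
        return write_string(self.path) + write_bool(self.watch)


def frame_request(xid, opcode, body):
    # request header (xid, type) + body, preceded by the total length
    payload = write_int(xid) + write_int(opcode) + body
    return write_int(len(payload)) + payload
```

A call such as frame_request(1, EXISTS_OPCODE, ExistsRequest("/a", True).serialize()) yields the bytes that would be written to the socket under these assumptions.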

I'm a novice Python programmer and this is a simple exercise for me to cut my teeth on. 


Regards,
Alan

 

Re: The State of Python Zookeeper libraries and collaboration

Posted by Ben Bangert <be...@groovie.org>.
On 5/17/12 9:24 AM, Mark Gius wrote:
> Are you planning on writing a pure-python client (does not call out to the
> C bindings via zkpython) or are you planning on writing a solid wrapper
> around the C bindings. Implementing a pure-python client would go a long
> way towards making various green thread frameworks work without having to
> jump through hoops.  I think we'd have to add support to Jute so that it
> would generate python data classes kind of like it does now with Java and C.
> 
> Assuming you go with a wrapper around the C bindings, I would suggest you
> take a look at something called "xthread.py", which was a thread
> synchronization primitives library that a guy proposed to the eventlet
> project a while back and provides Lock, Notify, etc etc which are safe to
> use and notify between real threads and green threads.  It gives a safe way
> to deal with sending data and doing proper locks without having to worry
> about calling out to "green" things from within non greened contexts (such
> as the zk callback functions).  It's eventlet specific, but the concepts
> and probably fair amount of the code can probably be adapted and use.
> 
> Or pure-python.  That works too. :D

I looked at writing a base zookeeper replacement that includes the
higher level APIs and uses ctypes to talk to an installed zookeeper C
binding rather than using the Python C binding. This has the advantage
of working in PyPy; however, it was quite a pain and is probably
slower than the Python C binding. This would be pure Python,
but not quite in the way you're referring to, as it's not talking directly
to Zookeeper using pure Python, but still using the C API.

I'm mainly looking at having a higher level API that makes it easier to
use Zookeeper in a less error-prone manner. Like Netflix's Curator, only
with a Pythonic API since their API is fairly heavily grounded in Java
limitations. So it'll have convenience methods, a consistent API that's
usable under greenlets or threaded code, all the recipes, and be well
tested and documented.

Lots of things using Zookeeper make the assumption (rightly or wrongly)
that watch events are executed sequentially (the C API does this, for
example, as does the Java one AFAIK).

To handle this my plan when using greenlets was to immediately spawn a
greenlet watch processor during the ZK client initialization that would
work off a normal non-gevent-patched Queue object, and the callbacks
will drop a lambda onto the queue from the ZK thread. This ensures even
in an async environment that by default all watch events are processed
in the same order the ZK client receives them (a watch func could of
course spawn a greenlet for itself, but at that point it's already safely
in the 'green context').
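
A minimal sketch of that dispatch pattern, under loud assumptions: it uses
only the standard library, with an ordinary worker thread standing in for
the greenlet watch processor, and the names WatchDispatcher and submit are
made up for this example (this is not kazoo's code). The point it shows is
that the client's I/O thread only enqueues, and a single consumer runs the
watch functions in arrival order.

```python
import queue
import threading


class WatchDispatcher:
    """Run watch callbacks one at a time, in the order they were queued."""

    _STOP = object()

    def __init__(self):
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, callback, *args):
        # Called from the client's I/O thread: enqueue only, never run
        # user code on that thread.
        self._events.put(lambda: callback(*args))

    def _run(self):
        while True:
            job = self._events.get()
            if job is self._STOP:
                break
            try:
                job()
            except Exception as exc:
                # A failing watch function must not stop later ones.
                print("watch callback failed:", exc)

    def close(self):
        # Everything queued before close() still runs before the worker exits.
        self._events.put(self._STOP)
        self._worker.join()


def demo():
    seen = []
    dispatcher = WatchDispatcher()

    def fake_zk_thread():
        # Stands in for the C client's callback thread.
        for i in range(5):
            dispatcher.submit(seen.append, ("changed", "/node", i))

    producer = threading.Thread(target=fake_zk_thread)
    producer.start()
    producer.join()
    dispatcher.close()
    return seen
```

Whether a plain Queue can safely wake a gevent greenlet from a foreign
thread is exactly the open question in this thread; the sketch sidesteps it
by using real threads on both sides.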

The Kazoo author (David LaBissoniere) has written a small test script
that verifies this approach appears to work. I'll want to test it under
heavy load of course but it seems like a rather safe and sane approach.
It also avoids a lot of the hairier pipe code that tries to shuttle
things safely back and forth.

If someone comes up with a pure Python client to Zookeeper, I'd be happy
to work on supporting that as well but it's a bit beyond the level of
direct involvement I can provide.

-- 
Ben Bangert
(ben@ || http://) groovie.org


Re: The State of Python Zookeeper libraries and collaboration

Posted by Mark Gius <mg...@gmail.com>.
On Mon, May 21, 2012 at 12:58 PM, Ben Bangert <be...@groovie.org> wrote:

> On 5/20/12 12:29 PM, Martin Kou wrote:
> > A pure Python implementation shouldn't be needed for gevent. It should be
> > sufficient to run the Zookeeper client in async mode in a background thread
> > (with mt or st.. personally I prefer st), and then use an eventfd() or a
> > pipe() to notify gevent.
> >
> > In fact, if you look into gevent's thread pool library - they're using
> > libev's async signal which is also based on eventfd() and pipe(). So it's
> > not a new trick.
>
> The current kazoo implementation uses something like this to pass off
> the events. Also, under a test script, it *seems* to work if the ZK
> thread plants a lambda onto a gevent Queue for a gevent greenlet worker
> to then execute.
>
> Here's an example script testing it (when the use_gevent is set to
> False, the other greenlet fails to print as its using a blocking Queue
> of course):
> http://paste.ofcode.org/37qxZAYzqYe43zbuFa3DqNf
>
> I haven't tried this out with the Python ZK lib, do you know offhand if
> this is going to run into any issue with gevent having a problem with
> items being added to its queue from the other thread?
>
> I'd like to avoid the extra complexity of the pipe's and such if this is
> going to work (which the test script seems to). I was also going with
> this approach to ensure sequential watch execution when used with gevent
> (and the watch func could spawn itself as a new greenlet if it needs to
> yield, etc.)
>
> --
> Ben Bangert
> (ben@ || http://) groovie.org
>
>
+1 on avoiding complexity, and wrangling python_zk so that it behaves with
eventlet/gevent is very complex.

The advantage of a pure-python client is that we don't have to play these
tricks to deal with the fact that the multithreaded library produces
Threads which are not Green safe, causing us to play games to ship code
back into a green thread.

The other trick is that there is more than one "green thread" library
out there: gevent, eventlet, twisted, etc.  A pure python implementation
means that we don't have to special case for various libraries.  eventlet
just has to monkey patch the standard library, same with whatever gevent or
twisted do.

I'm happy to help with a client that interfaces with the C bindings, but I
think that a pure-python client would be cleaner and support a wider
variety of deployments.

Mark

Re: The State of Python Zookeeper libraries and collaboration

Posted by Ben Bangert <be...@groovie.org>.
On 5/20/12 12:29 PM, Martin Kou wrote:
> A pure Python implementation shouldn't be needed for gevent. It should be
> sufficient to run the Zookeeper client in async mode in a background thread
> (with mt or st.. personally I prefer st), and then use an eventfd() or a
> pipe() to notify gevent.
> 
> In fact, if you look into gevent's thread pool library - they're using
> libev's async signal which is also based on eventfd() and pipe(). So it's
> not a new trick.

The current kazoo implementation uses something like this to pass off
the events. Also, under a test script, it *seems* to work if the ZK
thread plants a lambda onto a gevent Queue for a gevent greenlet worker
to then execute.

Here's an example script testing it (when the use_gevent is set to
False, the other greenlet fails to print as it's using a blocking Queue
of course):
http://paste.ofcode.org/37qxZAYzqYe43zbuFa3DqNf

I haven't tried this out with the Python ZK lib, do you know offhand if
this is going to run into any issue with gevent having a problem with
items being added to its queue from the other thread?

I'd like to avoid the extra complexity of the pipes and such if this is
going to work (which the test script seems to). I was also going with
this approach to ensure sequential watch execution when used with gevent
(and the watch func could spawn itself as a new greenlet if it needs to
yield, etc.)

-- 
Ben Bangert
(ben@ || http://) groovie.org


Re: The State of Python Zookeeper libraries and collaboration

Posted by Martin Kou <bi...@gmail.com>.
A pure Python implementation shouldn't be needed for gevent. It should be
sufficient to run the Zookeeper client in async mode in a background thread
(with mt or st.. personally I prefer st), and then use an eventfd() or a
pipe() to notify gevent.

In fact, if you look into gevent's thread pool library - they're using
libev's async signal which is also based on eventfd() and pipe(). So it's
not a new trick.
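
A rough sketch of the pipe() variant, assuming a Unix-like system and using
only the standard library: select() stands in for the gevent event loop,
and PipeNotifier, post and drain are hypothetical names for this example.
The background thread queues an event and writes one byte to the pipe; the
loop wakes when the read end becomes readable and drains the queue.

```python
import os
import queue
import select
import threading


class PipeNotifier:
    """Hand events from a background thread to a select()-style loop."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()
        self._pending = queue.Queue()

    def fileno(self):
        # Lets select() (or an event loop) watch this object directly.
        return self._read_fd

    def post(self, event):
        # Background-thread side: queue the event first, then wake the loop.
        self._pending.put(event)
        os.write(self._write_fd, b"\0")

    def drain(self):
        # Loop side: call only once the read end is reported readable.
        os.read(self._read_fd, 4096)
        events = []
        while True:
            try:
                events.append(self._pending.get_nowait())
            except queue.Empty:
                return events

    def close(self):
        os.close(self._read_fd)
        os.close(self._write_fd)


def demo():
    notifier = PipeNotifier()

    def background():
        for n in range(3):
            notifier.post(("event", n))

    worker = threading.Thread(target=background)
    worker.start()
    received = []
    while len(received) < 3:
        readable, _, _ = select.select([notifier], [], [], 5.0)
        if not readable:
            break  # timed out; give up rather than hang
        received.extend(notifier.drain())
    worker.join()
    notifier.close()
    return received
```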

Best Regards,
Martin Kou

Re: The State of Python Zookeeper libraries and collaboration

Posted by Mark Gius <mg...@gmail.com>.
Are you planning on writing a pure-python client (does not call out to the
C bindings via zkpython) or are you planning on writing a solid wrapper
around the C bindings? Implementing a pure-python client would go a long
way towards making various green thread frameworks work without having to
jump through hoops.  I think we'd have to add support to Jute so that it
would generate python data classes kind of like it does now with Java and C.

Assuming you go with a wrapper around the C bindings, I would suggest you
take a look at something called "xthread.py", which was a thread
synchronization primitives library that a guy proposed to the eventlet
project a while back. It provides Lock, Notify, etc., which are safe to
use and notify between real threads and green threads.  It gives a safe way
to deal with sending data and doing proper locks without having to worry
about calling out to "green" things from within non-greened contexts (such
as the zk callback functions).  It's eventlet specific, but the concepts
and probably a fair amount of the code can be adapted and used.

Or pure-python.  That works too. :D

Mark
