Posted to dev@zookeeper.apache.org by Dominic Williams <th...@googlemail.com> on 2010/03/22 13:05:47 UTC

A deficiency? You can only add children to persistent nodes

Hi all,

I'm developing a library of primitives backed by ZooKeeper. An issue I've
recently hit is that you can only add children to persistent nodes.

When you're creating primitives for things like locking - for example, to
synchronize cluster-wide processing of a NoSQL database like Cassandra -
this is a real pain:

1/ If a node crashes or something else goes wrong, you leave behind
persistent nodes. Over time these will grow and grow, rather like the old
tmp folders used to fill with files under Windows

2/ Persistent nodes = nasty scalability *bottleneck* because you're actually
having to write to disk somewhere.

To avoid this I'm actually thinking of writing a locking system where you
work out the existing chain not by enumerating sequential children, but by
looking at the contents of each temporary lock node to see what it is
waiting on. But... that's quite horrible. Was wondering whether there is
some technical reason why ephemeral nodes can't have children??

Otherwise it is acceptable, e.g.

ZkPath ensurePath = new ZkPath("/Starburst/cluster/nodes", CreateMode.PERSISTENT);
ensurePath.waitSynchronized();
clusterMembers = new ZkDistributedSet("/Starburst/cluster/nodes",
        new String[] { thisNodeEndPoint }, true);
clusterMembers.addChangeListener(clusterNodesChangedProcessor, true);
clusterMembers.waitSynchronized();

Best, Dominic
ria101.wordpress.com
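
For readers following the thread, the "enumerating sequential children" approach mentioned in the message above can be sketched without any ZooKeeper calls. This is only an illustration of the ordering logic, assuming the children were created as sequential znodes (ZooKeeper appends a 10-digit zero-padded counter to such names). The class name LockChain, the "lock-" prefix and the sample names are hypothetical and not part of the library described in the post.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Pure ordering helper for a sequential-child lock; makes no ZooKeeper calls. */
final class LockChain {
    /** Parses the 10-digit counter that ZooKeeper appends to sequential znode names. */
    static long sequenceOf(String childName) {
        return Long.parseLong(childName.substring(childName.length() - 10));
    }

    /**
     * Returns the child this waiter should watch: the one with the highest
     * sequence number below its own. An empty result means no earlier child
     * exists, so the waiter currently holds the lock.
     */
    static Optional<String> predecessor(List<String> children, String self) {
        long mine = sequenceOf(self);
        return children.stream()
                .filter(c -> sequenceOf(c) < mine)
                .max(Comparator.comparingLong(LockChain::sequenceOf));
    }

    public static void main(String[] args) {
        // Children as they might come back from a getChildren call, in no particular order.
        List<String> children = List.of("lock-0000000007", "lock-0000000005", "lock-0000000009");
        System.out.println(predecessor(children, "lock-0000000009"));
        System.out.println(predecessor(children, "lock-0000000005"));
    }
}
```

Each waiter watches only its immediate predecessor, so a released lock wakes one waiter rather than all of them. The parent of these children still has to be a persistent znode, which is the limitation the post is about.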

Re: A deficiency? You can only add children to persistent nodes

Posted by Patrick Hunt <ph...@apache.org>.
Hi Lei, that's perfectly fine, however the issue would be that 
operations such as "getchildren(/path)" would return all these, vs if 
you had an explicit path structure such as 
getchildren(/path/starburst/cluster/nodes) - ie flat vs hierarchical.

Patrick

Lei Zhang wrote:
> Hi Dominic,
> 
> Is it acceptable to use ephemeral nodes with hierarchical names, such as
> "starburst.cluster.nodes"?
> 
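
The flat-versus-hierarchical trade-off in this exchange can be made concrete with a small sketch. It assumes all ephemeral nodes live directly under one persistent parent with dotted names, so a single listing returns everything and the client has to filter by prefix itself. The class name FlatNames and the sample node names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Client-side filtering needed when dotted names replace real path levels. */
final class FlatNames {
    /** Keeps the children under a dotted prefix and strips that prefix from each name. */
    static List<String> childrenUnder(List<String> flatChildren, String dottedPrefix) {
        String prefix = dottedPrefix + ".";
        return flatChildren.stream()
                .filter(name -> name.startsWith(prefix))
                .map(name -> name.substring(prefix.length()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Everything a single flat listing would return, related or not.
        List<String> all = List.of(
                "starburst.cluster.nodes.hostA",
                "starburst.locks.repair",
                "starburst.cluster.nodes.hostB");
        System.out.println(childrenUnder(all, "starburst.cluster.nodes"));
    }
}
```

With a real hierarchy the server does this narrowing; with flat names every listing (and every child watch on the shared parent) covers all entries regardless of which logical group changed.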

Re: A deficiency? You can only add children to persistent nodes

Posted by Lei Zhang <lz...@gmail.com>.
Hi Dominic,

Is it acceptable to use ephemeral nodes with hierarchical names, such as
"starburst.cluster.nodes"?

Re: A deficiency? You can only add children to persistent nodes

Posted by Gustavo Niemeyer <gu...@niemeyer.net>.
Hi Ben,

This would certainly be very interesting!  FWIW, I missed something like
that before as well.

On 23 Mar 2010 02:44, "Benjamin Reed" <br...@yahoo-inc.com> wrote:

let me put out an idea that we have kicked around for a while: ephemeral
containers. the idea is that the znode disappears if it doesn't have
children. you would create the znode with create("/path", data, acl,
EPHEMERAL_CONTAINER). this would result in the creation of two znodes: /path
and /path/child. (we have to create it with a child otherwise it immediately
disappears.)

i think this mechanism would address your need in a way that is easy to
implement and use. it would also allow you to do a cool barrier
implementation!

ben



On 03/22/2010 10:37 AM, Patrick Hunt wrote:
>
> Dominic Williams wrote:
>
>>
>> What I'd sugges...
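
The ephemeral container semantics proposed above can be modelled in a few lines of plain Java. This is a toy in-memory sketch of the described behaviour only (the container is created together with a first child and disappears when its last child goes away); EPHEMERAL_CONTAINER is a proposal in this thread, and the class name EphemeralContainerModel is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory model of the proposed behaviour; this is not ZooKeeper code. */
final class EphemeralContainerModel {
    private final Set<String> children = new HashSet<>();
    private boolean exists = true;

    /** Creation includes a first child, otherwise the container would vanish at once. */
    EphemeralContainerModel(String firstChild) {
        children.add(firstChild);
    }

    boolean exists() {
        return exists;
    }

    void addChild(String name) {
        if (!exists) {
            throw new IllegalStateException("container is gone");
        }
        children.add(name);
    }

    /** Removing the last remaining child removes the container as well. */
    void removeChild(String name) {
        children.remove(name);
        if (children.isEmpty()) {
            exists = false;
        }
    }

    public static void main(String[] args) {
        EphemeralContainerModel barrier = new EphemeralContainerModel("worker-1");
        barrier.addChild("worker-2");
        barrier.removeChild("worker-1");
        System.out.println("after first leaves: " + barrier.exists());
        barrier.removeChild("worker-2");
        System.out.println("after last leaves: " + barrier.exists());
    }
}
```

The barrier use hinted at in the message falls out of this: a waiter only needs to observe the container's deletion to learn that every participant has left.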

Re: A deficiency? You can only add children to persistent nodes

Posted by Dominic Williams <th...@googlemail.com>.
Just for a "correct" programming approach given the latest libraries. For
example with 1.6 you can create a weak set like this

set = Collections.newSetFromMap(new WeakHashMap<ZkSyncPrimitive, Boolean>());

Otherwise you need to bundle a WeakSet implementation. I'm guessing the fewer
dependencies the better...
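
A fuller sketch of how such a weak set might hold watchers, using only the JDK. Collections.newSetFromMap (added in Java 6) and WeakHashMap are standard; the Watcher interface and the WeakWatcherSet class below are hypothetical stand-ins for illustration, not the planned ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.WeakHashMap;

/** Holds watchers weakly so registering one does not keep it alive. */
final class WeakWatcherSet {
    /** Hypothetical callback type standing in for a real watcher interface. */
    interface Watcher {
        void process(String event);
    }

    // An entry disappears once its watcher is no longer strongly reachable elsewhere.
    private final Set<Watcher> watchers =
            Collections.newSetFromMap(new WeakHashMap<Watcher, Boolean>());

    void register(Watcher w) {
        watchers.add(w);
    }

    /** Delivers the event to every watcher still alive; returns how many were called. */
    int fire(String event) {
        // Copy into strong references first so the set cannot shrink mid-delivery.
        List<Watcher> live = new ArrayList<>(watchers);
        for (Watcher w : live) {
            w.process(event);
        }
        return live.size();
    }

    public static void main(String[] args) {
        WeakWatcherSet set = new WeakWatcherSet();
        Watcher printer = event -> System.out.println("got " + event);
        set.register(printer);
        System.out.println(set.fire("NodeChildrenChanged") + " watcher(s) notified");
        // Keep the watcher strongly reachable until delivery has finished (Java 9+).
        java.lang.ref.Reference.reachabilityFence(printer);
    }
}
```

Two caveats with this approach: WeakHashMap compares keys with equals/hashCode rather than identity, and it is not thread-safe, so a real implementation would likely need external synchronization.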

On 24 March 2010 16:12, Patrick Hunt <ph...@apache.org> wrote:

> A while back (3.1? basically long after the mac port of the 1.6 jvm was
> considered "stable") we stopped "officially" supporting 1.5
> http://bit.ly/9IlzUq
>
> However there have been occasional requests from ppl trying to use 1.5 and
> we do our best not to break things for them. Last I checked the code still
> compiled under 1.5 (but it's been a while). IMO it's best to use 1.6 though,
> even if it compiles/runs for 1.5 there have been a number of significant
> bugs fixed in the jvm since then.
>
> Dominic, is there some reason in particular in this case?
>
> Patrick
>
>
> Dominic Williams wrote:
>
>> Hi Patrick, re: Weak Watchers implementation. Is it ok to assume JDK 6?
>>
>> On 23 March 2010 17:32, Patrick Hunt <ph...@apache.org> wrote:
>>
>>  Feel free to assign yourself to WW. I encourage you to document (comments
>>> in jira or wiki page) some rough approximation of the api/approach for
>>> WW,
>>> it's better to get the comments/concerns up front than after you took the
>>> time to work it all out in a patch. There are good tests for watchers
>>> already, you'll need to extend those for WW and add appropriate
>>> javadoc/forrest docs. Asking lots of questions is fine. :-)
>>>
>>> http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
>>>
>>> Patrick
>>>
>>>
>>> Dominic Williams wrote:
>>>
>>>  Ok will do. Will create patch for weak watchers and corral existing
>>>> discussion into ephem in ephem.
>>>>
>>>> This will happen some time this week
>>>>
>>>> Thanks for all the feedback.
>>>>
>>>> On 23 March 2010 16:29, Patrick Hunt <ph...@apache.org> wrote:
>>>>
>>>>  Excellent idea Jeff. Dominic feel free to open 2 JIRAs, one for ephem
>>>> in
>>>>
>>>>> ephem and a second for the weak watchers. Summarize your goals and the
>>>>> discussion so far as appropriate. BTW, either of these JIRAs would be
>>>>> good
>>>>> for a new contributor interested in gaining experience with ZooKeeper.
>>>>> ephem
>>>>> is a bit tougher, but not so extensive that it's insurmountable (of
>>>>> course
>>>>> our current dev base will help out).
>>>>>
>>>>> Regards,
>>>>>
>>>>> Patrick
>>>>>
>>>>>
>>>>> Jeff Hammerbacher wrote:
>>>>>
>>>>>  Hey Ben,
>>>>>
>>>>>> Perhaps you should open a JIRA for further discussion?
>>>>>>
>>>>>> Thanks,
>>>>>> Jeff
>>>>>>
>>>>>> On Mon, Mar 22, 2010 at 7:42 PM, Benjamin Reed <br...@yahoo-inc.com>
>>>>>> wrote:
>>>>>>
>>>>>>  let me put out an idea that we have kicked around for a while:
>>>>>> ephemeral
>>>>>>
>>>>>>  containers. the idea is that the znode disappears if it doesn't have
>>>>>>> children. you would create the znode with create("/path", data, acl,
>>>>>>> EPHEMERAL_CONTAINER). this would result in the creation of two znodes:
>>>>>>> /path
>>>>>>> and
>>>>>>> /path/child. (we have to create it with a child otherwise it
>>>>>>> immediately
>>>>>>> disappears.)
>>>>>>>
>>>>>>> i think this mechanism would address your need in a way that is easy
>>>>>>> to
>>>>>>> implement and use. it would also allow you to do a cool barrier
>>>>>>> implementation!
>>>>>>>
>>>>>>> ben
>>>>>>>
>>>>>>>
>>>>>>> On 03/22/2010 10:37 AM, Patrick Hunt wrote:
>>>>>>>
>>>>>>>> Dominic Williams wrote:
>>>>>>>>
>>>>>>>>> What I'd suggest might work:
>>>>>>>>>
>>>>>>>>> - when the session that created the parent ends, ownership of the
>>>>>>>>> parent could either be transferred to the owner/session that
>>>>>>>>> created the oldest child, or instead ownership could be
>>>>>>>>> transferred to some kind of nominal system session (which would
>>>>>>>>> delete the parent once the last ephemeral child disappeared)
>>>>>>>>
>>>>>>>> There may be some issues with idempotency here, also it could
>>>>>>>> require extensive locking which drives up operation latencies
>>>>>>>> (essentially "recursive delete"). It sounds possible, but someone
>>>>>>>> would have to take a closer look as to the technical challenges
>>>>>>>> involved.
>>>>>>>>
>>>>>>>> Our general philosophy is to keep things as simple as possible wrt
>>>>>>>> api, semantics, implementation, etc... Distributed communication is
>>>>>>>> hard and while we handle a lot of the issues for you it's still
>>>>>>>> complex. Following our philosophy generally makes the easy things
>>>>>>>> simple and the hard things possible, additionally it reduces the
>>>>>>>> number of bugs that we have in the implementation itself (both user
>>>>>>>> and service code).
>>>>>>>>
>>>>>>>> I don't wish to discourage you as much as provide insight/background
>>>>>>>> into some of our decisions.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>>
>>>>>>>>> On 22 March 2010 16:44, Patrick Hunt<ph...@apache.org>  wrote:
>>>>>>>>>
>>>>>>>>>> Dominic Williams wrote:
>>>>>>>>>>
>>>>>>>>>>> 1/ If a node crashes or something else goes wrong, you leave
>>>>>>>>>>> behind persistent nodes. Over time these will grow and grow,
>>>>>>>>>>> rather like the old tmp folders used to fill with files under
>>>>>>>>>>> Windows
>>>>>>>>>>
>>>>>>>>>> That's true. One either needs to use ephemerals or use persistent
>>>>>>>>>> and have a "garbage collector" (implicit or explicit gc). In most
>>>>>>>>>> cases it's preferable to use the ephemeral.
>>>>>>>>>>
>>>>>>>>>>> 2/ Persistent nodes = nasty scalability *bottleneck* because
>>>>>>>>>>> you're actually having to write to disk somewhere.
>>>>>>>>>>
>>>>>>>>>> This is not actually how ZK works. All znodes regardless of
>>>>>>>>>> persistent/ephemeral are written to disk persistently. Ephemeral
>>>>>>>>>> nodes are tied to the session that created them. As long as the
>>>>>>>>>> session is alive the ephemeral node is alive. Sessions themselves
>>>>>>>>>> are persistently/reliably stored by the ZK cluster. This allows
>>>>>>>>>> the shutdown of the entire cluster and restart it, all
>>>>>>>>>> sessions/ephemerals will be maintained. Sessions can move from
>>>>>>>>>> server to server (if say network connectivity to server A fails,
>>>>>>>>>> or server A itself fails then the client will move to server B).
>>>>>>>>>> The session and all ephemerals are maintained (well, as long as
>>>>>>>>>> the client moves within the expiration timeout value).
>>>>>>>>>>
>>>>>>>>>>> To avoid this I'm actually thinking of writing a locking system
>>>>>>>>>>> where you work out the existing chain not by enumerating
>>>>>>>>>>> sequential children, but by looking at the contents of each
>>>>>>>>>>> temporary lock node to see what it is waiting on. But... that's
>>>>>>>>>>> quite horrible. Was wondering whether there is some technical
>>>>>>>>>>> reason why ephemeral nodes can't have children??
>>>>>>>>>>
>>>>>>>>>> There are a few cases to think about.
>>>>>>>>>>
>>>>>>>>>> 1) obviously ephemeral nodes can't have persistent children, this
>>>>>>>>>> just doesn't make sense
>>>>>>>>>>
>>>>>>>>>> 2) ephemeral nodes have an owner - the session that created them.
>>>>>>>>>> so it would also not make sense (in my mind at least) to have an
>>>>>>>>>> ephemeral /foo with another ephemeral /foo/bar with a different
>>>>>>>>>> owner.
>>>>>>>>>>
>>>>>>>>>> 3) so you are left with "ephemerals can be a child of an ephemeral
>>>>>>>>>> with the same owner".
>>>>>>>>>>
>>>>>>>>>> 4) there are also issues of order. in particular what is the
>>>>>>>>>> "deletion order" depth first or breadth first, etc...
>>>>>>>>>>
>>>>>>>>>> I believe the answer so far has been "we don't do this because
>>>>>>>>>> it's fairly complicated and we haven't seen any use cases that
>>>>>>>>>> require it." In the cases I've seen so far there was either a
>>>>>>>>>> misunderstanding of how zk worked, or a simpler way available.
>>>>>>>>>>
>>>>>>>>>> Does that make sense? Thoughts?
>>>>>>>>>>
>>>>>>>>>> Patrick
>>

Re: A deficiency? You can only add children to persistent nodes

Posted by Patrick Hunt <ph...@apache.org>.
A while back (3.1? basically long after the mac port of the 1.6 jvm was 
considered "stable") we stopped "officially" supporting 1.5 
http://bit.ly/9IlzUq

However there have been occasional requests from ppl trying to use 1.5 
and we do our best not to break things for them. Last I checked the code 
still compiled under 1.5 (but it's been a while). IMO it's best to use 
1.6 though, even if it compiles/runs for 1.5 there have been a number of 
significant bugs fixed in the jvm since then.

Dominic, is there some reason in particular in this case?

Patrick

Dominic Williams wrote:
> Hi Patrick, re: Weak Watchers implementation. Is it ok to assume JDK 6?
> 

Re: A deficiency? You can only add children to persistent nodes

Posted by Dominic Williams <th...@googlemail.com>.
Hi Patrick, re: Weak Watchers implementation. Is it ok to assume JDK 6?

On 23 March 2010 17:32, Patrick Hunt <ph...@apache.org> wrote:

> Feel free to assign yourself to WW. I encourage you to document (comments
> in jira or wiki page) some rough approximation of the api/approach for WW,
> it's better to get the comments/concerns up front than after you took the
> time to work it all out in a patch. There are good tests for watchers
> already, you'll need to extend those for WW and add appropriate
> javadoc/forrest docs. Asking lots of questions is fine. :-)
>
> http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
>
> Patrick
>

Re: A deficiency? You can only add children to persistent nodes

Posted by Patrick Hunt <ph...@apache.org>.
Feel free to assign yourself to WW. I encourage you to document 
(comments in jira or wiki page) some rough approximation of the 
api/approach for WW, it's better to get the comments/concerns up front 
than after you took the time to work it all out in a patch. There are 
good tests for watchers already, you'll need to extend those for WW and 
add appropriate javadoc/forrest docs. Asking lots of questions is fine. :-)

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

Patrick

Dominic Williams wrote:
> Ok will do. Will create patch for weak watchers and corral existing
> discussion into ephem in ephem.
> 
> This will happen some time this week
> 
> Thanks for all the feedback.
> 
> On 23 March 2010 16:29, Patrick Hunt <ph...@apache.org> wrote:
> 
>> Excellent idea Jeff. Dominic feel free to open 2 JIRAs, one for ephem in
>> ephem and a second for the weak watchers. Summarize your goals and the
>> discussion so far as appropriate. BTW, either of these JIRAs would be good
>> for a new contributor interested in gaining experience with ZooKeeper. ephem
>> is a bit tougher, but not so extensive that it's insurmountable (of course
>> our current dev base will help out).
>>
>> Regards,
>>
>> Patrick
>>
>>
>> Jeff Hammerbacher wrote:
>>
>>> Hey Ben,
>>>
>>> Perhaps you should open a JIRA for further discussion?
>>>
>>> Thanks,
>>> Jeff
>>>
>>> On Mon, Mar 22, 2010 at 7:42 PM, Benjamin Reed <br...@yahoo-inc.com>
>>> wrote:
>>>
>>>  let me put out an idea that we have kicked around for a while: ephemeral
>>>> containers. the idea is that the znode disappears if it doesn't have
>>>> children. you would create the znode with create("/path", data, acl,
>>>> EPHEMERAL_CONTAINER). this would result in the creation of two znodes: /path
>>>> and
>>>> /path/child. (we have to create it with a child otherwise it immediately
>>>> disappears.)
>>>>
>>>> i think this mechanism would address your need in a way that is easy to
>>>> implement and use. it would also allow you to do a cool barrier
>>>> implementation!
>>>>
>>>> ben
>>>>
>>>>
> 

Re: A deficiency? You can only add children to persistent nodes

Posted by Dominic Williams <th...@googlemail.com>.
OK, will do. I'll create a patch for weak watchers and corral the existing
discussion into the ephem-in-ephem JIRA.

This will happen some time this week.

Thanks for all the feedback.

On 23 March 2010 16:29, Patrick Hunt <ph...@apache.org> wrote:

> Excellent idea Jeff. Dominic feel free to open 2 JIRAs, one for ephem in
> ephem and a second for the weak watchers. Summarize your goals and the
> discussion so far as appropriate. BTW, either of these JIRAs would be good
> for a new contributor interested in gaining experience with ZooKeeper. ephem
> is a bit tougher, but not so extensive that it's insurmountable (of course
> our current dev base will help out).
>
> Regards,
>
> Patrick
>
>

Re: A deficiency? You can only add children to persistent nodes

Posted by Patrick Hunt <ph...@apache.org>.
Excellent idea, Jeff. Dominic, feel free to open 2 JIRAs, one for ephem in
ephem and a second for the weak watchers. Summarize your goals and the
discussion so far as appropriate. BTW, either of these JIRAs would be
good for a new contributor interested in gaining experience with
ZooKeeper. The ephem one is a bit tougher, but not so extensive that it's
insurmountable (of course our current dev base will help out).

Regards,

Patrick

Jeff Hammerbacher wrote:
> Hey Ben,
> 
> Perhaps you should open a JIRA for further discussion?
> 
> Thanks,
> Jeff
> 

Re: A deficiency? You can only add children to persistent nodes

Posted by Jeff Hammerbacher <ha...@cloudera.com>.
Hey Ben,

Perhaps you should open a JIRA for further discussion?

Thanks,
Jeff

On Mon, Mar 22, 2010 at 7:42 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> let me put out an idea that we have kicked around for a while: ephemeral
> containers. the idea is that the znode disappears if it doesn't have
> children. you would create the znode with create("/path", data, acl,
> EPHEMERAL_CONTAINER). this would result in the creation of two znodes: /path and
> /path/child. (we have to create it with a child, otherwise it immediately
> disappears.)
>
> i think this mechanism would address your need in a way that is easy to
> implement and use. it would also allow you to do a cool barrier
> implementation!
>
> ben
>
>

Re: A deficiency? You can only add children to persistent nodes

Posted by Benjamin Reed <br...@yahoo-inc.com>.
in some sense the children will "own" the parent. the nice thing about 
it is that it isn't tied to any particular session, so we don't have to 
worry about weird cases like owners going away or switching ownership.
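
A minimal in-memory sketch of that ownership rule, to make the scenario from
the question concrete. This is not ZooKeeper code: the class
ChildOwnedParent, its method names, and the session ids are all made up for
illustration, and it assumes at most one ephemeral child per session. The
parent records no owner at all; it exists exactly as long as some session
still holds a child under it.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model: a parent znode whose lifetime depends only on its children. */
final class ChildOwnedParent {
    // Sessions that currently hold a live ephemeral child under the parent.
    private final Set<Long> sessionsWithChildren = new HashSet<>();

    /** A session creates its ephemeral child under the parent. */
    void childCreated(long sessionId) {
        sessionsWithChildren.add(sessionId);
    }

    /** A session ends, taking its ephemeral child with it. */
    void sessionEnded(long sessionId) {
        sessionsWithChildren.remove(sessionId);
    }

    /** The parent exists exactly as long as at least one child does. */
    boolean parentExists() {
        return !sessionsWithChildren.isEmpty();
    }

    public static void main(String[] args) {
        ChildOwnedParent parent = new ChildOwnedParent();
        long creator = 1L;  // made-up session id of the session that created the initial pair
        long other = 2L;    // made-up session id of a second client

        parent.childCreated(creator);
        parent.childCreated(other);
        parent.sessionEnded(creator);               // the creating session exits
        System.out.println(parent.parentExists());  // true: the other child keeps the parent alive

        parent.sessionEnded(other);                 // the last child disappears
        System.out.println(parent.parentExists());  // false: no children, so no parent
    }
}
```

Because the parent never stores an owner, there is nothing to transfer when
the creating session goes away, which is the simplification described above.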

ben

On 03/23/2010 03:04 AM, Dominic Williams wrote:
> Hi, would work nicely.
>
> Who would own the parent node after the session that created the initial
> pair exits (assume additional children exist)?
>


Re: A deficiency? You can only add children to persistent nodes

Posted by Dominic Williams <th...@googlemail.com>.
Hi, that would work nicely.

Who would own the parent node after the session that created the initial
pair exits (assume additional children exist)?

On 23 March 2010 02:42, Benjamin Reed <br...@yahoo-inc.com> wrote:

> let me put out an idea that we have kicked around for a while: ephemeral
> containers. the idea is that the znode disappears if it doesn't have
> children. you would create the znode with create("/path", data, acl,
> EPHEMERAL_CONTAINER). this would result in the creation of two znodes: /path and
> /path/child. (we have to create it with a child, otherwise it immediately
> disappears.)
>
> i think this mechanism would address your need in a way that is easy to
> implement and use. it would also allow you to do a cool barrier
> implementation!
>
> ben
>
>

Re: A deficiency? You can only add children to persistent nodes

Posted by Benjamin Reed <br...@yahoo-inc.com>.
let me put out an idea that we have kicked around for a while: ephemeral
containers. the idea is that the znode disappears if it doesn't have
children. you would create the znode with create("/path", data, acl,
EPHEMERAL_CONTAINER). this would result in the creation of two znodes: /path
and /path/child. (we have to create it with a child, otherwise it
immediately disappears.)
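
The proposed semantics can be sketched with a small in-memory model. This is
only an illustration under stated assumptions, not an implementation against
the ZooKeeper client: EPHEMERAL_CONTAINER is the name used in the proposal,
while the class EphemeralContainerModel, its methods, and the session ids
are hypothetical. It assumes every child of a container is an ephemeral
znode tied to one session.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory model of the proposed "ephemeral container" semantics; not ZooKeeper code. */
final class EphemeralContainerModel {
    // container path -> (child name -> id of the session that owns the child)
    private final Map<String, Map<String, Long>> containers = new TreeMap<>();

    /** Models create(path, data, acl, EPHEMERAL_CONTAINER): container and first child appear together. */
    void createContainer(String path, String firstChild, long sessionId) {
        if (containers.containsKey(path)) {
            throw new IllegalStateException("already exists: " + path);
        }
        Map<String, Long> children = new TreeMap<>();
        children.put(firstChild, sessionId);
        containers.put(path, children);
    }

    /** Adds another ephemeral child, owned by sessionId, under an existing container. */
    void addChild(String path, String child, long sessionId) {
        Map<String, Long> children = containers.get(path);
        if (children == null) {
            throw new IllegalStateException("no such container: " + path);
        }
        if (children.putIfAbsent(child, sessionId) != null) {
            throw new IllegalStateException("already exists: " + path + "/" + child);
        }
    }

    /** Models session expiry: drop the session's children, then every container left empty. */
    void expireSession(long sessionId) {
        for (Map<String, Long> children : containers.values()) {
            children.values().removeIf(owner -> owner == sessionId);
        }
        containers.values().removeIf(Map::isEmpty);
    }

    boolean exists(String path) {
        return containers.containsKey(path);
    }

    /** Returns a sorted copy of the child names, or an empty set if the container is gone. */
    Set<String> children(String path) {
        Set<String> names = new TreeSet<>();
        Map<String, Long> children = containers.get(path);
        if (children != null) {
            names.addAll(children.keySet());
        }
        return names;
    }

    public static void main(String[] args) {
        EphemeralContainerModel model = new EphemeralContainerModel();
        model.createContainer("/path", "child", 1L); // session 1 creates /path and /path/child
        model.addChild("/path", "other", 2L);        // session 2 adds /path/other
        model.expireSession(1L);                     // /path/child goes, /path stays
        System.out.println(model.exists("/path") + " " + model.children("/path"));
        model.expireSession(2L);                     // last child goes, so /path goes too
        System.out.println(model.exists("/path"));
    }
}
```

The container is removed in the same step that removes its last child, so in
this model there is never a moment where an empty container is visible.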

i think this mechanism would address your need in a way that is easy to 
implement and use. it would also allow you to do a cool barrier 
implementation!

ben

On 03/22/2010 10:37 AM, Patrick Hunt wrote:
> Dominic Williams wrote:
>    
>> What I'd suggest might work:
>> - when the session that created the parent ends, ownership of the parent
>> could either be transferred to the owner/session that created the oldest
>> child, or instead ownership could be transferred to some kind of nominal
>> system session (which would delete the parent once the last ephemeral child
>> disappeared)
>>      
> There may be some issues with idempotency here, also it could require
> extensive locking which drives up operation latencies (essentially
> "recursive delete"). It sounds possible, but someone would have to take
> a closer look as to the technical challenges involved.
>
>
> Our general philosophy is to keep things as simple as possible wrt api,
> semantics, implementation, etc... Distributed communication is hard and
> while we handle a lot of the issues for you it's still complex.
> Following our philosophy generally makes the easy things simple and the
> hard things possible, additionally it reduces the number of bugs that we
> have in the implementation itself (both user and service code).
>
> I don't wish to discourage you as much as provide insight/background
> into some of our decisions.
>
> Regards,
>
> Patrick
>
>    
>> On 22 March 2010 16:44, Patrick Hunt<ph...@apache.org>  wrote:
>>
>>      
>>> Dominic Williams wrote:
>>>
>>>        
>>>> 1/ If a node crashes or something else goes wrong, you leave behind
>>>> persistent nodes. Over time these will grow and grow, rather like the old
>>>> tmp folders used to fill with files under Windows
>>>>
>>>>          
>>> That's true. One either needs to use ephemerals or use persistent and have
>>> a "garbage collector" (implicit or explicit gc). In most cases it's
>>> preferable to use the ephemeral.
>>>
>>>
>>>> 2/ Persistent nodes = nasty scalability *bottleneck* because you're actually
>>>> having to write to disk somewhere.
>>>>
>>>>          
>>> This is not actually how ZK works. All znodes regardless of
>>> persistent/ephemeral are written to disk persistently. Ephemeral nodes are
>>> tied to the session that created them. As long as the session is alive the
>>> ephemeral node is alive. Sessions themselves are persistently/reliably
>>> stored by the ZK cluster. This allows the shutdown of the entire cluster and
>>> restart it, all sessions/ephemerals will be maintained. Sessions can move
>>> from server to server (if say network connectivity to server A fails, or
>>> server A itself fails then the client will move to server B). The session
>>> and all ephemerals are maintained (well, as long as the client moves within
>>> the expiration timeout value).
>>>
>>>
>>>> To avoid this I'm actually thinking of writing a locking system where you work
>>>> out the existing chain not by enumerating sequential children, but by
>>>> looking at the contents of each temporary lock node to see what it is
>>>> waiting on. But... that's quite horrible. Was wondering whether there is
>>>> some technical reason why ephemeral nodes can't have children??
>>>>
>>>>          
>>> There are a few cases to think about.
>>>
>>> 1) obviously ephemeral nodes can't have persistent children, this just
>>> doesn't make sense
>>>
>>> 2) ephemeral nodes have an owner - the session that created them. so it
>>> would also not make sense (in my mind at least) to have an ephemeral /foo
>>> with another ephemeral /foo/bar with a different owner.
>>>
>>> 3) so you are left with "ephemerals can be a child of an ephemeral with the
>>> same owner".
>>>
>>> 4) there are also issues of order. in particular what is the "deletion
>>> order" depth first or breadth first, etc...
>>>
>>> I believe the answer so far has been "we don't do this because it's fairly
>>> complicated and we haven't seen any use cases that require it." In the cases
>>> I've seen so far there was either a misunderstanding of how zk worked, or a
>>> simpler way available.
>>>
>>> Does that make sense? Thoughts?
>>>
>>> Patrick
>>>
>>>        
>>      


Re: A deficiency? You can only add children to persistent nodes

Posted by Patrick Hunt <ph...@apache.org>.
Dominic Williams wrote:
> What I'd suggest might work:
> - when the session that created the parent ends, ownership of the parent
> could either be transferred to the owner/session that created the oldest
> child, or instead ownership could be transferred to some kind of nominal
> system session (which would delete the parent once the last ephemeral child
> disappeared)

There may be some issues with idempotency here. It could also require 
extensive locking, which drives up operation latencies (essentially a 
"recursive delete"). It sounds possible, but someone would have to take 
a closer look at the technical challenges involved.


Our general philosophy is to keep things as simple as possible with respect 
to the API, semantics, implementation, etc. Distributed communication is hard, 
and while we handle a lot of the issues for you, it's still complex. 
Following our philosophy generally makes the easy things simple and the 
hard things possible. It also reduces the number of bugs in both user code 
and the service implementation itself.

I don't wish to discourage you as much as provide insight/background 
into some of our decisions.

Regards,

Patrick

> 
> On 22 March 2010 16:44, Patrick Hunt <ph...@apache.org> wrote:
> 
>> Dominic Williams wrote:
>>
>>> 1/ If a node crashes or something else goes wrong, you leave behind
>>> persistent nodes. Over time these will grow and grow, rather like the old
>>> tmp folders used to fill with files under Windows
>>>
>> That's true. One either needs to use ephemerals or use persistent and have
>> a "garbage collector" (implicit or explicit gc). In most cases it's
>> preferable to use the ephemeral.
>>
>>
>>> 2/ Persistent nodes = nasty scalability *bottleneck* because you're actually
>>> having to write to disk somewhere.
>>>
>> This is not actually how ZK works. All znodes regardless of
>> persistent/ephemeral are written to disk persistently. Ephemeral nodes are
>> tied to the session that created them. As long as the session is alive the
>> ephemeral node is alive. Sessions themselves are persistently/reliably
>> stored by the ZK cluster. This allows the shutdown of the entire cluster and
>> restart it, all sessions/ephemerals will be maintained. Sessions can move
>> from server to server (if say network connectivity to server A fails, or
>> server A itself fails then the client will move to server B). The session
>> and all ephemerals are maintained (well, as long as the client moves within
>> the expiration timeout value).
>>
>>
>>> To avoid this I'm actually thinking of writing a locking system where you work
>>> out the existing chain not by enumerating sequential children, but by
>>> looking at the contents of each temporary lock node to see what it is
>>> waiting on. But... that's quite horrible. Was wondering whether there is
>>> some technical reason why ephemeral nodes can't have children??
>>>
>> There are a few cases to think about.
>>
>> 1) obviously ephemeral nodes can't have persistent children, this just
>> doesn't make sense
>>
>> 2) ephemeral nodes have an owner - the session that created them. so it
>> would also not make sense (in my mind at least) to have an ephemeral /foo
>> with another ephemeral /foo/bar with a different owner.
>>
>> 3) so you are left with "ephemerals can be a child of an ephemeral with the
>> same owner".
>>
>> 4) there are also issues of order. in particular what is the "deletion
>> order" depth first or breadth first, etc...
>>
>> I believe the answer so far has been "we don't do this because it's fairly
>> complicated and we haven't seen any use cases that require it." In the cases
>> I've seen so far there was either a misunderstanding of how zk worked, or a
>> simpler way available.
>>
>> Does that make sense? Thoughts?
>>
>> Patrick
>>
> 

Re: A deficiency? You can only add children to persistent nodes

Posted by Dominic Williams <th...@googlemail.com>.
Hi Patrick,

This is a little more complex than I'd initially considered. The
biggest point is the question of what would happen to an ephemeral
parent node when the session that owns it exits.

What I'd suggest might work:
- when the session that created the parent ends, ownership of the parent
could either be transferred to the owner/session that created the oldest
child, or instead ownership could be transferred to some kind of nominal
system session (which would delete the parent once the last ephemeral child
disappeared)
- when someone tries to create a persistent child of an ephemeral node, they
simply get back an appropriate error code
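
A rough sketch of the second variant suggested above (handing the parent to a nominal system session), again as a toy in-memory model. Nothing here is existing ZooKeeper behaviour: the class OwnershipTransferSketch, the SYSTEM_SESSION constant and the boolean standing in for an error code are all assumptions for illustration. The model does not check that a parent exists before creating a child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the ownership-transfer proposal; these are not ZooKeeper semantics. */
class OwnershipTransferSketch {
    /** Hypothetical "nominal system session" that adopts orphaned parents. */
    static final long SYSTEM_SESSION = -1L;

    // path -> owning session id; a null value marks a persistent node.
    private final TreeMap<String, Long> owners = new TreeMap<>();

    /** Returns false (standing in for an error code) for a persistent child of an ephemeral parent. */
    boolean create(String path, Long ownerOrNullIfPersistent) {
        boolean parentIsEphemeral = owners.get(parentOf(path)) != null;
        if (parentIsEphemeral && ownerOrNullIfPersistent == null) {
            return false;
        }
        owners.put(path, ownerOrNullIfPersistent);
        return true;
    }

    /** Ends a session: childless nodes are deleted, parents with children are handed over. */
    void closeSession(long session) {
        // Descending key order visits children before their parents.
        List<String> paths = new ArrayList<>(owners.descendingKeySet());
        for (String path : paths) {
            Long owner = owners.get(path);
            if (owner == null || owner != session) {
                continue; // persistent, already removed, or owned by someone else
            }
            if (hasChildren(path)) {
                owners.put(path, SYSTEM_SESSION);
            } else {
                remove(path);
            }
        }
    }

    boolean exists(String path) {
        return owners.containsKey(path);
    }

    /** Removes a node, then removes a system-owned parent that has just lost its last child. */
    private void remove(String path) {
        owners.remove(path);
        String parent = parentOf(path);
        Long parentOwner = owners.get(parent);
        if (parentOwner != null && parentOwner == SYSTEM_SESSION && !hasChildren(parent)) {
            remove(parent);
        }
    }

    private boolean hasChildren(String path) {
        String prefix = path + "/";
        String next = owners.ceilingKey(prefix); // smallest key at or after "path/"
        return next != null && next.startsWith(prefix);
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```

In this model the cascade in remove() is where the real difficulty would sit: on a live ensemble that step has to be atomic with respect to concurrent creates under the same parent.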

Best, Dominic

On 22 March 2010 16:44, Patrick Hunt <ph...@apache.org> wrote:

> Dominic Williams wrote:
>
>> 1/ If a node crashes or something else goes wrong, you leave behind
>> persistent nodes. Over time these will grow and grow, rather like the old
>> tmp folders used to fill with files under Windows
>>
>
> That's true. One either needs to use ephemerals or use persistent and have
> a "garbage collector" (implicit or explicit gc). In most cases it's
> preferable to use the ephemeral.
>
>
>> 2/ Persistent nodes = nasty scalability *bottleneck* because you're actually
>> having to write to disk somewhere.
>>
>
> This is not actually how ZK works. All znodes regardless of
> persistent/ephemeral are written to disk persistently. Ephemeral nodes are
> tied to the session that created them. As long as the session is alive the
> ephemeral node is alive. Sessions themselves are persistently/reliably
> stored by the ZK cluster. This allows the shutdown of the entire cluster and
> restart it, all sessions/ephemerals will be maintained. Sessions can move
> from server to server (if say network connectivity to server A fails, or
> server A itself fails then the client will move to server B). The session
> and all ephemerals are maintained (well, as long as the client moves within
> the expiration timeout value).
>
>
>> To avoid this I'm actually thinking of writing a locking system where you work
>> out the existing chain not by enumerating sequential children, but by
>> looking at the contents of each temporary lock node to see what it is
>> waiting on. But... that's quite horrible. Was wondering whether there is
>> some technical reason why ephemeral nodes can't have children??
>>
>
> There are a few cases to think about.
>
> 1) obviously ephemeral nodes can't have persistent children, this just
> doesn't make sense
>
> 2) ephemeral nodes have an owner - the session that created them. so it
> would also not make sense (in my mind at least) to have an ephemeral /foo
> with another ephemeral /foo/bar with a different owner.
>
> 3) so you are left with "ephemerals can be a child of an ephemeral with the
> same owner".
>
> 4) there are also issues of order. in particular what is the "deletion
> order" depth first or breadth first, etc...
>
> I believe the answer so far has been "we don't do this because it's fairly
> complicated and we haven't seen any use cases that require it." In the cases
> I've seen so far there was either a misunderstanding of how zk worked, or a
> simpler way available.
>
> Does that make sense? Thoughts?
>
> Patrick
>

Re: A deficiency? You can only add children to persistent nodes

Posted by Patrick Hunt <ph...@apache.org>.
Dominic Williams wrote:
> 1/ If a node crashes or something else goes wrong, you leave behind
> persistent nodes. Over time these will grow and grow, rather like the old
> tmp folders used to fill with files under Windows

That's true. One either needs to use ephemerals, or use persistent nodes and 
have a "garbage collector" (implicit or explicit gc). In most cases it's 
preferable to use ephemerals.

> 2/ Persistent nodes = nasty scalability *bottleneck* because you're actually
> having to write to disk somewhere.

This is not actually how ZK works. All znodes, whether persistent or 
ephemeral, are written to disk persistently. Ephemeral nodes 
are tied to the session that created them. As long as the session is 
alive the ephemeral node is alive. Sessions themselves are 
persistently/reliably stored by the ZK cluster. This means the entire 
cluster can be shut down and restarted, and all sessions/ephemerals will be 
maintained. Sessions can move from server to server (if, say, network 
connectivity to server A fails, or server A itself fails, then the client 
will move to server B). The session and all ephemerals are maintained 
(well, as long as the client moves within the expiration timeout value).
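
The lifetime rule described above can be sketched with a toy model driven by a logical clock. This is not how the server is implemented; SessionSketch, its method names and the millisecond values are assumptions for illustration only. The point it shows is that an ephemeral node lives exactly as long as its session, and that a session survives a move between servers as long as it is heard from within the timeout.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: ephemeral nodes live exactly as long as the session that owns them. */
class SessionSketch {
    private final long timeoutMs;
    // session id -> time the session was last heard from (on any server)
    private final Map<Long, Long> lastHeard = new HashMap<>();
    // ephemeral path -> owning session id
    private final Map<String, Long> ephemerals = new HashMap<>();

    SessionSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void open(long session, long nowMs) {
        lastHeard.put(session, nowMs);
    }

    /** Which server receives the heartbeat does not matter; an expired session cannot return. */
    boolean heartbeat(long session, long nowMs) {
        if (!lastHeard.containsKey(session)) {
            return false;
        }
        lastHeard.put(session, nowMs);
        return true;
    }

    void createEphemeral(String path, long session) {
        ephemerals.put(path, session);
    }

    /** Expires silent sessions, then drops every ephemeral whose owner is gone. */
    void tick(long nowMs) {
        lastHeard.entrySet().removeIf(e -> nowMs - e.getValue() > timeoutMs);
        ephemerals.values().removeIf(owner -> !lastHeard.containsKey(owner));
    }

    boolean exists(String path) {
        return ephemerals.containsKey(path);
    }
}
```

For example, with a 30 ms timeout, a session that reconnects through another server at t=20 still owns its nodes at t=40, but loses them if nothing more is heard by t=60.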

> To avoid this I'm actually thinking of writing a locking system where you work
> out the existing chain not by enumerating sequential children, but by
> looking at the contents of each temporary lock node to see what it is
> waiting on. But... that's quite horrible. Was wondering whether there is
> some technical reason why ephemeral nodes can't have children??

There are a few cases to think about.

1) obviously ephemeral nodes can't have persistent children, this just 
doesn't make sense

2) ephemeral nodes have an owner - the session that created them. so it 
would also not make sense (in my mind at least) to have an ephemeral 
/foo with another ephemeral /foo/bar with a different owner.

3) so you are left with "ephemerals can be a child of an ephemeral with 
the same owner".

4) there are also issues of order. in particular, what is the "deletion 
order": depth first, breadth first, etc.?
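
To make the ordering question in case 4 concrete: since a znode that still has children cannot be deleted, any cleanup of a same-owner ephemeral subtree would have to remove children before their parents. The sketch below is one possible bottom-up order, not something ZooKeeper does; DeletionOrderSketch and childrenFirst are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Computes one possible children-before-parents deletion order for a set of paths. */
class DeletionOrderSketch {
    static List<String> childrenFirst(Collection<String> paths) {
        List<String> ordered = new ArrayList<>(paths);
        ordered.sort((a, b) -> {
            // Deeper paths come first; ties are broken alphabetically for determinism.
            int byDepth = Integer.compare(depth(b), depth(a));
            return byDepth != 0 ? byDepth : a.compareTo(b);
        });
        return ordered;
    }

    /** Depth is the number of '/' separators, so "/foo" is 1 and "/foo/bar" is 2. */
    static int depth(String path) {
        int count = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                count++;
            }
        }
        return count;
    }
}
```

Sorting by depth is the simplest choice here; a true depth-first walk would give a different but equally valid order, which is part of why the semantics would need to be pinned down.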

I believe the answer so far has been "we don't do this because it's 
fairly complicated and we haven't seen any use cases that require it." 
In the cases I've seen so far there was either a misunderstanding of how 
zk worked, or a simpler way available.

Does that make sense? Thoughts?

Patrick