Posted to dev@zookeeper.apache.org by "Giuseppe (Pino) de Candia" <gd...@midokura.com> on 2014/10/13 11:59:43 UTC

getChildren watcher and best-effort diffs?

Hi ZK Devs,

At Midokura we use ZK to store, manage, and propagate some replicated sets,
and we are running into (expected) scalability and performance limits.

My understanding is that ZK's model of notifying "object X changed" (as
opposed to "change Y was applied to object X"), and hence forcing
getChildren watchers to reload the entire child set, is motivated by
simplicity and by avoiding per-client state: otherwise the ZK server would
have to buffer changes for temporarily disconnected clients.

I think the argument is sound for non-child data, but I think providing
child diffs on a best-effort basis should be both easy for the ZK server
and in line with many use-cases/recipes. We're about to investigate the
feasibility of such a design and tackle it ourselves, but I wanted to reach
out to the community to ask whether someone else has thought about this,
whether there's some fundamental reason not to implement it, and any advice
if we attempt it.

Specifically, a getChildren watcher would receive two kinds of
notifications:

   1. Simple "updated" without details, already provided today.
   2. The new notification that passes a 2-tuple (Set<String> added,
   Set<String> removed) - the Strings are individual child names under the
   watched path (e.g. "proc123" under "/zk/mylocks/")

Upon receiving and applying an update/transaction for a child set from the
leader, a ZK Server can easily compute the diff and send it to all
healthy/connected clients that are watching the parent - type 2
notification. Since the diffs are not buffered, any client that reconnects
(before its session expires) will simply be told that the child set changed
- type 1 notification. That's why these are "best-effort diffs for child
sets".
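As a rough illustration of the diff computation described above, here is a minimal Java sketch. ChildDiff and its between method are hypothetical names used only for illustration, not existing ZooKeeper classes, and the sketch assumes the server has the child set from before and after the transaction at hand.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical payload for the proposed type 2 notification; not part of ZooKeeper. */
final class ChildDiff {
    final Set<String> added;
    final Set<String> removed;

    ChildDiff(Set<String> added, Set<String> removed) {
        this.added = Collections.unmodifiableSet(added);
        this.removed = Collections.unmodifiableSet(removed);
    }

    /** Computes which child names appeared and disappeared between two snapshots. */
    static ChildDiff between(Set<String> before, Set<String> after) {
        Set<String> added = new TreeSet<>(after);
        added.removeAll(before);    // present in the new set but not the old one
        Set<String> removed = new TreeSet<>(before);
        removed.removeAll(after);   // present in the old set but not the new one
        return new ChildDiff(added, removed);
    }

    /** True when the two snapshots held the same child names. */
    boolean isEmpty() {
        return added.isEmpty() && removed.isEmpty();
    }
}
```

In practice a server applying a single create or delete already knows the affected child name, so it would probably not need to compare full snapshots; the set difference is shown only because it makes the (added, removed) pair explicit.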

In the great majority of cases, most clients will be kept up to date by the
diffs, and only occasionally would clients need to re-read the entire child
list, thus reducing the frequency of stampedes on the child directory and
making recipe-writing easier.
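
To make the client side concrete, here is a minimal Java sketch of how a recipe might consume the two notification kinds. ChildSetMirror, onDiff and onChanged are hypothetical names, and the Supplier stands in for a real getChildren call; none of this is existing ZooKeeper API.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical client-side mirror of one znode's child names. */
final class ChildSetMirror {
    private final Set<String> children = new TreeSet<>();
    private final Supplier<Set<String>> fullRead; // stands in for a getChildren call
    private int fullReads = 0;

    ChildSetMirror(Supplier<Set<String>> fullRead) {
        this.fullRead = fullRead;
        reload(); // the initial state always comes from a full read
    }

    /** Type 2 notification: apply the diff locally, with no extra round trip. */
    void onDiff(Set<String> added, Set<String> removed) {
        children.removeAll(removed);
        children.addAll(added);
    }

    /** Type 1 notification: details are unknown, so re-read the whole child list. */
    void onChanged() {
        reload();
    }

    private void reload() {
        children.clear();
        children.addAll(fullRead.get());
        fullReads++;
    }

    /** Returns a copy of the current local view. */
    Set<String> snapshot() {
        return new TreeSet<>(children);
    }

    /** Number of full child-list reads performed so far. */
    int fullReadCount() {
        return fullReads;
    }
}
```

The sketch assumes diffs arrive in order and without gaps while the client stays connected; anything else (for example a reconnect) is expected to surface as a type 1 notification and trigger the full read.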

I look forward to hearing your thoughts and will try to also get back to
you with implementation-specific details.

best,
Pino

Re: getChildren watcher and best-effort diffs?

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
Hi Pino!

A watch on the children of a znode will be triggered when there is a change to the set, so it will always contain either a single znode added or a single znode removed. The functionality that you're describing sounds more like a getChildren that returns the changes since a given zxid, since in this case it can have multiple znodes added and removed between the watch triggering and the execution of the getDiffChildren.

Implementing such a call requires that the server keeps track of changes since a reference zxid and I suppose the reference zxid is determined when the watch is set. To reset the diff, one needs to run getDiffChildren or possibly set another watch (which means reset diff). 


There are also issues with clients crashing (do we remove the diff set?) and reconnecting (how does the server know the diff so far?).
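
One possible shape for the server-side bookkeeping, sketched in Java under stated assumptions: ChildChangeLog, record and diffSince are hypothetical names, changes are assumed to be recorded in increasing zxid order, and the log is bounded so that a client whose reference zxid is too old gets no diff and has to fall back to a full getChildren.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical bounded log of child changes for one parent znode. */
final class ChildChangeLog {
    /** Net change relative to a reference zxid. */
    static final class Diff {
        final Set<String> added = new TreeSet<>();
        final Set<String> removed = new TreeSet<>();
    }

    private static final class Entry {
        final long zxid;
        final String child;
        final boolean added;

        Entry(long zxid, String child, boolean added) {
            this.zxid = zxid;
            this.child = child;
            this.added = added;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final int capacity;
    private long coveredFromZxid; // every change with a larger zxid is still in the log

    ChildChangeLog(int capacity, long startZxid) {
        this.capacity = capacity;
        this.coveredFromZxid = startZxid;
    }

    /** Records one child creation (added = true) or deletion (added = false). */
    void record(long zxid, String child, boolean added) {
        entries.addLast(new Entry(zxid, child, added));
        if (entries.size() > capacity) {
            // Evicting an entry means diffs older than its zxid can no longer be answered.
            coveredFromZxid = entries.removeFirst().zxid;
        }
    }

    /** Net change after sinceZxid, or empty if the log no longer reaches back that far. */
    Optional<Diff> diffSince(long sinceZxid) {
        if (sinceZxid < coveredFromZxid) {
            return Optional.empty(); // caller must re-read the full child list
        }
        Diff diff = new Diff();
        for (Entry e : entries) {
            if (e.zxid <= sinceZxid) {
                continue;
            }
            if (e.added) {
                // Re-adding a child removed earlier in the window cancels out.
                if (!diff.removed.remove(e.child)) {
                    diff.added.add(e.child);
                }
            } else {
                // Removing a child added earlier in the window cancels out.
                if (!diff.added.remove(e.child)) {
                    diff.removed.add(e.child);
                }
            }
        }
        return Optional.of(diff);
    }
}
```

Because the log is shared per znode rather than kept per client, a crashed client leaves nothing to clean up, and a reconnecting client would only need to present the last zxid it saw. Whether that fits the actual watch and session machinery is an open question this sketch does not answer.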

-Flavio






Re: getChildren watcher and best-effort diffs?

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
It would be most efficient if this were done on the server. However, if you’re looking for an immediate solution and you’re using the JVM, Curator has a recipe, PathChildrenCache, that delivers per-child added/updated/removed events on the client side:

https://curator.apache.org/curator-recipes/path-cache.html
https://curator.apache.org/apidocs/org/apache/curator/framework/recipes/cache/PathChildrenCache.html

-Jordan
