Posted to dev@zookeeper.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/12/26 19:37:58 UTC

[jira] [Commented] (ZOOKEEPER-1416) Persistent Recursive Watch

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15778863#comment-15778863 ] 

ASF GitHub Bot commented on ZOOKEEPER-1416:
-------------------------------------------

GitHub user Randgalt opened a pull request:

    https://github.com/apache/zookeeper/pull/136

    [ZOOKEEPER-1416] Persistent Recursive Watch

    Here is a complete implementation of a persistent, recursive watch addition for ZK. These watches are set via a new method, `addPersistentWatch()`, and are removed via the existing watcher removal methods. Persistent, recursive watches have these characteristics:
    
    - Once set, they do not auto-remove when triggered
    - They trigger for all event types (child, data, etc.) on the node they are registered for and any child znode recursively.
    - They are efficiently implemented by using the existing watch internals. A new class `PathIterator` walks up the path parent-by-parent when checking if a watcher applies. 
    
    Persistent-watcher-specific tests are in `PersistentWatcherTest.java`. I'd appreciate feedback on any additional tests that should be added.
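
The parent-by-parent walk described above can be illustrated with a minimal sketch. This is not the `PathIterator` class from the pull request; the class name `ParentPathWalker`, the helper methods, and the use of a plain set of watched root paths are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch: NOT the PathIterator class from the pull request.
// Yields a znode path, then each of its ancestors, ending at the root "/".
final class ParentPathWalker implements Iterator<String> {
    private String next;

    ParentPathWalker(String path) {
        this.next = path;
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        if (current.equals("/")) {
            next = null; // the root has no parent, so the walk ends here
        } else {
            int lastSlash = current.lastIndexOf('/');
            // A top-level node such as "/a" has the root as its parent.
            next = (lastSlash <= 0) ? "/" : current.substring(0, lastSlash);
        }
        return current;
    }

    // Collects the path and all of its ancestors, deepest first.
    static List<String> pathAndAncestors(String path) {
        List<String> result = new ArrayList<>();
        Iterator<String> it = new ParentPathWalker(path);
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result;
    }

    // Assumed lookup: an event on eventPath matches if the path itself or
    // any ancestor is registered as a recursive watch root.
    static boolean isWatched(Set<String> recursiveWatchRoots, String eventPath) {
        Iterator<String> it = new ParentPathWalker(eventPath);
        while (it.hasNext()) {
            if (recursiveWatchRoots.contains(it.next())) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch, `pathAndAncestors("/a/b/c")` yields `/a/b/c`, `/a/b`, `/a`, `/`. The lookup cost grows with the depth of the event path rather than with the number of znodes under a watched root, which is one plausible reading of why the approach is described as efficient.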

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Randgalt/zookeeper ZOOKEEPER-1416

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #136
    
----
commit 3c05c671d09e5b6df936af8f0a700995d5749e11
Author: randgalt <jo...@jordanzimmerman.com>
Date:   2016-12-25T21:36:13Z

    basic work done. Needs more testing, tuning, etc.

commit ca4a000dcf294aaebd09d3118ebc62cb0783f9cc
Author: randgalt <jo...@jordanzimmerman.com>
Date:   2016-12-26T15:06:55Z

    working on persistent watcher removal

commit bf13deda0b00ca67cd1fa963961d95a22634ed88
Author: randgalt <jo...@jordanzimmerman.com>
Date:   2016-12-26T17:59:04Z

    Support resetting persistent watches

commit 27d8d6cd45cb6adfabf50143f6de62a371447519
Author: randgalt <jo...@jordanzimmerman.com>
Date:   2016-12-26T18:21:17Z

    docs

commit 2766fb1020c600af579a0f701fa3c00ea92b7e22
Author: randgalt <jo...@jordanzimmerman.com>
Date:   2016-12-26T18:44:42Z

    containsWatcher() was broken for STANDARD watchers

commit 86fa1fbcb75021179f80588a2ea46aad2127fb4e
Author: randgalt <jo...@jordanzimmerman.com>
Date:   2016-12-26T19:20:00Z

    removed unused import

commit b490c84d1e56335ba66f9c56d64134886b144451
Author: randgalt <jo...@jordanzimmerman.com>
Date:   2016-12-26T19:20:08Z

    Updated doc for persistent watches

----


> Persistent Recursive Watch
> --------------------------
>
>                 Key: ZOOKEEPER-1416
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1416
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: c client, documentation, java client, server
>            Reporter: Phillip Liu
>            Assignee: Thawan Kooburat
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> h4. The Problem
> A ZooKeeper Watch can be placed on a single znode, and when the znode changes a Watch event is sent to the client. If thousands of znodes are being watched, a client that (re)connects has to send thousands of watch requests. At Facebook, we have this problem storing information for thousands of db shards. Consequently, a naming service that consumes the db shard definitions issues thousands of watch requests each time the service starts and changes its client watcher.
> h4. Proposed Solution
> We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent means no Watch reset is necessary after a watch-fire. Recursive means the Watch applies to the node and descendant nodes. A Persistent Recursive Watch behaves as follows:
> # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
> # CHILDREN and DATA Recursive Watches can be placed on any znode.
> # EXISTS Recursive Watches can be placed on any path.
> # A Recursive Watch behaves like an auto-watch registrar on the server side. Setting a Recursive Watch means setting watches on all descendant znodes.
> # When a watch on a descendant fires, no subsequent event is fired until a corresponding getData(..) on the znode is called; the Recursive Watch then automatically applies the watch on the znode. This maintains the existing Watch semantic on an individual znode.
> # A Recursive Watch overrides any watches placed on a descendant znode. Practically, this means the Recursive Watch Watcher callback is the one receiving the event, and the event is delivered exactly once.
> A goal here is to reduce the number of semantic changes. The guarantee of no intermediate watch event until data is read will be maintained. The only difference is that we will automatically re-add the watch after the read. At the same time, we add the convenience of reducing the need to add multiple watches for sibling znodes, which in turn reduces the number of watch messages sent from the client to the server.
> There are some implementation details that need to be hashed out. Initial thinking is to have the Recursive Watch create per-node watches. This will cause a lot of watches to be created on the server side. Currently, each watch is stored as a single bit in a bit set relative to a session - up to 3 bits per client per znode. If there are 100m znodes with 100k clients, each watching all nodes, then this strategy will consume approximately 3.75TB of RAM distributed across all Observers. Seems expensive.
> Alternatively, a blacklist of paths for which Watches are not sent, regardless of Watch setting, can be updated each time a watch event from a Recursive Watch is fired. The memory utilization is relative to the number of outstanding reads, and in the worst case it is 1/3 * 3.75TB using the parameters given above.
> Otherwise, a relaxation of the no-intermediate-watch-event-until-read guarantee is required. If the server can send watch events regardless of whether one has already been fired without a corresponding read, then the server can simply fire watch events without tracking.
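
The memory estimates in the issue description can be reproduced with a short calculation. The sketch below assumes decimal terabytes (10^12 bytes) and one bit per tracked entry, as the description states; the class and method names are hypothetical and exist only for this illustration.

```java
// Hypothetical helper reproducing the arithmetic from the issue description.
final class WatchMemoryEstimate {
    // Bytes needed when each (client, znode) pair stores the given number of bits.
    static long bitSetBytes(long znodes, long clients, int bitsPerClientPerZnode) {
        long pairs = Math.multiplyExact(znodes, clients);
        long totalBits = Math.multiplyExact(pairs, (long) bitsPerClientPerZnode);
        return totalBits / 8;
    }

    public static void main(String[] args) {
        long znodes = 100_000_000L; // 100m znodes
        long clients = 100_000L;    // 100k clients, each watching all nodes

        // Per-node watches: up to 3 bits (CHILDREN, DATA, EXISTS) per client per znode.
        long perNodeWatches = bitSetBytes(znodes, clients, 3);
        // Blacklist worst case: 1 bit per client per znode, i.e. one third of the above.
        long blacklistWorstCase = bitSetBytes(znodes, clients, 1);

        System.out.println(perNodeWatches / 1e12 + " TB");     // 3.75 TB
        System.out.println(blacklistWorstCase / 1e12 + " TB"); // 1.25 TB
    }
}
```

With these parameters, 10^8 znodes times 10^5 clients times 3 bits gives 3 x 10^13 bits, or 3.75 x 10^12 bytes, which matches the 3.75TB figure; the blacklist worst case of one third of that is 1.25TB.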



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)