Posted to user@zookeeper.apache.org by dmly <dm...@gmail.com> on 2012/05/15 06:15:51 UTC

Does Zookeeper Lock Entire Znode Tree During a Write

Hi,
Does ZK lock the entire znode tree during a write? Or does ZK just lock the
topmost znode that a client is connected to?
For example:
When I connect to "/doug", create the "/doug/lock-001" node, and do an
update, is "/" locked or just "/doug"?

Thanks

--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/Does-Zookeeper-Lock-Entire-Znode-Tree-During-a-Write-tp7558524.html
Sent from the zookeeper-user mailing list archive at Nabble.com.

Re: Does Zookeeper Lock Entire Znode Tree During a Write

Posted by Narayanan Arunachalam <na...@gmail.com>.
If you are not using sequence numbers for the nodes you create, you can use the version attribute to prevent a second client from overwriting the first one's update. Basically, the set fails if the version is not the same as when you queried the node.

Create will fail too if the node is already present.
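
The version check and the failing create can be sketched with a toy in-memory model. This is Python written only for illustration, not the ZooKeeper client API: ToyStore, NodeExistsError and BadVersionError are hypothetical names, and the model assumes a new node starts at version 0 and each successful set bumps the version by one.

```python
# Toy in-memory model of versioned znodes. NOT the ZooKeeper client API;
# all class and exception names here are hypothetical.

class NodeExistsError(Exception):
    """Raised when create() targets a path that already exists."""


class BadVersionError(Exception):
    """Raised when set_data() is called with a stale version."""


class ToyStore:
    def __init__(self):
        # Maps path -> (data, version).
        self._nodes = {}

    def create(self, path, data):
        if path in self._nodes:
            raise NodeExistsError(path)
        self._nodes[path] = (data, 0)

    def get(self, path):
        return self._nodes[path]

    def set_data(self, path, data, expected_version):
        _, current = self._nodes[path]
        if current != expected_version:
            raise BadVersionError(
                f"{path}: expected {expected_version}, found {current}")
        self._nodes[path] = (data, current + 1)
        return current + 1


def demo():
    store = ToyStore()
    store.create("/doug", b"initial")
    # Both clients read the node and remember the version they saw.
    _, seen_by_a = store.get("/doug")
    _, seen_by_b = store.get("/doug")
    store.set_data("/doug", b"from A", seen_by_a)
    try:
        store.set_data("/doug", b"from B", seen_by_b)
        b_outcome = "overwrote A"
    except BadVersionError:
        b_outcome = "rejected"
    try:
        store.create("/doug", b"again")
        create_outcome = "created"
    except NodeExistsError:
        create_outcome = "rejected"
    return store.get("/doug"), b_outcome, create_outcome
```

In this model client B's set is rejected because the version it read is stale, so B has to re-read the node and decide whether to retry.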

When using sequence numbers, the client should check whether its node is the lowest in the list of nodes created. If it is not, it waits until it becomes the lowest.
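
A minimal sketch of that check, in Python for illustration only. The helper names are hypothetical and not part of any ZooKeeper client library. It assumes the child names end in "-" followed by a numeric sequence suffix. Returning the immediate predecessor as the node to wait on is an addition here, borrowed from the common lock recipe; the description above only says to wait.

```python
# Hypothetical helpers for the "am I the lowest sequence node?" check.

def sequence_of(name):
    """Return the numeric suffix of a sequential node name like 'lock-0000000007'."""
    return int(name.rsplit("-", 1)[1])


def lock_status(children, my_node):
    """Return (holds_lock, node_to_wait_for) for my_node.

    children: names of all nodes under the lock parent, in any order.
    my_node:  the name of the node this client created.
    """
    ordered = sorted(children, key=sequence_of)
    position = ordered.index(my_node)
    if position == 0:
        # Lowest sequence number: this client holds the lock.
        return True, None
    # Otherwise wait for the node just ahead of ours to go away.
    return False, ordered[position - 1]
```

A client would re-run the check each time the node it is waiting on disappears.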

ARN 

Re: Does Zookeeper Lock Entire Znode Tree During a Write

Posted by dmly <dm...@gmail.com>.
Yeah, that's what I thought too. When I said "locked" I meant that the ZK ensemble
will block access (read and write) for clients that are connected to the
parent node. Say you and I are both connected to "/example/doug". I issue a
write to create "/example/doug/lock". Then ZK will block you from accessing
"/example/doug" and anything below it. But if Henry's client is
connected to "/anotherexample" and Henry tries to read a node there, he will
not be blocked.
I'll buy Henry's answer.
Thanks

Re: Does Zookeeper Lock Entire Znode Tree During a Write

Posted by Henry Robinson <he...@cloudera.com>.
The short answer is no, ZK does not lock the whole tree. If you look at
ZKDatabase.java and DataTree.java you can see in processTxn etc. that
only the node being written to (the parent node in the case of a create
transaction) is 'locked' in the Java sense.

However the reason you're probably asking is that you're wondering about
concurrency of operations, and whether a write potentially blocks another
write from succeeding. This makes sense from a traditional database
perspective where transactions are potentially long-running, and so
fine-grained locking is needed to allow concurrent access to disjoint parts
of a single table, for example.

In ZooKeeper, all write operations are serialised - not just in the
logical, equivalent-to-a-sequential-history sense, but in the sense that
they are all executed one after the other. So in that sense a write 'locks'
the whole tree, because until it's completed in memory, any subsequent
write to memory won't take place.

That's not to say that ZK doesn't provide some concurrency though.
Transactions are multi-stage operations, and it's possible to pipeline
these stages so that e.g. disk and CPU can be fully used at the same time.
For example, another, earlier, stage of a write operation is logging the
request to disk, for fault-tolerance purposes. It is possible for a later
write to log itself while an earlier write is happening in memory. ZK takes
advantage of this pipelining (see the RequestProcessor classes) so you can
have several transactions 'in flight' at the same time, but they are all
issued and processed in strict sequential order.
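
The pipelining idea can be sketched as a toy two-stage pipeline. This is Python for illustration only and does not reflect the actual RequestProcessor classes: run_pipeline, log_stage and apply_stage are hypothetical names, the "log" is just a list, and the zxid here is a simple counter standing in for a transaction id.

```python
import queue
import threading

STOP = object()  # sentinel that shuts a stage down


def run_pipeline(writes):
    """Push (path, data) writes through a log stage and an apply stage.

    Returns (log_order, apply_order, tree).
    """
    to_log = queue.Queue()
    to_apply = queue.Queue()
    log_order, apply_order, tree = [], [], {}

    def log_stage():
        # Stage 1: stands in for writing the request to the transaction log.
        while True:
            item = to_log.get()
            if item is STOP:
                to_apply.put(STOP)
                return
            log_order.append(item[0])
            to_apply.put(item)

    def apply_stage():
        # Stage 2: one thread applies writes to memory, one at a time.
        while True:
            item = to_apply.get()
            if item is STOP:
                return
            zxid, path, data = item
            tree[path] = data
            apply_order.append(zxid)

    threads = [threading.Thread(target=log_stage),
               threading.Thread(target=apply_stage)]
    for t in threads:
        t.start()
    # Number the writes in arrival order and feed them into the first stage.
    for zxid, (path, data) in enumerate(writes, start=1):
        to_log.put((zxid, path, data))
    to_log.put(STOP)
    for t in threads:
        t.join()
    return log_order, apply_order, tree
```

Because each stage is a single thread reading from a FIFO queue, a later write can be in the log stage while an earlier one is being applied, yet both stages still see the writes in the same order.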

Henry


Re: Does Zookeeper Lock Entire Znode Tree During a Write

Posted by Ben Bangert <be...@groovie.org>.
ZK doesn't inherently have a notion of locks; locking is a
client concept implemented using ZK's atomic operations. So none of the
nodes are actually locked in ZK when you create a node named 'lock-001'.
Any ZK client with appropriate permissions on the node (defined by the
node's ACL) may alter or remove it.

-- 
Ben Bangert
(ben@ || http://) groovie.org