You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@zookeeper.apache.org by Dmitry Kuvayskiy <dm...@gmail.com> on 2012/12/10 18:45:42 UTC

Znodes are really wait-free objects?

Hello all.

The paper "ZooKeeper: Wait-free coordination for Internet-scale
systems" claims that znodes are wait-free objects (citing: ``Our
system, Zookeeper, hence implements an API that manipulates simple
_wait-free_ data objects organized hierarchically as in file
systems''). But I wonder if they are really "wait-free" (in the sense
that "every operation has a bound on the number of steps the algorithm
will take before the operation completes")?
As far as I know, ZooKeeper uses locks (well, "synchronized" keyword)
internally. I also cannot imagine a system that big and complex
working only with wait-free data structures. And if we use locks
internally, how can we state that the external interface is wait-free?
Also, I didn't find anything about "wait-freedom" on the
http://zookeeper.apache.org/ site. So can anyone explain a bit, what
this "wait-free" really means and how it is implemented in the code?

-- 
Yours sincerely,
Dmitry Kuvayskiy

Re: Znodes are really wait-free objects?

Posted by Dmitry Kuvayskiy <dm...@gmail.com>.
Thank you. Now it's clear.

On Mon, Dec 10, 2012 at 6:58 PM, Ted Dunning <te...@gmail.com> wrote:
> Wait-free means that from the user's point of view, all operations will
> proceed or fail immediately.  Synchronization internal to the server to
> ensure coherent memory access doesn't really apply here.
>
> The point is that ZK is not a lock manager and there is no corollary to a
> wait operation.  This is extremely important because locks encourage
> timeout designs which are notoriously unreliable (at least in expectation
> when implemented by normal human engineers).
>
> The high level operations that have to happen are, roughly,
>
> a) the node receiving your request gets the request and validates that it
> appears well formed.
>
> b) if the request is a read operation, it is satisfied by direct access to
> the data
>
> c) if the request is a write operation, it is forwarded to the master
>
> d) the master looks at memory and the queue to decide if this operation
> will succeed or fail
>
> e) if the operation will fail, the failure is returned
>
> f) if the operation will succeed, then it is converted to an idempotent
> operation and sent to the rest of the cluster for committing
>
> g) once a quorum commits the operation, success is returned
>
> h) if a quorum fails to commit the operation in time, failure is returned
>
> This is a moderately long list, but nothing here has any long lived locks
> or much chance for starvation.  That leads to the claim that all operations
> are lock-less from the user point of view.
>
>
> On Mon, Dec 10, 2012 at 9:45 AM, Dmitry Kuvayskiy <
> dmitry.kuvayskiy@gmail.com> wrote:
>
>> Hello all.
>>
>> The paper "ZooKeeper: Wait-free coordination for Internet-scale
>> systems" claims that znodes are wait-free objects (citing: ``Our
>> system, Zookeeper, hence implements an API that manipulates simple
>> _wait-free_ data objects organized hierarchically as in file
>> systems''). But I wonder if they are really "wait-free" (in the sense
>> that "every operation has a bound on the number of steps the algorithm
>> will take before the operation completes")?
>> As far as I know, ZooKeeper uses locks (well, "synchronized" keyword)
>> internally. I also cannot imagine a system that big and complex
>> working only with wait-free data structures. And if we use locks
>> internally, how can we state that the external interface is wait-free?
>> Also, I didn't find anything about "wait-freedom" on the
>> http://zookeeper.apache.org/ site. So can anyone explain a bit, what
>> this "wait-free" really means and how it is implemented in the code?
>>
>> --
>> Yours sincerely,
>> Dmitry Kuvayskiy
>>



-- 
Yours sincerely,
Dmitry Kuvayskiy

Re: Znodes are really wait-free objects?

Posted by Ted Dunning <te...@gmail.com>.
Wait-free means that from the user's point of view, all operations will
proceed or fail immediately.  Synchronization internal to the server to
ensure coherent memory access doesn't really apply here.

The point is that ZK is not a lock manager and there is no corollary to a
wait operation.  This is extremely important because locks encourage
timeout designs which are notoriously unreliable (at least in expectation
when implemented by normal human engineers).

The high level operations that have to happen are, roughly,

a) the node receiving your request gets the request and validates that it
appears well formed.

b) if the request is a read operation, it is satisfied by direct access to
the data

c) if the request is a write operation, it is forwarded to the master

d) the master looks at memory and the queue to decide if this operation
will succeed or fail

e) if the operation will fail, the failure is returned

f) if the operation will succeed, then it is converted to an idempotent
operation and sent to the rest of the cluster for committing

g) once a quorum commits the operation, success is returned

h) if a quorum fails to commit the operation in time, failure is returned

This is a moderately long list, but nothing here has any long lived locks
or much chance for starvation.  That leads to the claim that all operations
are lock-less from the user point of view.


On Mon, Dec 10, 2012 at 9:45 AM, Dmitry Kuvayskiy <
dmitry.kuvayskiy@gmail.com> wrote:

> Hello all.
>
> The paper "ZooKeeper: Wait-free coordination for Internet-scale
> systems" claims that znodes are wait-free objects (citing: ``Our
> system, Zookeeper, hence implements an API that manipulates simple
> _wait-free_ data objects organized hierarchically as in file
> systems''). But I wonder if they are really "wait-free" (in the sense
> that "every operation has a bound on the number of steps the algorithm
> will take before the operation completes")?
> As far as I know, ZooKeeper uses locks (well, "synchronized" keyword)
> internally. I also cannot imagine a system that big and complex
> working only with wait-free data structures. And if we use locks
> internally, how can we state that the external interface is wait-free?
> Also, I didn't find anything about "wait-freedom" on the
> http://zookeeper.apache.org/ site. So can anyone explain a bit, what
> this "wait-free" really means and how it is implemented in the code?
>
> --
> Yours sincerely,
> Dmitry Kuvayskiy
>