You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@zookeeper.apache.org by svk <sh...@gmail.com> on 2016/08/23 00:54:29 UTC

Question regarding consistency of Zookeeper ephemeral node cleanup

Hi fellow Zookeepers,

We are using Zookeeper C-client and server v3.5.1-alpha. We are observing
the following sequence of events:-

Client 1 connected to leader.
Client 1 creates ephemeral node /foo/bar
Client 2 connected to follower
Client 1 exits
----20s passes, which is configured as ephemeral node timeout--
Client 2 queries for children of /foo, gets an empty list
Client 2 tries to remove ZNode /foo, gets an error ZNOTEMPTY

It looks like cleanup of the ephemeral node /foo/bar was broadcast to
follower but not committed on the leader. However, when Client 2 queried
for children on the follower, this node was considered as deleted. However
when Client 2 issued a delete for /foo, this operation failed at the leader
due to /foo/bar not being fully cleaned up. Log messages are pasted below:-

*Client 1:*
Aug 18 23:27:56 INFO:   Token 1000005ef complete on server [128.0.0.4:61887],
sessionId=0x4000001ac5700ae, negotiated timeout=200
Aug 19 14:17:15 INFO:   Create
/system/nodes/re0/apps/crd-14104870620583660699/zkid returned 0

*Client 2:*

Aug 18 23:27:56 INFO:   Reconfiguring server at the client 128.0.0.4:61887,
128.0.0.5:61817,localhost:1111

Aug 19 14:18:35 DEBUG:  Remove
/system/nodes/re0/apps/crd-14104870620583660699 returned -111

*ZK Leader:*

2016-08-19 14:18:35,184 [myid:2147483652] - INFO
 [ProcessThread(sid:2147483652 cport:-1)::      PrepRequestProcessor@649] -
Processed session termination for sessionid: 0x4000001ac5700ae

2016-08-19 14:18:35,206 [myid:2147483652] - INFO
 [ProcessThread(sid:2147483652 cport:-1)::PrepRequestProcessor@843] - Got
user-level KeeperException when processing sessionid:0x4000001ac57
 0032 type:delete cxid:0x57b650fa zxid:0x100004ea9 txntype:-1 reqpath:n/a
Error Path:/system/nodes/re0/apps/crd-14104870620583660699/zkid
Error:KeeperErrorCode = Directory not empty for /sys
 tem/nodes/re0/apps/crd-14104870620583660699/zkid
Based on my reading of the Zookeeper book(3 ed. p 169), this should not
happen:-

"To guarantee that the order of execution is preserved,
CommitRequestProcessor stalls the processing of pending requests once it
receives a write request. This means that any read requests that were
received after a write request will be blocked until the write request
passes the CommitRequestProcessor."

Can someone explain this to me/correct my understanding?

Thanks and regards,
Shreyas

Re: Question regarding consistency of Zookeeper ephemeral node cleanup

Posted by Patrick Hunt <ph...@apache.org>.
On Mon, Aug 22, 2016 at 5:54 PM, svk <sh...@gmail.com> wrote:

> Hi fellow Zookeepers,
>
> We are using Zookeeper C-client and server v3.5.1-alpha. We are observing
> the following sequence of events:-
>
> Client 1 connected to leader.
> Client 1 creates ephemeral node /foo/bar
> Client 2 connected to follower
> Client 1 exits
> ----20s passes, which is configured as ephemeral node timeout--
>

Do you mean 20 seconds is your session timeout?


> Client 2 queries for children of /foo, gets an empty list
> Client 2 tries to remove ZNode /foo, gets an error ZNOTEMPTY
>
> It looks like cleanup of the ephemeral node /foo/bar was broadcast to
> follower but not committed on the leader. However, when Client 2 queried
> for children on the follower, this node was considered as deleted. However
> when Client 2 issued a delete for /foo, this operation failed at the leader
> due to /foo/bar not being fully cleaned up. Log messages are pasted below:-
>
> *Client 1:*
> Aug 18 23:27:56 INFO:   Token 1000005ef complete on server [
> 128.0.0.4:61887],
> sessionId=0x4000001ac5700ae, negotiated timeout=200
>


what does this mean? "negotiated timeout=200" - your session timeout is 200
millisecond?



> Aug 19 14:17:15 INFO:   Create
> /system/nodes/re0/apps/crd-14104870620583660699/zkid returned 0
>
> *Client 2:*
>
> Aug 18 23:27:56 INFO:   Reconfiguring server at the client 128.0.0.4:61887
> ,
> 128.0.0.5:61817,localhost:1111
>
>
I'm not sure I understand what this means "Reconfiguring server at the
client" - are you running a dynamic reconfig operation? afaict this is not
a ZK message.


> Aug 19 14:18:35 DEBUG:  Remove
> /system/nodes/re0/apps/crd-14104870620583660699 returned -111
>
> *ZK Leader:*
>
> 2016-08-19 14:18:35,184 [myid:2147483652] - INFO
>  [ProcessThread(sid:2147483652 cport:-1)::      PrepRequestProcessor@649]
> -
> Processed session termination for sessionid: 0x4000001ac5700ae
>
> 2016-08-19 14:18:35,206 [myid:2147483652] - INFO
>  [ProcessThread(sid:2147483652 cport:-1)::PrepRequestProcessor@843] - Got
> user-level KeeperException when processing sessionid:0x4000001ac57
>  0032 type:delete cxid:0x57b650fa zxid:0x100004ea9 txntype:-1 reqpath:n/a
> Error Path:/system/nodes/re0/apps/crd-14104870620583660699/zkid
> Error:KeeperErrorCode = Directory not empty for /sys
>  tem/nodes/re0/apps/crd-14104870620583660699/zkid
> Based on my reading of the Zookeeper book(3 ed. p 169), this should not
> happen:-
>
> "To guarantee that the order of execution is preserved,
> CommitRequestProcessor stalls the processing of pending requests once it
> receives a write request. This means that any read requests that were
> received after a write request will be blocked until the write request
> passes the CommitRequestProcessor."
>
>
That's correct from the view of the particular server you are connected to.
A client might be on a follower that is behind the quorum, in which case it
might see:

at time t there exists /foo/bar, at time t+1 bar is removed

1) client get list of /foo children -> slow follower returns a list from
time t (bar still exists)
2) client tries to delete the bar child - this is forwarded to the leader
which is at time t+1, which will fail

However in your case you should not see this.

It looks like (from the leader log above) that only 12ms are elapsing btw
the session termination and the KeeperException. The session term is
triggering the ephemeral deletion if I understand your setup correctly. If
you wait longer does the same error happen, or after some time the clients
are seeing a consistent view, whether attached to the leader or the
follower?

Patrick


> Can someone explain this to me/correct my understanding?
>
> Thanks and regards,
> Shreyas
>