You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Edward Ribeiro (JIRA)" <ji...@apache.org> on 2016/09/16 15:36:20 UTC
[jira] [Comment Edited] (ZOOKEEPER-2439) The order of asynchronous setACL is not correct.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496588#comment-15496588 ] 

Edward Ribeiro edited comment on ZOOKEEPER-2439 at 9/16/16 3:35 PM:
--------------------------------------------------------------------

Hi [~kazuakibanzai],

I took some time to dig this problem and I *guess* I've found the root cause. *Disclaimer: I can be totally wrong on my assumptions, so would love to hear from some ZK committers as to deny or confirm my idea (and took the liberty of marking them on this message, sorry guys).* /cc [~fpj], [~breed], [~phunt]

First and foremost, *huge thanks* for the patch to simulate the bug, it helped a lot. Well, let's start: in a nutshell, the server side ZK ordered processing pipeline of RequestProcessors. The one that is interesting to this particular case looks like this (I am a newbie, so I may get something wrong):

{code}
request ---------->  PrepRequestProcessor --------> SyncRequestsProcessor  ----------> FinalRequestProcessor ------> [request applied and committed]
{code}

*KEEP THIS IN MIND:* A request is committed to ZK database as well as having the logs and snapshots updated if it arrives at the {{FinalRequestProcessor}} without any exception.

a. One request can be currently being processed at {{PrepRequestProcessor}} while another one is either {{SyncRequestsProcessor}} or {{FinalRequestsProcessor}}. In fact, I guess that we can have three requests, one in each step of the pipeline under high load, for example.

b. At {{PrepRequestProcessor}} ZK checks if the znode path of the request is well formed as well as authorization and ACL permissions (it makes sense as SyncRequestProcessor, among other things, saves the log file and we wouldn't want to record an invalid request on command log, right?). *KEEP THIS IN MIND TOO:* ACL checks are done at the first stage of the pipeline.


*_So, the problem you described can be summarised as such:_*

1. a setACL request that removes access to '/' (let's call it REQ1) is sent asynchronously, followed by a create a znode under '/a' request (let's call it REQ2). Both are enqueued in order and arrive in this order on the server side.

2. REQ1 passes {{PrepRequestProcessor}}.

3. While REQ1 is at {{SyncRequestsProcessor}}, REQ 2 arrives at {{PrepRequestProcessor}}. {{PrepRequestProcessor}} checks ZK DB and see that REQ2 has the ACL rights to create the node, because REQ1 was not committed to ZK DB yet (it's in the middle of the pipeline).

4. When REQ1 is finally committed on ZK DB, REQ1 is just behind it in the pipeline, so it also gets committed to the ZK DB even tough REQ1 would prevent REQ2 from being applied! Remember the permission was check on {{PrepRequestProcessor}}, the first stage of the pipeline, when REQ1 was still ongoing.

*_Now let's see why some scenarios that explain your unit test:_*

Case 1: The sync ACL call is, well, synchronous so, it wait for a response before sending the next command (createNode). Therefore, REQ1 completes the pipeline before REQ2 even reaches the it and the createNode is rejected as expected.

Case 2. If you put a sleep between async setACL and createNode then you gave some time to REQ1 finish the pipeline because REQ2 will be inserted into the client outgoing queue after the sleep time. Again, this succeeds by rejecting the createNode command.

Case 3. If you insert async or sync setACL commands before the createNode you are "stuffing" additional commands between setACL and createNode. Again giving a chance of setACL finishes before the createdNode reaches the first stage of the pipeline. Then again, it works as expected (the createNode is rejected).

Finally, another conjecture I have is that this setACL/createNode behavior could happen if with synchronous setACL. If we had two clients, the first one could issue a sync setACL and the second one a createNode, and with the right timing they would be enqueued as explained above.

Does it make sense?






was (Author: eribeiro):
Hi [~kazuakibanzai],

I took some time to dig this problem and I *guess* I've found the root cause. *Disclaimer: I can be totally wrong on my assumptions, so would love to see some ZK committers about this as to deny or confirm my idea (and took the liberty of marking them on this message, sorry guys).* /cc [~fpj], [~breed], [~phunt]

First and foremost, *huge thanks* for the patch to simulate the bug, it helped a lot. Well, let's start: in a nutshell, the server side ZK ordered processing pipeline of RequestProcessors. The one that is interesting to this particular case looks like this (I am a newbie, so I may get something wrong):

{code}
request ---------->  PrepRequestProcessor --------> SyncRequestsProcessor  ----------> FinalRequestProcessor ------> [request applied and committed]
{code}

*KEEP THIS IN MIND:* A request is committed to ZK database as well as having the logs and snapshots updated if it arrives at the {{FinalRequestProcessor}} without any exception.

a. One request can be currently being processed at {{PrepRequestProcessor}} while another one is either {{SyncRequestsProcessor}} or {{FinalRequestsProcessor}}. In fact, I guess that we can have three requests, one in each step of the pipeline under high load, for example.

b. At {{PrepRequestProcessor}} ZK checks if the znode path of the request is well formed as well as authorization and ACL permissions (it makes sense as SyncRequestProcessor, among other things, saves the log file and we wouldn't want to record an invalid request on command log, right?). *KEEP THIS IN MIND TOO:* ACL checks are done at the first stage of the pipeline.


*_So, the problem you described can be summarised as such:_*

1. a setACL request that removes access to '/' (let's call it REQ1) is sent asynchronously, followed by a create a znode under '/a' request (let's call it REQ2). Both are enqueued in order and arrive in this order on the server side.

2. REQ1 passes {{PrepRequestProcessor}}.

3. While REQ1 is at {{SyncRequestsProcessor}}, REQ 2 arrives at {{PrepRequestProcessor}}. {{PrepRequestProcessor}} checks ZK DB and see that REQ2 has the ACL rights to create the node, because REQ1 was not committed to ZK DB yet (it's in the middle of the pipeline).

4. When REQ1 is finally committed on ZK DB, REQ1 is just behind it in the pipeline, so it also gets committed to the ZK DB even tough REQ1 would prevent REQ2 from being applied! Remember the permission was check on {{PrepRequestProcessor}}, the first stage of the pipeline, when REQ1 was still ongoing.

*_Now let's see why some scenarios that explain your unit test:_*

Case 1: The sync ACL call is, well, synchronous so, it wait for a response before sending the next command (createNode). Therefore, REQ1 completes the pipeline before REQ2 even reaches the it and the createNode is rejected as expected.

Case 2. If you put a sleep between async setACL and createNode then you gave some time to REQ1 finish the pipeline because REQ2 will be inserted into the client outgoing queue after the sleep time. Again, this succeeds by rejecting the createNode command.

Case 3. If you insert async or sync setACL commands before the createNode you are "stuffing" additional commands between setACL and createNode. Again giving a chance of setACL finishes before the createdNode reaches the first stage of the pipeline. Then again, it works as expected (the createNode is rejected).

Finally, another conjecture I have is that this setACL/createNode behavior could happen if with synchronous setACL. If we had two clients, the first one could issue a sync setACL and the second one a createNode, and with the right timing they would be enqueued as explained above.

Does it make sense?





> The order of asynchronous setACL is not correct.
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-2439
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2439
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.8, 3.5.1
>         Environment: Linux Ubuntu
> Mac OS X
>            Reporter: Kazuaki Banzai
>              Labels: acl
>         Attachments: ZOOKEEPER-2439.patch
>
>
> Within a given client connection, the execution of commands on the ZooKeeper server is always ordered, as both synchronous and asynchronous commands are dispatched through queuePacket (directly or indirectly).
> In other words, Zookeeper guarantees sequential consistency: updates from a client will be applied in the order that they were sent.
> However, the order of asynchronous setACL is not correct on Ubuntu.
> When asynchronous setACL is called BEFORE another API is called, asynchronous setACL is applied AFTER another API.
> For example, if a client calls
> (1) asynchronous setACL to remove all permissions of node "/" and
> (2) synchronous create to create node "/a",
> synchronous create should fail, but it succeeds on Ubuntu.
> (We can see all permissions of node "/" are removed when the client calls getACL to node "/" after (2), so (1) is applied AFTER (2). If we call getACL between (1) and (2), the synchronous case works correctly but the asynchronous case still produces the bug.)
> The attached unit test reproduces this scenario. It fails on Linux Ubuntu but succeeds on Mac OS X. If used on a heavily loaded server on Mac OS, the test sometimes fails as well but only rarely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)