Posted to dev@zookeeper.apache.org by Mohammad arshad <mo...@huawei.com> on 2016/01/21 14:24:48 UTC

How to handle zookeeper data inconsistency

Hi All,
I came across a scenario where ZooKeeper was left in an inconsistent state (though one that is valid as per ZooKeeper theory), and because of that a dependent application behaved wrongly.
The scenario is as follows:

1) I have a three-server ZooKeeper cluster; let's say the servers are A, B and C. B is the leader.
2) In one successful delete operation, a znode, znode1, was deleted from A and B but somehow not deleted from C. The reason it was not deleted from C could be that either the proposal or the commit failed.
3) Now, for an application connected to C, ZooKeeper.exists returns znode1, and so the application enters its node-exists flow, which is wrong.

Should I check exists from the leader only? But even the leader can have some node undeleted in the above scenario.
Are there any guidelines for handling this kind of valid data inconsistency?

Any suggestion/help is highly appreciated.

Best Regards
Mohammad Arshad
HUAWEI TECHNOLOGIES CO.LTD.
Huawei Technologies India Pvt. Ltd.
Near EPIP Industrial Area, Kundalahalli Village
Whitefield, Bangalore-560066
www.huawei.com
-----------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from HUAWEI, which
is intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!

RE: How to handle zookeeper data inconsistency

Posted by Mohammad arshad <mo...@huawei.com>.

Thanks Flavio Junqueira for your response.
> If C failed, then when it comes back up, it will sync up with the leader and learn everything that has been committed.
We have found some scenarios where, if a follower fails while syncing with the leader, the failed transactions are never synced.
I will update ZOOKEEPER-2355 soon.

Best Regards
Mohammad Arshad



Re: How to handle zookeeper data inconsistency

Posted by Flavio Junqueira <fp...@apache.org>.

I should have added that you can use sync() followed by exists() if you want to catch old deletes that are still propagating. This is supposed to flush the pending updates from the leader.
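
A minimal sketch of the sync()-then-exists() pattern mentioned above, using the ZooKeeper Java client. The class name SyncedExists, the method existsAfterSync and the timeout parameter are hypothetical names chosen for illustration; the sketch assumes an already connected ZooKeeper handle. Because sync() is only available as an asynchronous call, the sketch waits on a latch for its callback before issuing the read.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/** Hypothetical helper, not part of ZooKeeper: sync() first, then exists(). */
final class SyncedExists {

    /**
     * Returns true if the znode at path still exists once the server this
     * client is connected to has processed the sync request.
     */
    static boolean existsAfterSync(ZooKeeper zk, String path, long timeoutMs)
            throws KeeperException, InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicInteger resultCode =
                new AtomicInteger(KeeperException.Code.OK.intValue());

        // sync() has no blocking variant; the callback fires on completion.
        zk.sync(path, (rc, p, ctx) -> {
            resultCode.set(rc);
            done.countDown();
        }, null);

        // Assumption: treat a missing callback as an operation timeout.
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw KeeperException.create(
                    KeeperException.Code.OPERATIONTIMEOUT, path);
        }
        KeeperException.Code code = KeeperException.Code.get(resultCode.get());
        if (code != KeeperException.Code.OK) {
            throw KeeperException.create(code, path);
        }

        // Issued on the same session after the sync completed, so the read
        // should reflect updates committed before the sync was processed.
        Stat stat = zk.exists(path, false);
        return stat != null;
    }
}
```

An application connected to C could call existsAfterSync(zk, "/znode1", 5000) in place of a bare exists() when a stale answer would be harmful. The extra round trip to the leader makes this slower than a plain read, so it is probably best reserved for the checks that really need it.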

-Flavio


Re: How to handle zookeeper data inconsistency

Posted by Flavio Junqueira <fp...@apache.org>.

> On 21 Jan 2016, at 15:03, Mohammad arshad <mo...@huawei.com> wrote:
> 
> Thanks, Flavio Junqueira, for your response.
> 
> Assume C received the commit request but failed before committing it. When will C be synced? What event at the leader or follower will sync it up?

If C failed, then when it comes back up, it will sync up with the leader and learn everything that has been committed. This is part of the recovery process. Even though there are multiple steps, you can assume that once C is back online, it will have reflected in its state all transactions that have previously been committed, except for the ones that are still in flight.

> 
> Here is another scenario we faced.
> A node got deleted successfully on leader node B, but due to a network issue on the leader, the delete could not sync to followers A and C. At this moment, the leader also goes down as faulty.
> 

If B successfully committed the delete operation, even if the commit message didn't go out, then it means that at least one other node got the proposal. In your 3-server ensemble, a quorum has size 2, so any proposal needs to be persisted and acknowledged by a quorum before it is committed.
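
The quorum arithmetic above can be sketched as follows. This is only an illustration that assumes plain majority quorums; the class and method names (QuorumMath, quorumSize and so on) are made up for the example and are not ZooKeeper APIs.

```java
/** Illustrative helper (hypothetical names): majority-quorum arithmetic. */
final class QuorumMath {

    /** Smallest number of servers that forms a strict majority. */
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble must have a server");
        }
        return ensembleSize / 2 + 1;
    }

    /** True if any two majority quorums must share at least one server. */
    static boolean quorumsIntersect(int ensembleSize) {
        return 2 * quorumSize(ensembleSize) > ensembleSize;
    }

    /** Number of servers that can fail while a quorum can still form. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5}) {
            System.out.printf("ensemble=%d quorum=%d toleratedFailures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

For the 3-server case this gives a quorum of 2. Since any two quorums overlap, a delete acknowledged by B plus one follower is known to at least one member of whatever quorum elects the next leader, which is the reasoning behind the answer above.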

> Now one of A and C becomes leader, but it has inconsistent data (the delete is not executed there).
> 

It will be executed there because the new leader, A or C, needs to commit the initial state of the new epoch and it will do it based on its log state, which will include the delete operation.

> As far as I know, this behavior is fine as per the current ZK design. But are there any suggestions for solving the above data inconsistency issue? I thought of committing the delete not only on the leader but on at least N/2 nodes in the same client call, and only then marking the delete as successful.

No, not fine. If a quorum has acknowledged a txn, then we guarantee that the corresponding operation is durable. The thing that is ok as per ZK design is that the delete operation is acknowledged, and a particular server, say C, only receives it a little later. In this case, it could happen that a client reads the ZK state but misses the delete. However, if the client keeps reading, then it should eventually see the delete.
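
To make the allowed behaviour concrete, here is a toy model of a follower that has not yet applied a delete that a quorum already committed. It is not ZooKeeper code: LaggingFollowerModel and its methods are invented for the illustration, and the model tracks only deletes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (not ZooKeeper code) of a follower lagging behind the leader. */
final class LaggingFollowerModel {
    // Ordered log of deletes that a quorum has already committed.
    private final List<String> committedDeletes = new ArrayList<>();
    // Znodes visible on the follower, and how much of the log it has applied.
    private final Set<String> followerView = new HashSet<>();
    private int followerApplied = 0;

    LaggingFollowerModel(Set<String> initialZnodes) {
        followerView.addAll(initialZnodes);
    }

    /** A quorum commits the delete; this follower has not applied it yet. */
    void commitDeleteOnQuorum(String path) {
        committedDeletes.add(path);
    }

    /** A local read served by the follower; it may be stale. */
    boolean existsOnFollower(String path) {
        return followerView.contains(path);
    }

    /** The follower applies every committed delete it has not yet seen. */
    void catchUp() {
        while (followerApplied < committedDeletes.size()) {
            followerView.remove(committedDeletes.get(followerApplied++));
        }
    }

    public static void main(String[] args) {
        LaggingFollowerModel c =
                new LaggingFollowerModel(Collections.singleton("/znode1"));
        c.commitDeleteOnQuorum("/znode1");
        System.out.println("before catch-up: " + c.existsOnFollower("/znode1"));
        c.catchUp();
        System.out.println("after catch-up: " + c.existsOnFollower("/znode1"));
    }
}
```

In the model the stale read is temporary: once the follower catches up, the znode is gone. A follower that never catches up, while still applying later writes, would be outside what the design permits.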

Another thing that is fine is that if no quorum acknowledges a txn, then the txn isn't durable. 

-Flavio
 

RE: How to handle zookeeper data inconsistency

Posted by Mohammad arshad <mo...@huawei.com>.

Thanks, Flavio Junqueira, for your response.

Assume C received the commit request but failed before committing it. When will C be synced? What event at the leader or the follower will sync it up?

Here is another scenario we faced.
The node was deleted successfully on leader node B. But due to a network issue on the leader, the delete could not be synced to followers A and C. At this moment, the leader also went down as faulty.

Now one of A and C becomes leader, but it has inconsistent data (the delete was not executed there).

As far as I know, this behavior is fine as per the current ZK design. But are there any suggestions for solving the above data inconsistency issue? I thought of committing the delete not only on the leader but on at least N/2 nodes within the same client call, and only then marking the delete as successful.

Any suggestions?

Best Regards
Mohammad Arshad

-----Original Message-----
From: Flavio Junqueira [mailto:fpj@apache.org] 
Sent: 21 January 2016 19:11
To: user@zookeeper.apache.org
Cc: dev
Subject: Re: How to handle zookeeper data inconsistency

Hi Mohammad,

A delete operation only needs to reach a quorum to complete, and A and B form a quorum in your 3-server ensemble. If the delete operation never gets propagated to C and other write operations that have been ordered later complete on C, then you have an issue. If C simply stops receiving updates, then you have a problem with your C server, and it could be a problem with ZK or just the environment.

If there have been write operations ordered after the delete and server C has seen those but not the delete, then I'd recommend that you have a look at the txn logs with the log formatter.

> Should I check exists from the leader only? But even the leader can have some
> node undeleted in the above scenario.

There is no such requirement, but you need to be aware that server C could definitely make an update visible later than other servers. ZooKeeper doesn't guarantee that updates are visible to all clients as soon as they are acknowledged.

I'd also search for jiras, especially if you're deleting an ephemeral. 

-Flavio

> On 21 Jan 2016, at 13:24, Mohammad arshad <mo...@huawei.com> wrote:
> 
> Hi All,
> I came across a scenario where ZooKeeper was left in an inconsistent
> state (but that is valid as per the ZooKeeper theory), and because of
> that the dependent applications behaved wrongly. The scenario is as follows:
> 
> 1) I have a three-server ZooKeeper cluster; let's say the servers are A, B
> and C. B is the leader.
> 2) In one successful delete operation, a znode znode1 was deleted from A and B but somehow not deleted from C. The reason it was not deleted from C could be that either the proposal or the commit failed.
> 3) Now, for an application connected to C, ZooKeeper.exists
> returns znode1, and so the application enters the node-exists
> flow, which is wrong.
> 
> Should I check exists from the leader only? But even the leader can have some
> node undeleted in the above scenario. Any guidelines for handling the above-mentioned valid data inconsistency?
> 
> Any suggestion/help is highly appreciated.
> 
> Best Regards
> Mohammad Arshad
> HUAWEI TECHNOLOGIES CO.LTD.
> Huawei Technologies India Pvt. Ltd.
> Near EPIP Industrial Area, Kundalahalli Village
> Whitefield, Bangalore-560066
> www.huawei.com<http://www.huawei.com/>
>

Re: How to handle zookeeper data inconsistency

Posted by Flavio Junqueira <fp...@apache.org>.

Hi Mohammad,

A delete operation only needs to reach a quorum to complete, and A and B form a quorum in your 3-server ensemble. If the delete operation never gets propagated to C and other write operations that have been ordered later complete on C, then you have an issue. If C simply stops receiving updates, then you have a problem with your C server, and it could be a problem with ZK or just the environment.
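
To make the quorum rule concrete, here is a minimal sketch of the majority arithmetic for an ensemble like the one described. It is a toy model, not ZooKeeper code: `quorum_size` and `is_committed` are hypothetical helper names, and the model assumes a plain majority quorum.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble (assumes majority quorums)."""
    return ensemble_size // 2 + 1


def is_committed(acks: set, ensemble: set) -> bool:
    """In this toy model a write completes once a majority has acknowledged it."""
    return len(acks & ensemble) >= quorum_size(len(ensemble))


ensemble = {"A", "B", "C"}

# A and B together are a majority of three, so the delete completes without C.
delete_acked_by_a_and_b = is_committed({"A", "B"}, ensemble)

# A single server (for example only the leader B) is not a majority.
delete_acked_by_b_only = is_committed({"B"}, ensemble)

print(quorum_size(len(ensemble)), delete_acked_by_a_and_b, delete_acked_by_b_only)
```

Under these assumptions, a delete that only one server has seen would not count as completed, while a delete seen by A and B completes even though C has not applied it yet.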

If there have been write operations ordered after the delete and server C has seen those but not the delete, then I'd recommend that you have a look at the txn logs with the log formatter.

> Should I check exists from the leader only? But even the leader can have some node undeleted in the above scenario.

There is no such requirement, but you need to be aware that server C could definitely make an update visible later than other servers. ZooKeeper doesn't guarantee that updates are visible to all clients as soon as they are acknowledged.
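
The visibility lag can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: `Replica`, `commit_delete` and `apply_pending` are hypothetical names and not part of any ZooKeeper API. It models a server that answers reads from its local state and applies committed transactions some time after the quorum has acknowledged them.

```python
class Replica:
    """Toy server: a set of znode paths plus a queue of unapplied commits."""

    def __init__(self, name, znodes):
        self.name = name
        self.znodes = set(znodes)
        self.pending = []  # committed (op, path) pairs not yet applied here

    def exists(self, path):
        # Reads are served from local state only, so they can be stale.
        return path in self.znodes

    def apply_pending(self):
        # Apply queued transactions in the order they were committed.
        for op, path in self.pending:
            if op == "delete":
                self.znodes.discard(path)
            elif op == "create":
                self.znodes.add(path)
        self.pending.clear()


def commit_delete(path, up_to_date, lagging):
    """Apply a delete on the replicas that acked it; queue it on the rest."""
    for replica in up_to_date:
        replica.znodes.discard(path)
    for replica in lagging:
        replica.pending.append(("delete", path))


a, b, c = (Replica(name, {"/znode1"}) for name in "ABC")
commit_delete("/znode1", up_to_date=[a, b], lagging=[c])

stale_read = c.exists("/znode1")   # C has not applied the delete yet
c.apply_pending()                  # C catches up with the committed history
fresh_read = c.exists("/znode1")

print(b.exists("/znode1"), stale_read, fresh_read)
```

In the real client API, a client that needs to see the latest committed state before reading can issue a `sync` on its connection before the `exists` call; `apply_pending` above is only a stand-in for the server catching up.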

I'd also search for jiras, especially if you're deleting an ephemeral. 

-Flavio

> On 21 Jan 2016, at 13:24, Mohammad arshad <mo...@huawei.com> wrote:
> 
> Hi All,
> I came across a scenario where ZooKeeper was left in an inconsistent state (but that is valid as per the ZooKeeper theory), and because of that the dependent applications behaved wrongly.
> The scenario is as follows:
> 
> 1) I have a three-server ZooKeeper cluster; let's say the servers are A, B and C. B is the leader.
> 2) In one successful delete operation, a znode znode1 was deleted from A and B but somehow not deleted from C. The reason it was not deleted from C could be that either the proposal or the commit failed.
> 3) Now, for an application connected to C, ZooKeeper.exists returns znode1, and so the application enters the node-exists flow, which is wrong.
> 
> Should I check exists from the leader only? But even the leader can have some node undeleted in the above scenario.
> Any guidelines for handling the above-mentioned valid data inconsistency?
> 
> Any suggestion/help is highly appreciated.
> 
> Best Regards
> Mohammad Arshad
> HUAWEI TECHNOLOGIES CO.LTD.
> Huawei Technologies India Pvt. Ltd.
> Near EPIP Industrial Area, Kundalahalli Village
> Whitefield, Bangalore-560066
> www.huawei.com<http://www.huawei.com/>
>
