You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@helix.apache.org by Zhen Zhang <ne...@gmail.com> on 2013/08/07 21:54:29 UTC
Review Request 13386: [HELIX-195] fix race condition between FINALIZE
callbacks and Zk Callbacks (repost rb=13345)
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13386/
-----------------------------------------------------------
Review request for helix, Kanak Biscuitwala, Kishore Gopalakrishna, and Shi Lu.
Repository: helix-git
Description
-------
repost rb=13345
FINALIZE callbacks are sent async via CallbackHandler#reset(), while Zk callbacks are queued in ZkEventThread. It's possible that we are handling a FINALIZE callback before all Zk callbacks are cleaned up. This creates race conditions, for example, in zk session expiry, when a GenericController gets a FINALIZE callback, it unsubscribes all listeners via ZkClient#unsubscribe(), but Zk callbacks leftover in ZkEventThread comes later, and re-subscribe all listeners, causing zk watcher leaking.
This is observed by setting up two controllers and expire the leader (by simulating a long gc). The second controller takes the leadership and add all listeners, but when the former leader recovers from gc, it gets leftover Zk callbacks and re-subscribe the live-instance listener hence react to all live-instance changes.
We fix this by introducing an expected-notification-type field in CallbackHandler. The field acts like the state for a CallbackHandler. We then defines the possible state transitions for a CallbackHandler. For example, at first CallbackHandler expects an INIT type notification, then it could be either CALLBACK type or FINALIZE type. If a CALLBACK type is received, then we expect either CALLBACK or FINALIZE type. If a FINALIZE type is received, then we expect INIT type. By defining this state machine in CallbackHandler, we can ignore any notifications of CALLBACK type after we already finalize a CallbackHandler. CallbackHandler can only receive notifications of CALLBACK type after it gets re-initialized again.
Diffs
-----
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java a898160
helix-core/src/main/java/org/apache/helix/manager/zk/AbstractManager.java a68e1de
helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java 533650f
helix-core/src/main/java/org/apache/helix/manager/zk/ControllerManager.java fa5100e
helix-core/src/main/java/org/apache/helix/manager/zk/DistributedControllerManager.java 9017132
helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java ab1da23
helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManager.java d175d46
helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManagerHelper.java ec6bd79
helix-core/src/test/java/org/apache/helix/TestHelper.java 58dc275
helix-core/src/test/java/org/apache/helix/integration/manager/ClusterDistributedController.java e69de29
helix-core/src/test/java/org/apache/helix/integration/manager/TestConsecutiveZkSessionExpiry.java 0f8a27a
helix-core/src/test/java/org/apache/helix/integration/manager/TestControllerManager.java e69de29
helix-core/src/test/java/org/apache/helix/integration/manager/TestDistributedControllerManager.java 319633b
helix-core/src/test/java/org/apache/helix/integration/manager/TestParticipantManager.java a0012f2
helix-core/src/test/java/org/apache/helix/integration/manager/TestZkCallbackHandlerLeak.java cf4b9ad
Diff: https://reviews.apache.org/r/13386/diff/
Testing
-------
all tests pass locally
Thanks,
Zhen Zhang
Re: Review Request 13386: [HELIX-195] fix race condition between FINALIZE
callbacks and Zk Callbacks (repost rb=13345)
Posted by Kishore Gopalakrishna <ki...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13386/#review24846
-----------------------------------------------------------
Ship it!
Ship It!
- Kishore Gopalakrishna
On Aug. 7, 2013, 7:54 p.m., Zhen Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13386/
> -----------------------------------------------------------
>
> (Updated Aug. 7, 2013, 7:54 p.m.)
>
>
> Review request for helix, Kanak Biscuitwala, Kishore Gopalakrishna, and Shi Lu.
>
>
> Repository: helix-git
>
>
> Description
> -------
>
> repost rb=13345
>
> FINALIZE callbacks are sent async via CallbackHandler#reset(), while Zk callbacks are queued in ZkEventThread. It's possible that we are handling a FINALIZE callback before all Zk callbacks are cleaned up. This creates race conditions, for example, in zk session expiry, when a GenericController gets a FINALIZE callback, it unsubscribes all listeners via ZkClient#unsubscribe(), but Zk callbacks leftover in ZkEventThread comes later, and re-subscribe all listeners, causing zk watcher leaking.
>
> This is observed by setting up two controllers and expire the leader (by simulating a long gc). The second controller takes the leadership and add all listeners, but when the former leader recovers from gc, it gets leftover Zk callbacks and re-subscribe the live-instance listener hence react to all live-instance changes.
>
> We fix this by introducing an expected-notification-type field in CallbackHandler. The field acts like the state for a CallbackHandler. We then defines the possible state transitions for a CallbackHandler. For example, at first CallbackHandler expects an INIT type notification, then it could be either CALLBACK type or FINALIZE type. If a CALLBACK type is received, then we expect either CALLBACK or FINALIZE type. If a FINALIZE type is received, then we expect INIT type. By defining this state machine in CallbackHandler, we can ignore any notifications of CALLBACK type after we already finalize a CallbackHandler. CallbackHandler can only receive notifications of CALLBACK type after it gets re-initialized again.
>
>
> Diffs
> -----
>
> helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java a898160
> helix-core/src/main/java/org/apache/helix/manager/zk/AbstractManager.java a68e1de
> helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java 533650f
> helix-core/src/main/java/org/apache/helix/manager/zk/ControllerManager.java fa5100e
> helix-core/src/main/java/org/apache/helix/manager/zk/DistributedControllerManager.java 9017132
> helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java ab1da23
> helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManager.java d175d46
> helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManagerHelper.java ec6bd79
> helix-core/src/test/java/org/apache/helix/TestHelper.java 58dc275
> helix-core/src/test/java/org/apache/helix/integration/manager/ClusterDistributedController.java e69de29
> helix-core/src/test/java/org/apache/helix/integration/manager/TestConsecutiveZkSessionExpiry.java 0f8a27a
> helix-core/src/test/java/org/apache/helix/integration/manager/TestControllerManager.java e69de29
> helix-core/src/test/java/org/apache/helix/integration/manager/TestDistributedControllerManager.java 319633b
> helix-core/src/test/java/org/apache/helix/integration/manager/TestParticipantManager.java a0012f2
> helix-core/src/test/java/org/apache/helix/integration/manager/TestZkCallbackHandlerLeak.java cf4b9ad
>
> Diff: https://reviews.apache.org/r/13386/diff/
>
>
> Testing
> -------
>
> all tests pass locally
>
>
> Thanks,
>
> Zhen Zhang
>
>
Re: Review Request 13386: [HELIX-195] fix race condition between FINALIZE
callbacks and Zk Callbacks (repost rb=13345)
Posted by Kishore Gopalakrishna <ki...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13386/#review24821
-----------------------------------------------------------
helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java
<https://reviews.apache.org/r/13386/#comment48973>
typo reqlinquish
helix-core/src/test/java/org/apache/helix/TestHelper.java
<https://reviews.apache.org/r/13386/#comment48998>
replace sysout with logger.
- Kishore Gopalakrishna
On Aug. 7, 2013, 7:54 p.m., Zhen Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13386/
> -----------------------------------------------------------
>
> (Updated Aug. 7, 2013, 7:54 p.m.)
>
>
> Review request for helix, Kanak Biscuitwala, Kishore Gopalakrishna, and Shi Lu.
>
>
> Repository: helix-git
>
>
> Description
> -------
>
> repost rb=13345
>
> FINALIZE callbacks are sent async via CallbackHandler#reset(), while Zk callbacks are queued in ZkEventThread. It's possible that we are handling a FINALIZE callback before all Zk callbacks are cleaned up. This creates race conditions, for example, in zk session expiry, when a GenericController gets a FINALIZE callback, it unsubscribes all listeners via ZkClient#unsubscribe(), but Zk callbacks leftover in ZkEventThread comes later, and re-subscribe all listeners, causing zk watcher leaking.
>
> This is observed by setting up two controllers and expire the leader (by simulating a long gc). The second controller takes the leadership and add all listeners, but when the former leader recovers from gc, it gets leftover Zk callbacks and re-subscribe the live-instance listener hence react to all live-instance changes.
>
> We fix this by introducing an expected-notification-type field in CallbackHandler. The field acts like the state for a CallbackHandler. We then defines the possible state transitions for a CallbackHandler. For example, at first CallbackHandler expects an INIT type notification, then it could be either CALLBACK type or FINALIZE type. If a CALLBACK type is received, then we expect either CALLBACK or FINALIZE type. If a FINALIZE type is received, then we expect INIT type. By defining this state machine in CallbackHandler, we can ignore any notifications of CALLBACK type after we already finalize a CallbackHandler. CallbackHandler can only receive notifications of CALLBACK type after it gets re-initialized again.
>
>
> Diffs
> -----
>
> helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java a898160
> helix-core/src/main/java/org/apache/helix/manager/zk/AbstractManager.java a68e1de
> helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java 533650f
> helix-core/src/main/java/org/apache/helix/manager/zk/ControllerManager.java fa5100e
> helix-core/src/main/java/org/apache/helix/manager/zk/DistributedControllerManager.java 9017132
> helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java ab1da23
> helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManager.java d175d46
> helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManagerHelper.java ec6bd79
> helix-core/src/test/java/org/apache/helix/TestHelper.java 58dc275
> helix-core/src/test/java/org/apache/helix/integration/manager/ClusterDistributedController.java e69de29
> helix-core/src/test/java/org/apache/helix/integration/manager/TestConsecutiveZkSessionExpiry.java 0f8a27a
> helix-core/src/test/java/org/apache/helix/integration/manager/TestControllerManager.java e69de29
> helix-core/src/test/java/org/apache/helix/integration/manager/TestDistributedControllerManager.java 319633b
> helix-core/src/test/java/org/apache/helix/integration/manager/TestParticipantManager.java a0012f2
> helix-core/src/test/java/org/apache/helix/integration/manager/TestZkCallbackHandlerLeak.java cf4b9ad
>
> Diff: https://reviews.apache.org/r/13386/diff/
>
>
> Testing
> -------
>
> all tests pass locally
>
>
> Thanks,
>
> Zhen Zhang
>
>