You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@helix.apache.org by jiajunwang <gi...@git.apache.org> on 2019/01/12 01:15:35 UTC

[GitHub] helix pull request #297: ZkClient related improvments

GitHub user jiajunwang opened a pull request:

    https://github.com/apache/helix/pull/297

    ZkClient related improvments

    We identify 2 potential issues that may cause a retrying ZK operation failed unexpectedly. These commits fix the problem.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiajunwang/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/297.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #297
    
----
commit 4c4891197103bd3fe4660fdeca40b537de649b97
Author: Jiajun Wang <jj...@...>
Date:   2019-01-08T23:27:21Z

    Add ZkConnection.reconnect to avoid NPE when reset ZkConnection.
    
    In the old version, reconnect was done by closing and then connecting. In between, the zookeeper ref is null. This may cause NPE which terminate ZkClient operation retry earlier than expected.
    This change copy the existing ZkConnection and add reconnect. The new method ensures reconnecting without leaving the field empty.

commit 879abf7b59c7b029e5a0dec21691b69a50722d27
Author: Jiajun Wang <jj...@...>
Date:   2019-01-11T23:53:17Z

    Improve the callback handler behavior regarding batch mode event handling when handler is reset.
    
    For new session handling, the callback handler should not interrupt the current executing process. This could cause a pending request failed unexpectedly. Note that after the change, closing a callback will still interrupt thread to avoid thread leak.

----


---

[GitHub] helix issue #297: ZkClient related improvments

Posted by dasahcc <gi...@git.apache.org>.
Github user dasahcc commented on the issue:

    https://github.com/apache/helix/pull/297
  
    LGTM! Make sure you pass all the tests.


---

[GitHub] helix pull request #297: ZkClient related improvments

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/helix/pull/297


---

[GitHub] helix pull request #297: ZkClient related improvments

Posted by lei-xia <gi...@git.apache.org>.
Github user lei-xia commented on a diff in the pull request:

    https://github.com/apache/helix/pull/297#discussion_r250807912
  
    --- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/zookeeper/ZkConnection.java ---
    @@ -84,6 +83,25 @@ public void close() throws InterruptedException {
         }
       }
     
    +  protected void reconnect(Watcher watcher) throws InterruptedException {
    +    _zookeeperLock.lock();
    +    try {
    +      if (_zk == null) {
    +        throw new IllegalStateException("zk client has not been connected or already been closed");
    +      }
    +      ZooKeeper prevZk = _zk;
    +      try {
    +        LOG.debug("Creating new ZookKeeper instance to reconnect to " + _servers + ".");
    --- End diff --
    
    LOG.info?


---

[GitHub] helix pull request #297: ZkClient related improvments

Posted by jiajunwang <gi...@git.apache.org>.
Github user jiajunwang commented on a diff in the pull request:

    https://github.com/apache/helix/pull/297#discussion_r251134740
  
    --- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/zookeeper/ZkConnection.java ---
    @@ -84,6 +83,25 @@ public void close() throws InterruptedException {
         }
       }
     
    +  protected void reconnect(Watcher watcher) throws InterruptedException {
    +    _zookeeperLock.lock();
    +    try {
    +      if (_zk == null) {
    +        throw new IllegalStateException("zk client has not been connected or already been closed");
    +      }
    +      ZooKeeper prevZk = _zk;
    +      try {
    +        LOG.debug("Creating new ZookKeeper instance to reconnect to " + _servers + ".");
    --- End diff --
    
    We already have state change logs. In addition, if the reconnect fails, we will print warning log in the ZkClient. Here, to be consistent with other ZkConnection, I think it's better that we keep it a debug log.


---

[GitHub] helix issue #297: ZkClient related improvments

Posted by jiajunwang <gi...@git.apache.org>.
Github user jiajunwang commented on the issue:

    https://github.com/apache/helix/pull/297
  
    I split the ZkConnection commit to separate logic change and file movement. Just to ensure the history of change is clear.


---

[GitHub] helix pull request #297: ZkClient related improvments

Posted by lei-xia <gi...@git.apache.org>.
Github user lei-xia commented on a diff in the pull request:

    https://github.com/apache/helix/pull/297#discussion_r250795524
  
    --- Diff: helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java ---
    @@ -674,12 +684,24 @@ public void handleChildChange(String parentPath, List<String> currentChilds) {
       /**
        * Invoke the listener for the last time so that the listener could clean up resources
        */
    +  @Deprecated
       public void reset() {
    +    reset(true);
    +  }
    +
    +  void reset(boolean isShutdown) {
         logger.info("Resetting CallbackHandler: " + this.toString());
    --- End diff --
    
    include isShutdown here in the log?


---