You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/12/03 20:27:53 UTC

[GitHub] [helix] pkuwm commented on a change in pull request #1526: Leverage zk paginated getChildren API for ZkClient to fetch a large number of children

pkuwm commented on a change in pull request #1526:
URL: https://github.com/apache/helix/pull/1526#discussion_r535490751



##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -132,10 +139,41 @@ public boolean exists(String path, boolean watch) throws KeeperException, Interr
     return _zk.exists(path, watch) != null;
   }
 
+  /**
+   * Returns a list of children of the given path.
+   * <p>
+   * If the watch is non-null and the call is successful (no exception is thrown),
+   * a watch will be left on the node with the given path.
+   * <p>
+   * The implementation uses java reflection to check whether the native zk supports
+   * paginated getChildren API:
+   * <p>- if yes, and {@link #GETCHILDREN_PAGINATION_DISABLED} is false, call the paginated API;
+   * <p>- otherwise, fall back to the non-paginated API.
+   *
+   * @param path the path of the node
+   * @param watch a boolean flag to indicate whether the watch should be added to the node
+   * @return a list of children of the given path
+   * @throws KeeperException if the server signals an error with a non-zero error code
+   * @throws InterruptedException if the server transaction is interrupted
+   */
   @Override
   public List<String> getChildren(final String path, final boolean watch)
       throws KeeperException, InterruptedException {
-    return _zk.getChildren(path, watch);
+    lookupGetChildrenMethod();

Review comment:
       I also thought about it and tried. 
   1. I was thinking not every client will be using `getChildren` so just no need to use reflection to parse the getChildren method every time in constructor.
   2. Since we will reset the method cache in reconnect, if we don't call the lookup method here, we still need to call lookup in reconnect and also connect (if close then connect). It means there will be 3 places (constructor, connect, reconnect) calling this method. I'd prefer to just call this method in one place to be simple and not easy to go wrong. If the method is cached, the lookup will just return.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +230,53 @@ public String getServers() {
   public void addAuthInfo(String scheme, byte[] auth) {
     _zk.addAuthInfo(scheme, auth);
   }
+
+  private void lookupGetChildrenMethod() {
+    if (_getChildrenMethod != null) {
+      // Method is already cached.
+      return;
+    }
+    try {
+      if (GETCHILDREN_PAGINATION_DISABLED) {
+        lookupNonPaginatedGetChildren();
+      } else {
+        // Lookup the paginated getChildren API
+        _getChildrenMethod =
+            ZooKeeper.class.getMethod("getAllChildrenPaginated", String.class, boolean.class);
+      }
+    } catch (NoSuchMethodException e1) {
+      // Pagination API is not supported, fall back to non-paginated API
+      lookupNonPaginatedGetChildren();
+    }

Review comment:
       I thought about it. The reason for this is I'd like to include the logging in the end. If we return before logging, there is no logging.
   
   Since you also prefer this, I have wrapped the code block into a private method so logging is after the method.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +230,53 @@ public String getServers() {
   public void addAuthInfo(String scheme, byte[] auth) {
     _zk.addAuthInfo(scheme, auth);
   }
+
+  private void lookupGetChildrenMethod() {
+    if (_getChildrenMethod != null) {
+      // Method is already cached.
+      return;
+    }
+    try {
+      if (GETCHILDREN_PAGINATION_DISABLED) {
+        lookupNonPaginatedGetChildren();
+      } else {
+        // Lookup the paginated getChildren API
+        _getChildrenMethod =
+            ZooKeeper.class.getMethod("getAllChildrenPaginated", String.class, boolean.class);
+      }
+    } catch (NoSuchMethodException e1) {
+      // Pagination API is not supported, fall back to non-paginated API
+      lookupNonPaginatedGetChildren();
+    }
+    LOG.info("Pagination config {}={}, method to be invoked: {}",
+        ZkSystemPropertyKeys.ZK_GETCHILDREN_PAGINATION_DISABLED, GETCHILDREN_PAGINATION_DISABLED,
+        _getChildrenMethod.getName());
+  }
+
+  private void lookupNonPaginatedGetChildren() {
+    try {
+      _getChildrenMethod = ZooKeeper.class.getMethod("getChildren", String.class, boolean.class);
+    } catch (NoSuchMethodException e2) {
+      // We should not expect this exception here.
+      LOG.error("getChildren is not supported in this zookeeper version!");

Review comment:
       If this is not supported, later then it is called, exception will also be thrown.
   We should not expect this exception as all zk versions support this method.

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +230,53 @@ public String getServers() {
   public void addAuthInfo(String scheme, byte[] auth) {
     _zk.addAuthInfo(scheme, auth);
   }
+
+  private void lookupGetChildrenMethod() {
+    if (_getChildrenMethod != null) {
+      // Method is already cached.
+      return;
+    }
+    try {
+      if (GETCHILDREN_PAGINATION_DISABLED) {
+        lookupNonPaginatedGetChildren();
+      } else {
+        // Lookup the paginated getChildren API
+        _getChildrenMethod =
+            ZooKeeper.class.getMethod("getAllChildrenPaginated", String.class, boolean.class);
+      }
+    } catch (NoSuchMethodException e1) {
+      // Pagination API is not supported, fall back to non-paginated API
+      lookupNonPaginatedGetChildren();
+    }
+    LOG.info("Pagination config {}={}, method to be invoked: {}",
+        ZkSystemPropertyKeys.ZK_GETCHILDREN_PAGINATION_DISABLED, GETCHILDREN_PAGINATION_DISABLED,
+        _getChildrenMethod.getName());
+  }
+
+  private void lookupNonPaginatedGetChildren() {
+    try {
+      _getChildrenMethod = ZooKeeper.class.getMethod("getChildren", String.class, boolean.class);
+    } catch (NoSuchMethodException e2) {
+      // We should not expect this exception here.
+      LOG.error("getChildren is not supported in this zookeeper version!");
+    }
+  }
+
+  private void handleInvokedMethodException(Throwable cause)
+      throws KeeperException, InterruptedException {
+    if (cause instanceof KeeperException.UnimplementedException) {
+      LOG.warn("Paginated getChildren is unimplemented in ZK server! "
+          + "Falling back to non-paginated getChildren");
+      lookupNonPaginatedGetChildren();

Review comment:
       Right. That's the purpose of the next line `throw KeeperException.create(KeeperException.Code.CONNECTIONLOSS);`
   
   When the `KeeperException.UnimplementedException` is thrown from the server, it is a connection loss, so just throw it to ZkClient and its retry logic will retry the `getChildren` operation appropriately. 

##########
File path: zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkConnection.java
##########
@@ -192,4 +230,53 @@ public String getServers() {
   public void addAuthInfo(String scheme, byte[] auth) {
     _zk.addAuthInfo(scheme, auth);
   }
+
+  private void lookupGetChildrenMethod() {
+    if (_getChildrenMethod != null) {
+      // Method is already cached.
+      return;
+    }
+    try {
+      if (GETCHILDREN_PAGINATION_DISABLED) {
+        lookupNonPaginatedGetChildren();
+      } else {
+        // Lookup the paginated getChildren API
+        _getChildrenMethod =
+            ZooKeeper.class.getMethod("getAllChildrenPaginated", String.class, boolean.class);
+      }
+    } catch (NoSuchMethodException e1) {
+      // Pagination API is not supported, fall back to non-paginated API
+      lookupNonPaginatedGetChildren();
+    }
+    LOG.info("Pagination config {}={}, method to be invoked: {}",
+        ZkSystemPropertyKeys.ZK_GETCHILDREN_PAGINATION_DISABLED, GETCHILDREN_PAGINATION_DISABLED,
+        _getChildrenMethod.getName());
+  }
+
+  private void lookupNonPaginatedGetChildren() {
+    try {
+      _getChildrenMethod = ZooKeeper.class.getMethod("getChildren", String.class, boolean.class);
+    } catch (NoSuchMethodException e2) {
+      // We should not expect this exception here.
+      LOG.error("getChildren is not supported in this zookeeper version!");
+    }
+  }
+
+  private void handleInvokedMethodException(Throwable cause)

Review comment:
       > Just to confirm, as we discussed, if the ZK server is upgraded in between, the client shall be able to leverage the new API immediately, right? In this case, I assume we expect the ZkConnection object to be recreated. And I believe this is the case when we handle new session creation.
   
   If zk server is upgraded after the client has been created, we can not guarantee the client will always know server has enabled pagination. The example is what you said "what if the server upgrade is fast and the session is not expired". If session expires during the upgrade, client can reset the cache and re-detect the pagination. It cannot solving the problem of completely auto detecting.
   
   > But what if the server upgrade is fast and the session is not expired? In this case, since you update the method here (and fallback to the older one), then the client will never use pagination call unless we restart the client.
   
   ZkClient cannot automatically and dynamically handle this case. The agreement is, to enable pagination, ZK server should be supposed to upgrade to have pagination as the first step. If client has pagination first but server upgrades later, client should be expected to bounce.
   To detect the server has enabled the pagination without a session change, every time the client's getChildren call must try pagination first. This is inefficient and not what we want. We have to trade off. And obviously we'd expect clients to restart if this case happens.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org