Posted to dev@pulsar.apache.org by gi...@git.apache.org on 2017/07/17 07:45:36 UTC

[GitHub] rdhabalia commented on issue #569: Revert back to default ZookeeperClientFactoryImpl

URL: https://github.com/apache/incubator-pulsar/pull/569#issuecomment-315687079
 
 
   After enabling debug logging, I found that the build exits because `ZooKeeperSessionWatcher` does not get a heartbeat within the zk session timeout.
   ```
   [pulsar-zk-session-watcher-274-1:ZooKeeperSessionWatcher@164] - zoo keeper disconnected, waiting to reconnect, time remaining 0
   [pulsar-zk-session-watcher-75235-1:ZooKeeperSessionWatcher@158] - timeout expired for reconnecting, invoking shutdown service
   ```
   
   After digging into it, the issue seems to be not the BK-ZkClient library but the time spent processing the zk-response inside the AspectJ advice. [ClientCnxnAspect](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/zookeeper/aspectj/ClientCnxnAspect.java#L72) intercepts the zk-response call, and if the advice takes more than a few msec, the zk-client somewhere loses the event (not sure what exactly happens in the zk-client) and stops serving any subsequent zk-response, which ultimately causes the zk-timeout.
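
A minimal sketch of why slow inline work on an event-processing thread can starve later responses. This is only an illustration under assumptions: a single-threaded executor stands in for the zk-client event thread, `adviceMs` stands in for time spent inside the advice, and `deadlineMs` stands in for the session timeout. `InlineAdviceDelay` and `handledBeforeDeadline` are hypothetical names, and the code does not reproduce what the zk-client actually does internally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation, not ZooKeeper or Pulsar code.
final class InlineAdviceDelay {

    // Queues `responses` tasks on one thread, each sleeping `adviceMs` inline,
    // and returns how many completed by the time `deadlineMs` has elapsed.
    static int handledBeforeDeadline(int responses, long adviceMs, long deadlineMs)
            throws InterruptedException {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < responses; i++) {
            eventThread.execute(() -> {
                try {
                    Thread.sleep(adviceMs); // inline work done on the event thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                handled.incrementAndGet();
            });
        }
        Thread.sleep(deadlineMs);  // stand-in for the session timeout
        int seen = handled.get();
        eventThread.shutdownNow(); // drop whatever is still queued
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fast advice: " + handledBeforeDeadline(20, 0, 300));
        System.out.println("slow advice: " + handledBeforeDeadline(20, 100, 300));
    }
}
```

With no inline delay all queued responses are handled well before the deadline; with a 100 msec delay per response only the first few are, and the rest are still waiting when the deadline passes.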
   
   It can be easily verified as follows.
   **Build does not fail if:** the [event-notification at timedProcessEvent](https://github.com/apache/incubator-pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/zookeeper/aspectj/ClientCnxnAspect.java#L81) is commented out
   ```java
   if (request != null) {
       long timeElapsed = (MathUtils.now() - startTimeMs);
       // notifyListeners(checkType(request), timeElapsed);
   }
   ```
   
   **Build fails immediately if:** `notifyListeners(checkType(request), timeElapsed);` is replaced with `Thread.sleep(100)`
   ```java
   if (request != null) {
       long timeElapsed = (MathUtils.now() - startTimeMs);
       Thread.sleep(100); // if this takes more than a few msec, the zk-client lib misbehaves
   }
   ```
   
   I am testing the [fix](https://github.com/rdhabalia/pulsar/commit/af6734d2da66a0605f9cb0a96f116345502de74b), and will create a PR after testing it multiple times.
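
For illustration only, one common way to keep slow listener work off an intercepted thread is to hand the notification to a separate executor so the caller returns immediately. The sketch below is an assumption about the general technique, not the content of the linked commit. `OffloadedNotifier`, `notifyAsync`, and `shutdownAndWait` are hypothetical names and are not part of Pulsar or ZooKeeper.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical helper: listeners run on their own thread, never on the caller's.
final class OffloadedNotifier {
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();
    private final List<LongConsumer> listeners = new CopyOnWriteArrayList<>();

    void addListener(LongConsumer listener) {
        listeners.add(listener);
    }

    // Meant to be called from the intercepted thread: it only enqueues a task,
    // so a slow listener cannot delay the caller.
    void notifyAsync(long timeElapsedMs) {
        notifier.execute(() -> listeners.forEach(l -> l.accept(timeElapsedMs)));
    }

    // Stops accepting new notifications and waits for queued ones to finish.
    boolean shutdownAndWait(long timeoutMs) throws InterruptedException {
        notifier.shutdown();
        return notifier.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

The trade-off in this sketch is that notifications are delivered later and on a different thread, so listeners must be thread-safe, and an unbounded queue can grow if listeners stay slower than the event rate.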
   
 