Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/06/28 22:59:40 UTC

[GitHub] [pulsar] rdhabalia commented on issue #4635: Bookie down causes deadlock in broker

rdhabalia commented on issue #4635: Bookie down causes deadlock in broker
URL: https://github.com/apache/pulsar/issues/4635#issuecomment-506900304
 
 
   @massakam 
   The thread dump shows that most of the bk-client-ordered and pulsar-ordered threads are waiting at
   `at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(.java:95)        at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping.getRack(ZkBookieRackAffinityMapping.java:154)`
   I think the threads should have been unblocked at that point by the fix in #3633, which was addressed in 2.3.1. The thread dump also doesn't show clear evidence that the broker went down because one of the bookies shut down.
   ```
   java.lang.Thread.State: TIMED_WAITING
           at sun.misc.Unsafe.park(Native Method)
           at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
           at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
           at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
           at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
           at org.apache.pulsar.zookeeper.ZooKeeperDataCache.get(.java:95)
           at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping.getRack(ZkBookieRackAffinityMapping.java:154)
           at org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping.resolve(ZkBookieRackAffinityMapping.java:146)
           at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl$DNSResolverDecorator.resolve(RackawareEnsemblePlacementPolicyImpl.java:174)
   ```
   
   Also, I am wondering why the threads were blocked at ZooKeeperDataCache::95, because we already have a timeout on that blocking call.
   https://github.com/apache/pulsar/blob/v2.3.1/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZooKeeperDataCache.java#L95
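
For reference, here is a minimal sketch of what a timed blocking call on a `CompletableFuture` looks like. It is not the Pulsar code: the class `TimedGetSketch`, the helper `getWithTimeout`, and the 200 ms timeout are hypothetical names and values chosen for illustration. While the caller waits, it parks in `CompletableFuture.timedGet` and reports `TIMED_WAITING`, matching the frames in the dump above, and it should return once the timeout elapses.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedGetSketch {

    // Hypothetical helper (not from Pulsar): wait at most timeoutMs for the
    // future and return an empty Optional if the timeout elapses first.
    static <T> Optional<T> getWithTimeout(CompletableFuture<T> future, long timeoutMs)
            throws InterruptedException, ExecutionException {
        try {
            // While waiting here, the calling thread is parked in
            // CompletableFuture.timedGet and shows up as TIMED_WAITING.
            return Optional.ofNullable(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // The wait is bounded: the caller regains control after the timeout.
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that nobody completes, standing in for a lookup that never answers.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();

        long start = System.nanoTime();
        Optional<String> result = getWithTimeout(neverCompleted, 200);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("result present: " + result.isPresent()
                + ", waited about " + elapsedMs + " ms");
    }
}
```

A single thread dump is a snapshot, so it cannot tell one long wait apart from many short timed waits issued back to back; comparing several dumps taken a few seconds apart would help distinguish the two.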
   
   
