Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/10/29 09:32:59 UTC

[GitHub] [pulsar] massakam opened a new pull request #8406: [broker] Fix deadlock that occurred during topic ownership check

massakam opened a new pull request #8406:
URL: https://github.com/apache/pulsar/pull/8406


   ### Motivation
   
   The other day, some of our broker servers hit deadlocks while splitting namespace bundles. A thread dump of the affected broker showed that some threads were blocked in `NamespaceService#getBundle()`.
   
   ```
   "ForkJoinPool.commonPool-worker-120" #547 daemon prio=5 os_prio=0 tid=0x00007efab4020800 nid=0x1318b waiting on condition [0x00007efa229e7000]
      java.lang.Thread.State: WAITING (parking)
           at sun.misc.Unsafe.park(Native Method)
           - parking to wait for  <0x00007f385c0dc720> (a java.util.concurrent.CompletableFuture$Signaller)
           at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
           at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
           at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)
           at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
           at com.github.benmanes.caffeine.cache.LocalAsyncLoadingCache$LoadingCacheView.get(LocalAsyncLoadingCache.java:400)
           at org.apache.pulsar.common.naming.NamespaceBundleFactory.getBundles(NamespaceBundleFactory.java:155)
           at org.apache.pulsar.broker.namespace.NamespaceService.getBundle(NamespaceService.java:177)
           at org.apache.pulsar.broker.namespace.NamespaceService.isTopicOwned(NamespaceService.java:849)
           at org.apache.pulsar.broker.namespace.NamespaceService.isServiceUnitOwned(NamespaceService.java:813)
           at org.apache.pulsar.broker.service.BrokerService.checkTopicNsOwnership(BrokerService.java:1013)
           at org.apache.pulsar.broker.service.BrokerService.loadOrCreatePersistentTopic(BrokerService.java:625)
           at org.apache.pulsar.broker.service.BrokerService.lambda$getTopic$6(BrokerService.java:500)
           at org.apache.pulsar.broker.service.BrokerService$$Lambda$476/389775283.apply(Unknown Source)
           at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.put(ConcurrentOpenHashMap.java:274)
           at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.computeIfAbsent(ConcurrentOpenHashMap.java:129)
           at org.apache.pulsar.broker.service.BrokerService.getTopic(BrokerService.java:499)
           at org.apache.pulsar.broker.service.BrokerService.getOrCreateTopic(BrokerService.java:483)
           at org.apache.pulsar.broker.service.ServerCnx.lambda$null$13(ServerCnx.java:681)
           at org.apache.pulsar.broker.service.ServerCnx$$Lambda$835/1815803313.apply(Unknown Source)
           at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
           at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
           at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
           at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575)
           at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:943)
           at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457)
           at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
           at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
           at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
           at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:163)
   ```
   
   This appears to be the deadlock that https://github.com/apache/pulsar/pull/4190 was supposed to fix. However, it seems that https://github.com/apache/pulsar/pull/4190 has since been reverted by https://github.com/apache/pulsar/pull/5919.
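   The hazard visible in the thread dump is a thread blocking on a `CompletableFuture` (inside the Caffeine cache's synchronous `get()`) while it is still running inside `ConcurrentOpenHashMap.computeIfAbsent()`. The Java sketch below is a generic illustration, not Pulsar code: it assumes a lock is held while the mapping function runs and that whatever completes the future needs that same lock. `LockThenBlockSketch`, `sectionLock` and `bundleFuture` are hypothetical names. A timed `get()` is used so the demo terminates instead of hanging.

   ```java
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ExecutionException;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.TimeoutException;
   import java.util.concurrent.locks.ReentrantLock;

   // Generic, hypothetical illustration (not Pulsar code): a thread waits on a
   // future while holding a lock that the future's completer also needs.
   class LockThenBlockSketch {

       // Returns true if the waiting thread timed out because the future could not complete.
       static boolean blocksWhileHoldingLock(long timeoutMillis) {
           // Assumed stand-in for a lock held while a computeIfAbsent mapping function runs.
           ReentrantLock sectionLock = new ReentrantLock();
           CompletableFuture<String> bundleFuture = new CompletableFuture<>();

           // Completer: must acquire the same lock before it can publish its result.
           Thread loader = new Thread(() -> {
               sectionLock.lock();
               try {
                   bundleFuture.complete("bundle metadata");
               } finally {
                   sectionLock.unlock();
               }
           });

           boolean timedOut;
           sectionLock.lock();
           try {
               loader.start(); // the loader will park on sectionLock
               bundleFuture.get(timeoutMillis, TimeUnit.MILLISECONDS); // blocking wait under the lock
               timedOut = false;
           } catch (TimeoutException e) {
               timedOut = true; // an untimed get() would wait here forever
           } catch (InterruptedException | ExecutionException e) {
               throw new IllegalStateException(e);
           } finally {
               sectionLock.unlock();
           }
           // Once the lock is released, the loader can finish and complete the future.
           bundleFuture.join();
           return timedOut;
       }

       public static void main(String[] args) {
           System.out.println(blocksWhileHoldingLock(200)); // prints true
       }
   }
   ```

   In this sketch, replacing the timed `get()` with an untimed one leaves both threads stuck indefinitely, which resembles the `WAITING (parking)` state in the dump.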
   
   ### Modifications
   
   The blocking method `getBundle()` should not be used in `NamespaceService#isTopicOwned()`. However, simply reverting https://github.com/apache/pulsar/pull/5919 would reintroduce the issue where clients cannot reconnect to topics in a bundle that has been split.
   
   So, instead of blocking, `isTopicOwned()` returns false once and fetches the bundle metadata asynchronously so that it gets cached. The next time the client reconnects, the bundle metadata is already cached, so the method can return the correct result.
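   The approach can be sketched roughly as follows. This is a simplified, hypothetical model, not the actual patch: `NonBlockingOwnershipSketch`, the topic-to-bundle map standing in for "bundle metadata", the `ownedBundles` set and the injected `loader` are all assumptions for illustration. The key point is that the ownership check only inspects the cached future and never waits on it, assuming the loader itself returns a future immediately.

   ```java
   import java.util.Collections;
   import java.util.Map;
   import java.util.Set;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.function.Function;

   // Hypothetical model of a non-blocking ownership check (not Pulsar's actual code).
   class NonBlockingOwnershipSketch {
       // namespace -> future of "bundle metadata" (modeled here as topic -> bundle name)
       private final Map<String, CompletableFuture<Map<String, String>>> bundleCache =
               new ConcurrentHashMap<>();
       private final Function<String, CompletableFuture<Map<String, String>>> loader;
       private final Set<String> ownedBundles;

       NonBlockingOwnershipSketch(Function<String, CompletableFuture<Map<String, String>>> loader,
                                  Set<String> ownedBundles) {
           this.loader = loader;
           this.ownedBundles = ownedBundles;
       }

       boolean isTopicOwned(String namespace, String topic) {
           // Start the async load on a cache miss (or reuse the pending one); never block on it.
           CompletableFuture<Map<String, String>> metadata =
                   bundleCache.computeIfAbsent(namespace, loader);
           if (!metadata.isDone()) {
               return false; // metadata not cached yet: report "not owned" this time
           }
           if (metadata.isCompletedExceptionally()) {
               bundleCache.remove(namespace, metadata); // let a later call retry the load
               return false;
           }
           String bundle = metadata.join().get(topic);
           return bundle != null && ownedBundles.contains(bundle);
       }

       public static void main(String[] args) {
           CompletableFuture<Map<String, String>> pending = new CompletableFuture<>();
           NonBlockingOwnershipSketch check = new NonBlockingOwnershipSketch(
                   ns -> pending, Collections.singleton("bundle-A"));
           System.out.println(check.isTopicOwned("tenant/ns", "persistent://tenant/ns/t1")); // false
           pending.complete(Collections.singletonMap("persistent://tenant/ns/t1", "bundle-A"));
           System.out.println(check.isTopicOwned("tenant/ns", "persistent://tenant/ns/t1")); // true
       }
   }
   ```

   Returning false while the load is still in flight costs one failed attempt; on the client's next reconnect the metadata is normally cached. Failed loads are evicted so that a later check can retry them.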


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wolfstudy commented on pull request #8406: [broker] Fix deadlock that occurred during topic ownership check

wolfstudy commented on pull request #8406:
URL: https://github.com/apache/pulsar/pull/8406#issuecomment-719437699


   > @wolfstudy Please also onboard this fix in release 2.6.2
   
   Sure, will process it.



[GitHub] [pulsar] codelipenghui commented on pull request #8406: [broker] Fix deadlock that occurred during topic ownership check

codelipenghui commented on pull request #8406:
URL: https://github.com/apache/pulsar/pull/8406#issuecomment-719435653


   @wolfstudy Please also onboard this fix in release 2.6.2



[GitHub] [pulsar] merlimat merged pull request #8406: [broker] Fix deadlock that occurred during topic ownership check

merlimat merged pull request #8406:
URL: https://github.com/apache/pulsar/pull/8406