Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/16 08:03:22 UTC

[GitHub] [pulsar] Technoboy- opened a new pull request, #17689: [branch-2.10][cherry-pick] Fix the broker close hanged issue.

Technoboy- opened a new pull request, #17689:
URL: https://github.com/apache/pulsar/pull/17689

   Cherry-pick #15755
   Master issue #15643, #15753
   
   ### Motivation
   
   
   Blocked at BrokerService#unloadNamespaceBundlesGracefully:
   ```
   2022-05-20T03:37:05.4960249Z "main" #1 prio=5 os_prio=0 cpu=32274.29ms elapsed=2566.54s tid=0x00007fd108024380 nid=0x1af8f waiting on condition  [0x00007fd10fcd0000]
   2022-05-20T03:37:05.4960659Z    java.lang.Thread.State: WAITING (parking)
   2022-05-20T03:37:05.4961114Z 	at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
   2022-05-20T03:37:05.4961875Z 	- parking to wait for  <0x00000000cdf00010> (a java.util.concurrent.CompletableFuture$Signaller)
   2022-05-20T03:37:05.4962343Z 	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
   2022-05-20T03:37:05.4963171Z 	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.3/CompletableFuture.java:1864)
   2022-05-20T03:37:05.4963683Z 	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.3/ForkJoinPool.java:3463)
   2022-05-20T03:37:05.4964169Z 	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.3/ForkJoinPool.java:3434)
   2022-05-20T03:37:05.4964660Z 	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.3/CompletableFuture.java:1898)
   2022-05-20T03:37:05.4965158Z 	at java.util.concurrent.CompletableFuture.get(java.base@17.0.3/CompletableFuture.java:2072)
   2022-05-20T03:37:05.4965715Z 	at org.apache.pulsar.broker.service.BrokerService.lambda$unloadNamespaceBundlesGracefully$21(BrokerService.java:919)
   2022-05-20T03:37:05.4966467Z 	at org.apache.pulsar.broker.service.BrokerService$$Lambda$1164/0x0000000801527c70.accept(Unknown Source)
   2022-05-20T03:37:05.4966882Z 	at java.lang.Iterable.forEach(java.base@17.0.3/Iterable.java:75)
   2022-05-20T03:37:05.4967408Z 	at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:911)
   2022-05-20T03:37:05.4968078Z 	at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:887)
   2022-05-20T03:37:05.4968664Z 	at org.apache.pulsar.broker.service.BrokerService.closeAsync(BrokerService.java:732)
   2022-05-20T03:37:05.4969579Z 	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:450)
   2022-05-20T03:37:05.4970123Z 	at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:372)
   2022-05-20T03:37:05.4970720Z 	at 
   ```
   
   Blocked at CoordinationServiceImpl#close:
   ```
   2022-05-20T01:17:56.3359346Z "main" #1 prio=5 os_prio=0 cpu=11209.07ms elapsed=3506.06s tid=0x00007f9484024380 nid=0xaba waiting on condition  [0x00007f9489edd000]
   2022-05-20T01:17:56.3361587Z    java.lang.Thread.State: WAITING (parking)
   2022-05-20T01:17:56.3363789Z 	at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
   2022-05-20T01:17:56.3366545Z 	- parking to wait for  <0x00000000cd180010> (a java.util.concurrent.CompletableFuture$Signaller)
   2022-05-20T01:17:56.3368917Z 	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
   2022-05-20T01:17:56.3371298Z 	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.3/CompletableFuture.java:1864)
   2022-05-20T01:17:56.3373823Z 	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.3/ForkJoinPool.java:3463)
   2022-05-20T01:17:56.3376212Z 	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.3/ForkJoinPool.java:3434)
   2022-05-20T01:17:56.3378608Z 	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.3/CompletableFuture.java:1898)
   2022-05-20T01:17:56.3380999Z 	at java.util.concurrent.CompletableFuture.join(java.base@17.0.3/CompletableFuture.java:2117)
   2022-05-20T01:17:56.3383947Z 	at org.apache.pulsar.metadata.coordination.impl.CoordinationServiceImpl.close(CoordinationServiceImpl.java:72)
   2022-05-20T01:17:56.3386574Z 	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:526)
   2022-05-20T01:17:56.3388569Z 	at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:372)
   ```
   
   For BrokerService#unloadNamespaceBundlesGracefully, the call chain is:
   ```
   brokerService.closeAsync() -> OwnedBundle.handleUnloadRequest -> pulsar.getNamespaceService().getOwnershipCache().removeOwnership(bundle) -> OwnershipCache.removeOwnership ->
   ResourceLock.release 
   ```
   
   For CoordinationServiceImpl#close, the call chain is:
   ```
   CoordinationServiceImpl.close -> LockManager.asyncClose -> ResourceLock.release
   ```
   Both call chains end at ResourceLock#release, so both hangs are related to it.
   
   Because the CI uses MockZooKeeper, I found that if a RuntimeException is thrown inside it, the response never completes. So I added a catch block to ensure that every request gets a reply, but I'm not sure whether the return code is right.
   https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/testmocks/src/main/java/org/apache/zookeeper/MockZooKeeper.java#L332-L402
   
   https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/testmocks/src/main/java/org/apache/zookeeper/MockZooKeeper.java#L916-L976
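
To illustrate the failure mode described above, here is a minimal sketch, not the actual MockZooKeeper code. `MockReplySketch`, `deleteUnsafe`, `deleteSafe`, `awaitReply`, and the result codes are all hypothetical names invented for this example. It shows how an exception thrown inside an asynchronous mock operation can leave the caller's callback uninvoked, and how a catch block guarantees a reply.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a mock client; this is NOT the real MockZooKeeper API.
class MockReplySketch {
    // Hypothetical result codes chosen for this sketch only.
    static final int OK = 0;
    static final int SYSTEM_ERROR = -1;

    interface Callback {
        void done(int rc);
    }

    // Unsafe variant: if work throws, the callback is never invoked.
    // submit() captures the exception in a Future that nobody inspects here,
    // so the failure is silent and the caller waits forever.
    static void deleteUnsafe(ExecutorService executor, Runnable work, Callback cb) {
        executor.submit(() -> {
            work.run();
            cb.done(OK);
        });
    }

    // Safe variant: every request gets exactly one reply, even on failure.
    static void deleteSafe(ExecutorService executor, Runnable work, Callback cb) {
        executor.submit(() -> {
            int rc;
            try {
                work.run();
                rc = OK;
            } catch (Throwable t) {
                rc = SYSTEM_ERROR; // assumed error code for this sketch
            }
            cb.done(rc);
        });
    }

    // Returns the reply code, or null if no reply arrived within the timeout.
    static Integer awaitReply(CompletableFuture<Integer> reply, long millis) throws Exception {
        try {
            return reply.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Runnable failing = () -> {
            throw new IllegalStateException("simulated failure");
        };
        try {
            CompletableFuture<Integer> unsafe = new CompletableFuture<>();
            deleteUnsafe(executor, failing, rc -> unsafe.complete(rc));
            System.out.println("unsafe reply: " + awaitReply(unsafe, 200)); // prints "unsafe reply: null"

            CompletableFuture<Integer> safe = new CompletableFuture<>();
            deleteSafe(executor, failing, rc -> safe.complete(rc));
            System.out.println("safe reply: " + awaitReply(safe, 200)); // prints "safe reply: -1"
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller that blocks with an untimed `get()` or `join()` on the unsafe variant's future would park indefinitely, which matches the shape of the thread dumps above.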
   
   
   In addition, the current close process has an ordering issue. LoadManager is closed before BrokerService, but closing BrokerService needs to invoke LoadManager. This works because LoadManager is stateless, but the order is confusing.
   
   https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/pulsar-broker/src/main/java/org/apache/pulsar/broker/PulsarService.java#L443-L452
   
   https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L891-L902
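
The ordering concern can be sketched as follows. This is an illustration only: `CloseOrderSketch`, `LoadManagerStub`, `BrokerServiceStub`, and `onBrokerUnload` are hypothetical names, not Pulsar classes or methods. Unlike the real (stateless) LoadManager, the stub deliberately fails when used after close, to make the dependency visible. The idea is simply to close a component before the components it still needs during its own close.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical components; the names echo the description above, not Pulsar's real classes.
class CloseOrderSketch {
    static class LoadManagerStub {
        private boolean closed;

        // Hypothetical hook that the broker stub calls while it shuts down.
        void onBrokerUnload() {
            if (closed) {
                throw new IllegalStateException("LoadManager already closed");
            }
        }

        void close() {
            closed = true;
        }
    }

    static class BrokerServiceStub {
        private final LoadManagerStub loadManager;

        BrokerServiceStub(LoadManagerStub loadManager) {
            this.loadManager = loadManager;
        }

        // Closing the broker still needs the load manager.
        void close() {
            loadManager.onBrokerUnload();
        }
    }

    // Close the dependent first, then the component it depends on.
    static List<String> closeInDependencyOrder(BrokerServiceStub broker, LoadManagerStub loadManager) {
        List<String> order = new ArrayList<>();
        broker.close();
        order.add("BrokerService");
        loadManager.close();
        order.add("LoadManager");
        return order;
    }

    public static void main(String[] args) {
        LoadManagerStub loadManager = new LoadManagerStub();
        BrokerServiceStub broker = new BrokerServiceStub(loadManager);
        System.out.println(closeInDependencyOrder(broker, loadManager)); // prints "[BrokerService, LoadManager]"
    }
}
```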
    
   ### Documentation
   
   - [x] `no-need-doc` 
   (Please explain why)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on pull request #17689: [branch-2.10][cherry-pick] Fix the broker close hanged issue.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #17689:
URL: https://github.com/apache/pulsar/pull/17689#issuecomment-1249059576

   @Technoboy- Please provide a correct documentation label for your PR.
   Instructions see [Pulsar Documentation Label Guide](https://docs.google.com/document/d/1Qw7LHQdXWBW9t2-r-A7QdFDBwmZh6ytB4guwMoXHqc0).




[GitHub] [pulsar] Technoboy- merged pull request #17689: [branch-2.10][cherry-pick] Fix the broker close hanged issue.

Posted by GitBox <gi...@apache.org>.
Technoboy- merged PR #17689:
URL: https://github.com/apache/pulsar/pull/17689

