Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/16 08:03:22 UTC
[GitHub] [pulsar] Technoboy- opened a new pull request, #17689: [branch-2.10][cherry-pick] Fix the broker close hanged issue.
Technoboy- opened a new pull request, #17689:
URL: https://github.com/apache/pulsar/pull/17689
Cherry-pick #15755
Master issue #15643, #15753
### Motivation
Blocked at BrokerService#unloadNamespaceBundlesGracefully:
```
2022-05-20T03:37:05.4960249Z "main" #1 prio=5 os_prio=0 cpu=32274.29ms elapsed=2566.54s tid=0x00007fd108024380 nid=0x1af8f waiting on condition [0x00007fd10fcd0000]
2022-05-20T03:37:05.4960659Z java.lang.Thread.State: WAITING (parking)
2022-05-20T03:37:05.4961114Z at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
2022-05-20T03:37:05.4961875Z - parking to wait for <0x00000000cdf00010> (a java.util.concurrent.CompletableFuture$Signaller)
2022-05-20T03:37:05.4962343Z at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
2022-05-20T03:37:05.4963171Z at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.3/CompletableFuture.java:1864)
2022-05-20T03:37:05.4963683Z at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.3/ForkJoinPool.java:3463)
2022-05-20T03:37:05.4964169Z at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.3/ForkJoinPool.java:3434)
2022-05-20T03:37:05.4964660Z at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.3/CompletableFuture.java:1898)
2022-05-20T03:37:05.4965158Z at java.util.concurrent.CompletableFuture.get(java.base@17.0.3/CompletableFuture.java:2072)
2022-05-20T03:37:05.4965715Z at org.apache.pulsar.broker.service.BrokerService.lambda$unloadNamespaceBundlesGracefully$21(BrokerService.java:919)
2022-05-20T03:37:05.4966467Z at org.apache.pulsar.broker.service.BrokerService$$Lambda$1164/0x0000000801527c70.accept(Unknown Source)
2022-05-20T03:37:05.4966882Z at java.lang.Iterable.forEach(java.base@17.0.3/Iterable.java:75)
2022-05-20T03:37:05.4967408Z at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:911)
2022-05-20T03:37:05.4968078Z at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:887)
2022-05-20T03:37:05.4968664Z at org.apache.pulsar.broker.service.BrokerService.closeAsync(BrokerService.java:732)
2022-05-20T03:37:05.4969579Z at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:450)
2022-05-20T03:37:05.4970123Z at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:372)
2022-05-20T03:37:05.4970720Z at
```
Blocked at CoordinationServiceImpl#close:
```
2022-05-20T01:17:56.3359346Z "main" #1 prio=5 os_prio=0 cpu=11209.07ms elapsed=3506.06s tid=0x00007f9484024380 nid=0xaba waiting on condition [0x00007f9489edd000]
2022-05-20T01:17:56.3361587Z java.lang.Thread.State: WAITING (parking)
2022-05-20T01:17:56.3363789Z at jdk.internal.misc.Unsafe.park(java.base@17.0.3/Native Method)
2022-05-20T01:17:56.3366545Z - parking to wait for <0x00000000cd180010> (a java.util.concurrent.CompletableFuture$Signaller)
2022-05-20T01:17:56.3368917Z at java.util.concurrent.locks.LockSupport.park(java.base@17.0.3/LockSupport.java:211)
2022-05-20T01:17:56.3371298Z at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.3/CompletableFuture.java:1864)
2022-05-20T01:17:56.3373823Z at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.3/ForkJoinPool.java:3463)
2022-05-20T01:17:56.3376212Z at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.3/ForkJoinPool.java:3434)
2022-05-20T01:17:56.3378608Z at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.3/CompletableFuture.java:1898)
2022-05-20T01:17:56.3380999Z at java.util.concurrent.CompletableFuture.join(java.base@17.0.3/CompletableFuture.java:2117)
2022-05-20T01:17:56.3383947Z at org.apache.pulsar.metadata.coordination.impl.CoordinationServiceImpl.close(CoordinationServiceImpl.java:72)
2022-05-20T01:17:56.3386574Z at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:526)
2022-05-20T01:17:56.3388569Z at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:372)
```
For BrokerService#unloadNamespaceBundlesGracefully, the request chain is:
```
brokerService.closeAsync() -> OwnedBundle.handleUnloadRequest -> pulsar.getNamespaceService().getOwnershipCache().removeOwnership(bundle) -> OwnershipCache.removeOwnership ->
ResourceLock.release
```
For CoordinationServiceImpl#close, the request chain is:
```
CoordinationServiceImpl.close -> LockManager.asyncClose -> ResourceLock.release
```
Both hangs trace back to ResourceLock#release.
Because the CI uses MockZooKeeper, I found that if a RuntimeException is thrown, the response may never complete. So I added catch blocks to ensure that every request gets a reply. However, I'm not sure whether the return code is right.
https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/testmocks/src/main/java/org/apache/zookeeper/MockZooKeeper.java#L332-L402
https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/testmocks/src/main/java/org/apache/zookeeper/MockZooKeeper.java#L916-L976
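The effect of such a catch block can be illustrated with a minimal sketch. Everything here is hypothetical and is not the real MockZooKeeper code: the class `MockReplySketch`, the `Callback` interface, the `validate` helper, and the `SYSTEM_ERROR` value are assumptions made for illustration. The sketch only shows why an uncaught RuntimeException in an asynchronous handler leaves the caller's future incomplete, and how catching it and replying with an error code avoids the hang.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for an asynchronous mock; not the real MockZooKeeper API.
final class MockReplySketch {
    /** Hypothetical callback: rc is a result code, 0 meaning success. */
    interface Callback {
        void processResult(int rc, String path);
    }

    static final int OK = 0;
    // Assumed error code, chosen only for this sketch.
    static final int SYSTEM_ERROR = -1;

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Hypothetical validation step that may throw a RuntimeException. */
    private static void validate(String path) {
        if (path == null) {
            throw new IllegalArgumentException("path must not be null");
        }
    }

    /** Without a catch block: if validate() throws, the callback is never invoked. */
    void deleteUnsafe(String path, Callback cb) {
        // submit() stores the exception in the returned Future, which nobody inspects.
        executor.submit(() -> {
            validate(path);
            cb.processResult(OK, path);
        });
    }

    /** With a catch block: every request gets exactly one reply. */
    void deleteSafe(String path, Callback cb) {
        executor.submit(() -> {
            int rc;
            try {
                validate(path);
                rc = OK;
            } catch (RuntimeException e) {
                rc = SYSTEM_ERROR;
            }
            cb.processResult(rc, path);
        });
    }

    void close() {
        executor.shutdown();
    }

    /** Returns true if the future completes within the given time. */
    static boolean completesWithin(CompletableFuture<?> f, long millis) {
        try {
            f.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MockReplySketch mock = new MockReplySketch();
        CompletableFuture<Integer> unsafe = new CompletableFuture<>();
        CompletableFuture<Integer> safe = new CompletableFuture<>();
        mock.deleteUnsafe(null, (rc, p) -> unsafe.complete(rc));
        mock.deleteSafe(null, (rc, p) -> safe.complete(rc));
        System.out.println("unsafe completed: " + completesWithin(unsafe, 200));
        System.out.println("safe completed: " + completesWithin(safe, 2000));
        mock.close();
    }
}
```

A caller that blocks on the first future with an untimed `get()` or `join()`, as in the stack traces above, would wait forever; the second future always completes, with an error code when the handler fails.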
In addition, the current close process has an ordering issue. LoadManager is closed before BrokerService, but closing BrokerService needs to invoke LoadManager. Even though LoadManager is stateless, this order is a little confusing.
https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/pulsar-broker/src/main/java/org/apache/pulsar/broker/PulsarService.java#L443-L452
https://github.com/apache/pulsar/blob/3a8045851f7e9ea62da104dab2b7fe2b47a95ca9/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L891-L902
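A dependency-respecting close order can be sketched as follows. This is only an illustration under assumptions: `LoadManagerStub`, `BrokerServiceStub`, and their methods are hypothetical stand-ins, not Pulsar's actual classes. The idea is that the component that still needs the other one during shutdown closes first, and the component it depends on closes afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical stubs; they only record the order in which shutdown steps run.
final class CloseOrderSketch {
    static final class LoadManagerStub {
        private final List<String> events;

        LoadManagerStub(List<String> events) {
            this.events = events;
        }

        /** Hypothetical call that the broker service makes while it is closing. */
        void notifyBrokerClosing() {
            events.add("load-manager: used by broker-service close");
        }

        CompletableFuture<Void> closeAsync() {
            events.add("load-manager: closed");
            return CompletableFuture.completedFuture(null);
        }
    }

    static final class BrokerServiceStub {
        private final LoadManagerStub loadManager;
        private final List<String> events;

        BrokerServiceStub(LoadManagerStub loadManager, List<String> events) {
            this.loadManager = loadManager;
            this.events = events;
        }

        CompletableFuture<Void> closeAsync() {
            // Closing the broker service still needs the load manager.
            loadManager.notifyBrokerClosing();
            events.add("broker-service: closed");
            return CompletableFuture.completedFuture(null);
        }
    }

    /** Closes the dependent component first, then the component it depends on. */
    static List<String> closeInDependencyOrder() {
        List<String> events = new ArrayList<>();
        LoadManagerStub loadManager = new LoadManagerStub(events);
        BrokerServiceStub brokerService = new BrokerServiceStub(loadManager, events);
        brokerService.closeAsync()
                .thenCompose(ignored -> loadManager.closeAsync())
                .join();
        return events;
    }

    public static void main(String[] args) {
        closeInDependencyOrder().forEach(System.out::println);
    }
}
```

Chaining with `thenCompose` makes the ordering explicit in the code, so a reader does not have to know that the later-closed component happens to be stateless.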
### Documentation
- [x] `no-need-doc`
(Please explain why)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] github-actions[bot] commented on pull request #17689: [branch-2.10][cherry-pick] Fix the broker close hanged issue.
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #17689:
URL: https://github.com/apache/pulsar/pull/17689#issuecomment-1249059576
@Technoboy- Please provide a correct documentation label for your PR.
For instructions, see [Pulsar Documentation Label Guide](https://docs.google.com/document/d/1Qw7LHQdXWBW9t2-r-A7QdFDBwmZh6ytB4guwMoXHqc0).
[GitHub] [pulsar] Technoboy- merged pull request #17689: [branch-2.10][cherry-pick] Fix the broker close hanged issue.
Posted by GitBox <gi...@apache.org>.
Technoboy- merged PR #17689:
URL: https://github.com/apache/pulsar/pull/17689