You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/23 14:12:19 UTC
[GitHub] [pulsar] zhaohaidao opened a new issue, #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
zhaohaidao opened a new issue, #18163:
URL: https://github.com/apache/pulsar/issues/18163
### Search before asking
- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.
### Version
Pulsar 2.11
### Minimal reproduce step
I don't know how to reproduce it
### What did you expect to see?
Recover ledger successfully
### What did you see instead?
9333 Oct 23 16:55:56 test_host pulsar[123890]: 2022-10-23T16:55:56,889+0800 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN org.apache.pulsar.broker.service.BrokerSe
9333 rvice - Failed to create topic persistent://benchmark/ns-test/test-test-reusing-topic500-0000-partition-426
9334 Oct 23 16:55:56 test_host pulsar[123890]: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
9335 Oct 23 16:55:56 test_host pulsar[123890]: 2022-10-23T16:55:56,889+0800 [main-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [10.225.50.
9335 43:3181, 10.224.145.38:3181, 10.225.105.4:3181] for ledger: 24474
9336 Oct 23 16:55:56 test_host pulsar[123890]: 2022-10-23T16:55:56,889+0800 [BookKeeperClientWorker-OrderedExecutor-26-0] ERROR org.apache.bookkeeper.common.util.SafeRu
9336 nnable - Unexpected throwable caught
9337 Oct 23 16:55:56 test_host pulsar[123890]: java.lang.NullPointerException: Cannot read field "distributionSchedule" because "this.lh" is null
9338 Oct 23 16:55:56 test_host pulsar[123890]: at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:272) ~[org.apache.bookkeeper-bookkeeper-server-4.1 9338 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
9339 Oct 23 16:55:56 test_host pulsar[123890]: at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.1 9339 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
9340 Oct 23 16:55:56 test_host pulsar[123890]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
9341 Oct 23 16:55:56 test_host pulsar[123890]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
9342 Oct 23 16:55:56 test_host pulsar[123890]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.77.F 9342 inal.jar:4.1.77.Final]
9343 Oct 23 16:55:56 test_host pulsar[123890]: at java.lang.Thread.run(Thread.java:833) ~[?:?]
### Anything else?
_No response_
### Are you willing to submit a PR?
- [X] I'm willing to submit a PR!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] zymap commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
zymap commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294952698
Is there any error logs? If it is caused by that PR, there should have LedgerClosed or Interrupt errors I think
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] hangc0276 commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
hangc0276 commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294496692
From the exception stack, it says the `lh` is null, but we have already checked it is not null. One possible reason is that the `lh` has been released, but we need to figure out how it happens.
https://github.com/apache/bookkeeper/blob/ef31c7a37415c063b2f485f63198ec28e8f9f0fa/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L247-L281
Would you please try to confirm
1. Whether branch-4.15 has the same problem.
2. Please try to add more log here and reproduce it to get more information.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294751540
I am not sure if this issue is related to [#3321](https://github.com/apache/bookkeeper/pull/3321), which try to fix ledger handle leak issues
@hangc0276 @codelipenghui
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] zymap commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
zymap commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1296497373
Does this error still happen?
I am not sure about it is related to [#3321](https://github.com/apache/bookkeeper/pull/3321). In 3321, the recycling happened at the beginning of add entry operation. If it is recycled, the pendingAddOp won't be executed.
Because you are using the master code, could you please take the following experiment:
- Change the code base to branch-4.15 to test if the latest code of 4.15 works well. Currently, the pulsar still using 4.15.x code.
- Rever the #3321 in the master to check if it can reproduce the same error.
@zhaohaidao
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1295030760
The error log context is as followed. Not sure if it can help.
@zymap
> 11338 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.RackawareEn
11338 semblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], fallback to choose bookie
11338 randomly from the cluster.
11339 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.RackawareEn
11339 semblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], allBookies [<Bookie:
11339 10.225.105.4:3181>, <Bookie:10.225.50.43:3181>, <Bookie:10.224.145.38:3181>].
11340 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.RackawareEn
11340 semblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], allBookies [<Bookie:
11340 10.224.145.38:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.225.50.43:3181>].
11341 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.RackawareEn
11341 semblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], fallback to choose bookie
11341 randomly from the cluster.
11342 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.RackawareEn
11342 semblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], allBookies [<Bookie:
11342 10.225.105.4:3181>, <Bookie:10.224.145.38:3181>, <Bookie:10.225.50.43:3181>].
11343 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.client.ReadOnlyLed
11343 gerHandle - Could not get additional bookie to remake ensemble, closing ledger: 3116
11344 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.client.PendingAddO
11344 p - Write of ledger entry to quorum failed: L3116 E1032
11345 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.client.LedgerRecov
11345 eryOp - Failure NotEnoughBookiesException: Not enough non-faulty bookies available while writing entry: 1033 while recovering ledger: 3116
11346 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] ERROR org.apache.bookkeeper.mledger.impl.Manag
11346 edLedgerImpl - [benchmark/ns-test/persistent/test-test-reusing-topic500-0000-partition-155] Failed to open ledger 3116: Error while recovering ledger
11347 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] ERROR org.apache.bookkeeper.mledger.impl.Manag
11347 edLedgerFactoryImpl - [benchmark/ns-test/persistent/test-test-reusing-topic500-0000-partition-155] Failed to initialize managed ledger: Error while recovering ledger
11348 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] INFO org.apache.bookkeeper.mledger.impl.Manag
11348 edLedgerImpl - [benchmark/ns-test/persistent/test-test-reusing-topic500-0000-partition-155] Closing managed ledger
11349 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] WARN org.apache.pulsar.broker.service.BrokerS
11349 ervice - Failed to create topic persistent://benchmark/ns-test/test-test-reusing-topic500-0000-partition-155
11350 Oct 27 18:02:42 test_host pulsar[166925]: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
11351 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.common.util.SafeRu
11351 nnable - Unexpected throwable caught
11352 Oct 27 18:02:42 test_host pulsar[166925]: java.lang.NullPointerException: Cannot read field "distributionSchedule" because "this.lh" is null
11353 Oct 27 18:02:42 test_host pulsar[166925]: at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:272) ~[org.apache.bookkeeper-bookkeeper-server-4.1 11353 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
11354 Oct 27 18:02:42 test_host pulsar[166925]: at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.1 11354 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
11355 Oct 27 18:02:42 test_host pulsar[166925]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
11356 Oct 27 18:02:42 test_host pulsar[166925]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
11357 Oct 27 18:02:42 test_host pulsar[166925]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.77.F 11357 inal.jar:4.1.77.Final]
11358 Oct 27 18:02:42 test_host pulsar[166925]: at java.lang.Thread.run(Thread.java:833) ~[?:?]
11359 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN org.apache.bookkeeper.client.LedgerRecov 11359 eryOp - Successfully read entry 1033 for ledger 3116, but readDone is already false
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1305408559
> Does this error still happen? I am not sure about it is related to [#3321](https://github.com/apache/bookkeeper/pull/3321). In 3321, the recycling happened at the beginning of add entry operation. If it is recycled, the pendingAddOp won't be executed. Because you are using the master code, could you please take the following experiment:
>
> * Change the code base to branch-4.15 to test if the latest code of 4.15 works well. Currently, the pulsar still using 4.15.x code.
> * Rever the [#3321](https://github.com/apache/bookkeeper/pull/3321). in the master to check if it can reproduce the same error.
>
> @zhaohaidao
Sorry for replying late. Through analysis, It can be sure this issue is nothing to do with #3321. And branch 4.15 has the same problem as you expected. Issue analysis can be seen in: https://github.com/apache/bookkeeper/issues/3618
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294729957
> java.lang.NullPointerException
Yes. it is a bookkeeper client issue, and the bkclient built on the master branch.
Sorry for submitting to the wrong place.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] tjiuming commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
tjiuming commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294415987
this should be a bk client issue @hangc0276 PTAL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] codelipenghui commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294417688
@zhaohaidao I think the issue is not for Pulsar 2.11 right? I noticed the bookkeeper version should be a version that builds on the bookkeeper master branch. I just want to make sure it's not a regression for the 2.11.0 release.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] github-actions[bot] commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1341869034
The issue had no activity for 30 days, mark with Stale label.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org