You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/23 14:12:19 UTC

[GitHub] [pulsar] zhaohaidao opened a new issue, #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

zhaohaidao opened a new issue, #18163:
URL: https://github.com/apache/pulsar/issues/18163

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.
   
   
   ### Version
   
   Pulsar 2.11
   
   ### Minimal reproduce step
   
   I don't know how to reproduce it 
   
   ### What did you expect to see?
   
   Recover ledger successfully
   
   ### What did you see instead?
   
      9333 Oct 23 16:55:56 test_host pulsar[123890]: 2022-10-23T16:55:56,889+0800 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN  org.apache.pulsar.broker.service.BrokerSe
      9333 rvice - Failed to create topic persistent://benchmark/ns-test/test-test-reusing-topic500-0000-partition-426
      9334 Oct 23 16:55:56 test_host pulsar[123890]: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
      9335 Oct 23 16:55:56 test_host pulsar[123890]: 2022-10-23T16:55:56,889+0800 [main-EventThread] INFO  org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [10.225.50.
      9335 43:3181, 10.224.145.38:3181, 10.225.105.4:3181] for ledger: 24474
      9336 Oct 23 16:55:56 test_host pulsar[123890]: 2022-10-23T16:55:56,889+0800 [BookKeeperClientWorker-OrderedExecutor-26-0] ERROR org.apache.bookkeeper.common.util.SafeRu
      9336 nnable - Unexpected throwable caught
      9337 Oct 23 16:55:56 test_host pulsar[123890]: java.lang.NullPointerException: Cannot read field "distributionSchedule" because "this.lh" is null
      9338 Oct 23 16:55:56 test_host pulsar[123890]: at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:272) ~[org.apache.bookkeeper-bookkeeper-server-4.1   9338 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
      9339 Oct 23 16:55:56 test_host pulsar[123890]: at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.1   9339 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
      9340 Oct 23 16:55:56 test_host pulsar[123890]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
      9341 Oct 23 16:55:56 test_host pulsar[123890]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
      9342 Oct 23 16:55:56 test_host pulsar[123890]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.77.F   9342 inal.jar:4.1.77.Final]
      9343 Oct 23 16:55:56 test_host pulsar[123890]: at java.lang.Thread.run(Thread.java:833) ~[?:?]
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] zymap commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
zymap commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294952698

   Is there any error logs? If it is caused by that PR, there should have LedgerClosed or Interrupt errors I think


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] hangc0276 commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294496692

   From the exception stack, it says the `lh` is null, but we have already checked it is not null. One possible reason is that the `lh` has been released, but we need to figure out how it happens.
   https://github.com/apache/bookkeeper/blob/ef31c7a37415c063b2f485f63198ec28e8f9f0fa/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L247-L281
   
   Would you please try to confirm 
   1. Whether branch-4.15 has the same problem.
   2. Please try to add more log here and reproduce it to get more information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294751540

   I am not sure if this issue is related to [#3321](https://github.com/apache/bookkeeper/pull/3321), which try to fix ledger handle leak issues
   
   @hangc0276 @codelipenghui 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] zymap commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
zymap commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1296497373

   Does this error still happen? 
   I am not sure about it is related to [#3321](https://github.com/apache/bookkeeper/pull/3321). In 3321, the recycling happened at the beginning of add entry operation. If it is recycled, the pendingAddOp won't be executed. 
   Because you are using the master code, could you please take the following experiment:
   - Change the code base to branch-4.15 to test if the latest code of 4.15 works well. Currently, the pulsar still using 4.15.x code.
   - Rever the #3321 in the master to check if it can reproduce the same error.
   
   @zhaohaidao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1295030760

   The error log context is as followed. Not sure if it can help.
   @zymap 
   
   > 11338 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.RackawareEn
     11338 semblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], fallback to choose bookie
     11338 randomly from the cluster.
     11339 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.RackawareEn
     11339 semblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], allBookies [<Bookie:
     11339 10.225.105.4:3181>, <Bookie:10.225.50.43:3181>, <Bookie:10.224.145.38:3181>].
     11340 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.RackawareEn
     11340 semblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], allBookies [<Bookie:
     11340 10.224.145.38:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.225.50.43:3181>].
     11341 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.RackawareEn
     11341 semblePlacementPolicyImpl - Failed to choose a bookie: excluded [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], fallback to choose bookie
     11341 randomly from the cluster.
     11342 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.RackawareEn
     11342 semblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies [<Bookie:10.225.50.43:3181>, <Bookie:10.225.105.4:3181>, <Bookie:10.224.145.38:3181>], allBookies [<Bookie:
     11342 10.225.105.4:3181>, <Bookie:10.224.145.38:3181>, <Bookie:10.225.50.43:3181>].
     11343 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.client.ReadOnlyLed
     11343 gerHandle - Could not get additional bookie to remake ensemble, closing ledger: 3116
     11344 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.client.PendingAddO
     11344 p - Write of ledger entry to quorum failed: L3116 E1032
     11345 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,055+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.client.LedgerRecov
     11345 eryOp - Failure NotEnoughBookiesException: Not enough non-faulty bookies available while writing entry: 1033 while recovering ledger: 3116
     11346 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] ERROR org.apache.bookkeeper.mledger.impl.Manag
     11346 edLedgerImpl - [benchmark/ns-test/persistent/test-test-reusing-topic500-0000-partition-155] Failed to open ledger 3116: Error while recovering ledger
     11347 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] ERROR org.apache.bookkeeper.mledger.impl.Manag
     11347 edLedgerFactoryImpl - [benchmark/ns-test/persistent/test-test-reusing-topic500-0000-partition-155] Failed to initialize managed ledger: Error while recovering ledger
     11348 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] INFO  org.apache.bookkeeper.mledger.impl.Manag
     11348 edLedgerImpl - [benchmark/ns-test/persistent/test-test-reusing-topic500-0000-partition-155] Closing managed ledger
     11349 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-13-0] WARN  org.apache.pulsar.broker.service.BrokerS
     11349 ervice - Failed to create topic persistent://benchmark/ns-test/test-test-reusing-topic500-0000-partition-155
     11350 Oct 27 18:02:42 test_host pulsar[166925]: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering ledger
     11351 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] ERROR org.apache.bookkeeper.common.util.SafeRu
     11351 nnable - Unexpected throwable caught
     11352 Oct 27 18:02:42 test_host pulsar[166925]: java.lang.NullPointerException: Cannot read field "distributionSchedule" because "this.lh" is null
     11353 Oct 27 18:02:42 test_host pulsar[166925]: at org.apache.bookkeeper.client.PendingAddOp.safeRun(PendingAddOp.java:272) ~[org.apache.bookkeeper-bookkeeper-server-4.1  11353 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
     11354 Oct 27 18:02:42 test_host pulsar[166925]: at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.1  11354 6.1-SNAPSHOT.jar:4.16.1-SNAPSHOT]
     11355 Oct 27 18:02:42 test_host pulsar[166925]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
     11356 Oct 27 18:02:42 test_host pulsar[166925]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
     11357 Oct 27 18:02:42 test_host pulsar[166925]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.77.F  11357 inal.jar:4.1.77.Final]
     11358 Oct 27 18:02:42 test_host pulsar[166925]: at java.lang.Thread.run(Thread.java:833) ~[?:?]
     11359 Oct 27 18:02:42 test_host pulsar[166925]: 2022-10-27T18:02:42,056+0800 [BookKeeperClientWorker-OrderedExecutor-22-0] WARN  org.apache.bookkeeper.client.LedgerRecov  11359 eryOp - Successfully read entry 1033 for ledger 3116, but readDone is already false


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1305408559

   > Does this error still happen? I am not sure about it is related to [#3321](https://github.com/apache/bookkeeper/pull/3321). In 3321, the recycling happened at the beginning of add entry operation. If it is recycled, the pendingAddOp won't be executed. Because you are using the master code, could you please take the following experiment:
   > 
   > * Change the code base to branch-4.15 to test if the latest code of 4.15 works well. Currently, the pulsar still using 4.15.x code.
   > * Rever the [#3321](https://github.com/apache/bookkeeper/pull/3321). in the master to check if it can reproduce the same error.
   > 
   > @zhaohaidao
   
   Sorry for replying late. Through analysis, It can be sure this issue is nothing to do with #3321. And branch 4.15 has the same problem as you expected. Issue analysis can be seen in: https://github.com/apache/bookkeeper/issues/3618


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] zhaohaidao commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
zhaohaidao commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294729957

   > java.lang.NullPointerException
   
   Yes. it is a bookkeeper client issue, and the bkclient built on the master branch. 
   Sorry for submitting to the wrong place.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tjiuming commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
tjiuming commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294415987

   this should be a bk client issue @hangc0276 PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1294417688

   @zhaohaidao I think the issue is not for Pulsar 2.11 right? I noticed the bookkeeper version should be a version that builds on the bookkeeper master branch. I just want to make sure it's not a regression for the 2.11.0 release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on issue #18163: [Bug] NPE throwed when recovering ledger: Cannot read field "distributionSchedule" because "this.lh" is null

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #18163:
URL: https://github.com/apache/pulsar/issues/18163#issuecomment-1341869034

   The issue had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org