You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2023/01/20 07:00:47 UTC

[GitHub] [bookkeeper] zhaohaidao commented on pull request #3620: Fix NPE issue during recovery add

zhaohaidao commented on PR #3620:
URL: https://github.com/apache/bookkeeper/pull/3620#issuecomment-1397991723

   > Sorry, this change may introduce a new problem. There is a case: The ledger handles asyncAddEntry 10000 times. So there are 10000 PendingAddOp will be sent to the LedgerHandle#pendingAddOps.
   > 
   > The 10000 PendingAddOp will be executed one by one in the future. Before this change, `errorOutPendingAdds` is a sync operation; it takes all the PendingAddOp from LedgerHandle#pendingAddOps, then submits a callback. After submitting callback, all the PendingAddOp#callbackTriggered will be set to `true`. Because we execute every PendingAddOp in the thread pool, every PendingAddOp will be run at some time.
   > 
   > https://github.com/apache/bookkeeper/blob/7b5b6b240ca7551da266903078f2dd2ba2906b96/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L248-L254
   > 
   > 
   > At now, we check PendingAddOp#callbackTriggered value is true or not; if true, it will return directly, didn't send any request to the server; it's quick.
   > But if we make `errorOutPendingAdds` async, all the 10000 PendingAddOp will send requests to the server, because the Runnable is ordered. After 10000 PendingAddOp executing, the `errorOutPendingAdds` will be executed. It's not acceptable.
   
   @horizonzy Thank you very much for your careful explanation, your concern is very reasonable. However in the scenario you described, when bkclient writes multiple messages to the same ledger. If the first message fails, LedgerHandle makes subsequent requests fail by calling errorOutPendingAdds asynchronously.
   There is a difference between the implementation of ReadOnlyLedgeHandle and the implementation of LedgerHandle. The latter is a sync operation.
   The change of this pr is actually referring to the implementation of handleUnrecoverableErrorDuringAdd of LedgerHandle, according to the suggestion of @hangc0276 .
   Another fix I can think of is to interrupt the execution of safeRun when NotEnoughException is encountered, but this may require adding an exception signature to handleBookieFailure. I understand that this may violate the original design intention? Do you have any better suggestions for changes? Welcome to continue the discussion.
   
   It is worth mentioning that although LedgerHandle.handleUnrecoverableErrorDuringAdd may also call errorOutPendingAdds synchronously, the logic of this code will never happen, because the ledger recovery process will definitely call the ReadOnlyLedgeHandle method.
   
   https://github.com/apache/bookkeeper/blob/7b5b6b240ca7551da266903078f2dd2ba2906b96/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L1767-L1777


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org