Posted to commits@bookkeeper.apache.org by "M1eyu2018 (via GitHub)" <gi...@apache.org> on 2024/04/02 09:14:50 UTC

[I] When a client adds entries synchronously to an opened ledger and a bookie crashes, the client may get stuck. [bookkeeper]

M1eyu2018 opened a new issue, #4261:
URL: https://github.com/apache/bookkeeper/issues/4261

   **BUG REPORT**
   When a client adds entries synchronously to an opened ledger and a bookie crashes, the client may get stuck.
   
   ***Describe the bug***
   
   When a client adds entries synchronously to an opened ledger and a bookie crashes, the ensemble change for the crashed bookie may be triggered twice.
   The first ensemble change is triggered by the third (failed) response, 'Bookie handle was not available'.
   A moment later, the second ensemble change is triggered by another third (failed) response, 'Bookie operation timeout'.
   Because the same crashed bookie is replaced twice, the second ensemble change finds no bookie to replace, so unsetSuccessAndSendWriteRequest is never called. As a result, the success callback for the entry currently being added is never sent and the client gets stuck.
   
   ***Example***
   In this example, a client adds 81920 entries to a ledger of 10M with a 3-3-2 policy (ensemble size 3, write quorum 3, ack quorum 2), and the ensemble is (A,B,C).
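   Step 3 below relies on the 3-3-2 quorum rule: each entry is sent to all three bookies and is acknowledged as soon as two of them succeed, so a late failure from the third bookie does not block the add. The Java sketch below is a hypothetical model of that rule only; the class and method names are illustrative and are not BookKeeper's PendingAddOp API.

```java
import java.util.List;

public class AckQuorumSketch {
    // Hypothetical model of a 3-3-2 ledger: each entry is written to
    // WRITE_QUORUM = 3 bookies and acknowledged after ACK_QUORUM = 2 successes.
    static final int WRITE_QUORUM = 3;
    static final int ACK_QUORUM = 2;

    /**
     * Given per-bookie results in arrival order (true = success), returns how many
     * responses had arrived when the entry reached the ack quorum, or -1 if it never did.
     */
    static int ackedAfter(List<Boolean> results) {
        if (results.size() > WRITE_QUORUM) {
            throw new IllegalArgumentException("more responses than the write quorum");
        }
        int successes = 0;
        for (int i = 0; i < results.size(); i++) {
            if (results.get(i) && ++successes == ACK_QUORUM) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // B and C succeed first; A's failed response arrives later (as in step 3).
        System.out.println(ackedAfter(List.of(true, true, false)));  // 2
        // Only one bookie succeeds: the entry cannot be acknowledged.
        System.out.println(ackedAfter(List.of(false, true, false))); // -1
    }
}
```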
   1. At the beginning, entries #0-#6773 are written normally.
   2. While entry#6774 is being added, bookie A crashes for some reason, such as a power outage or running 'kill -9' on bookie A's process.
   3. However, two successful responses (from B and C) are still received for each entry, which satisfies the ack quorum, so entries #6774-#11604 continue to be added.
   4. Before entry#11605 is added, the third responses (from A) for entries #6774-#11604 come back one after another. Because they fail with 'Bookie handle was not available', bookie A is put into delayedWriteFailedBookies.
   5. When entry#11605 is added, maybeHandleDelayedWriteBookieFailure is called; since delayedWriteFailedBookies is not empty, an ensemble change begins.
   6. After two successful responses for entry#11605 are received, sendAddSuccessCallbacks is called. However, pendingAddOp.submitCallback is not called until the ensemble change finishes.
   7. When the ensemble change finishes, bookie A has been replaced by bookie D. The success callback for entry#11605 is sent, and adding entries continues.
   
   So far, the logic is correct. The problem appears afterwards:
   
   8. Entries #11606-#42623 are written normally to (D,B,C) after the ensemble change.
   9. Before entry#42624 is added, the third responses for entries #6774-#11604 that had not yet come back arrive one after another. This time they fail with 'Bookie operation timeout', so bookie A is put into delayedWriteFailedBookies again.
   10. When entry#42624 is added, maybeHandleDelayedWriteBookieFailure is called; since delayedWriteFailedBookies is not empty, an ensemble change begins again.
   11. After all three successful responses for entry#42624 are received from (D,B,C), sendAddSuccessCallbacks is called. However, pendingAddOp.submitCallback is not called until the ensemble change finishes.
   12. This time, the failed bookie A needs to be replaced again, but the ensemble is already (D,B,C), so no bookie is replaced. Because unsetSuccessAndSendWriteRequest is not called, the success callback for entry#42624 is never sent.
   13. Since entries are added synchronously, the client gets stuck.
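   The sequence above can be condensed into a small, single-threaded Java model. It is a hypothetical sketch, not BookKeeper code: delayedWriteFailedBookies is modelled as a plain Set, the ensemble change simply swaps failed bookies for spares, and, following the report, the held-back success callback is treated as released only when at least one bookie was actually replaced.

```java
import java.util.*;

// Hypothetical model of steps 4-13; names mirror the report for readability only.
public class DoubleEnsembleChangeModel {
    final List<String> ensemble = new ArrayList<>(List.of("A", "B", "C"));
    final Set<String> delayedWriteFailedBookies = new HashSet<>();
    final Deque<String> spareBookies = new ArrayDeque<>(List.of("D", "E"));

    // Steps 4 and 9: a late failed response is recorded as a delayed write failure,
    // even if that bookie has already been swapped out of the current ensemble.
    void onLateFailure(String bookie) {
        delayedWriteFailedBookies.add(bookie);
    }

    // Steps 5-7 and 10-12: adding an entry first handles any delayed failures.
    // Returns true if the success callback for this entry would be delivered.
    boolean addEntry() {
        if (delayedWriteFailedBookies.isEmpty()) {
            return true; // no ensemble change pending: normal write
        }
        boolean replacedAny = false;
        for (int i = 0; i < ensemble.size(); i++) {
            if (delayedWriteFailedBookies.contains(ensemble.get(i))) {
                ensemble.set(i, spareBookies.pop()); // swap the failed bookie for a spare
                replacedAny = true;
            }
        }
        delayedWriteFailedBookies.clear();
        // Per the report: the callback is only released via unsetSuccessAndSendWriteRequest,
        // which runs only when a bookie was replaced.
        return replacedAny;
    }

    public static void main(String[] args) {
        DoubleEnsembleChangeModel lh = new DoubleEnsembleChangeModel();
        lh.onLateFailure("A");          // step 4: 'Bookie handle was not available'
        boolean first = lh.addEntry();  // steps 5-7: entry#11605
        System.out.println("entry#11605 callback=" + first + " ensemble=" + lh.ensemble);
        lh.onLateFailure("A");          // step 9: 'Bookie operation timeout' for an old entry
        boolean second = lh.addEntry(); // steps 10-12: entry#42624
        System.out.println("entry#42624 callback=" + second + " ensemble=" + lh.ensemble);
    }
}
```

   Running main prints callback=true for entry#11605 and callback=false for entry#42624, with the ensemble staying [D, B, C] after the second change.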
   
   
   ***To Reproduce***
   
   1. Create a BookKeeper client.
   2. Open a ledger.
   3. Add entries synchronously.
   4. While entries are being added, run 'kill -9' on one bookie's process.
   5. The client may get stuck forever.
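   When reproducing this, it helps if the harness detects the hang instead of blocking forever at step 5. A minimal sketch, assuming each synchronous add (for example a LedgerHandle.addEntry(data) call) is wrapped in a Callable: run it on a daemon thread and treat a timeout as "stuck". The class name and the stand-in callables in main are hypothetical.

```java
import java.util.concurrent.*;

public class StuckAddDetector {
    /**
     * Runs a blocking (synchronous) add on a daemon thread and reports whether it
     * finished within timeoutMs. An add that throws still counts as finished.
     */
    static boolean completesWithin(Callable<?> syncAdd, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "sync-add");
            t.setDaemon(true); // don't keep the JVM alive if the add never returns
            return t;
        });
        try {
            Future<?> result = executor.submit(syncAdd);
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // still blocked after the timeout: treat as stuck
        } catch (ExecutionException e) {
            return true; // the add finished with an error; it did not hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // caller interrupted; restore the flag
            return false;
        } finally {
            executor.shutdownNow(); // interrupt a still-blocked add
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a synchronous add: one that returns, one whose callback never fires.
        Callable<Long> returns = () -> 0L;
        CompletableFuture<Long> neverCompleted = new CompletableFuture<>();
        Callable<Long> hangs = neverCompleted::get;
        System.out.println(completesWithin(returns, 1000)); // true
        System.out.println(completesWithin(hangs, 200));    // false
    }
}
```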
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@bookkeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] When a client adds entries synchronously to an opened ledger and a bookie crashes, the client may get stuck. [bookkeeper]

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on issue #4261:
URL: https://github.com/apache/bookkeeper/issues/4261#issuecomment-2041342313

   Thanks for the report, I will check it.


Re: [I] When a client adds entries synchronously to an opened ledger and a bookie crashes, the client may get stuck. [bookkeeper]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
lhotari commented on issue #4261:
URL: https://github.com/apache/bookkeeper/issues/4261#issuecomment-2058820103

   Is this similar or related to #4097?


Re: [I] When a client adds entries synchronously to an opened ledger and a bookie crashes, the client may get stuck. [bookkeeper]

Posted by "thetumbled (via GitHub)" <gi...@apache.org>.
thetumbled commented on issue #4261:
URL: https://github.com/apache/bookkeeper/issues/4261#issuecomment-2031555670

   PTAL, thanks. @hangc0276 @ivankelly @horizonzy @shoothzj @wenbingshen 


Re: [I] When a client adds entries synchronously to an opened ledger and a bookie crashes, the client may get stuck. [bookkeeper]

Posted by "horizonzy (via GitHub)" <gi...@apache.org>.
horizonzy commented on issue #4261:
URL: https://github.com/apache/bookkeeper/issues/4261#issuecomment-2046825660

   Nice catch!


Re: [I] When a client adds entries synchronously to an opened ledger and a bookie crashes, the client may get stuck. [bookkeeper]

Posted by "wenbingshen (via GitHub)" <gi...@apache.org>.
wenbingshen commented on issue #4261:
URL: https://github.com/apache/bookkeeper/issues/4261#issuecomment-2051187790

   Nice Catch!

