You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/06/08 07:23:55 UTC

[GitHub] [pulsar] lhotari opened a new pull request, #15971: [ML] When skipping updating mark delete position, execute callback with executor to prevent deadlock

lhotari opened a new pull request, #15971:
URL: https://github.com/apache/pulsar/pull/15971

   ### Motivation
   
   The broker can dead lock when skipping the updating of mark delete position happens. PR #15067 added the logic to skip updating mark delete position if there's already a later mark delete position. That logic introduced this particular deadlock issue.
   
   Example stack traces of a deadlock (lines of code from fork based on branch-2.8): 
   
   ```
   "BookKeeperClientWorker-OrderedExecutor-5-0" Id=59 in BLOCKED on lock=org.apache.pulsar.broker.service.persistent.PersistentStickyKeyDispatcherMultipleConsumers@5592e5cc
        owned by pulsar-io-4-6 Id=168
       at app//org.apache.pulsar.broker.service.persistent.PersistentStickyKeyDispatcherMultipleConsumers.markDeletePositionMoveForward(PersistentStickyKeyDispatcherMultipleConsumers.java:395)
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription.notifyTheMarkDeletePositionMoveForwardIfNeeded(PersistentSubscription.java:552)
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription.access$500(PersistentSubscription.java:86)
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription$3.deleteComplete(PersistentSubscription.java:539)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$19.markDeleteComplete(ManagedCursorImpl.java:2122)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$MarkDeleteEntry.triggerComplete(ManagedCursorImpl.java:234)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$17.operationComplete(ManagedCursorImpl.java:1896)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.lambda$persistPositionToLedger$25(ManagedCursorImpl.java:2712)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$$Lambda$761/0x0000000840837840.addComplete(Unknown Source)
       at app//org.apache.bookkeeper.client.AsyncCallback$AddCallback.addCompleteWithLatency(AsyncCallback.java:92)
       at app//org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:431)
       at app//org.apache.bookkeeper.client.LedgerHandle.sendAddSuccessCallbacks(LedgerHandle.java:1832)
       at app//org.apache.bookkeeper.client.PendingAddOp.sendAddSuccessCallbacks(PendingAddOp.java:415)
       at app//org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:409)
       at app//org.apache.bookkeeper.proto.PerChannelBookieClient$AddCompletion.writeComplete(PerChannelBookieClient.java:2151)
   
   "pulsar-io-4-6" Id=168 in BLOCKED on lock=java.util.ArrayDeque@4efce20f
        owned by pulsar-io-4-8 Id=170
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.readOperationCompleted(ManagedCursorImpl.java:2848)
       at app//org.apache.bookkeeper.mledger.impl.OpReadEntry.checkReadCompletion(OpReadEntry.java:152)
       at app//org.apache.bookkeeper.mledger.impl.OpReadEntry.readEntriesComplete(OpReadEntry.java:87)
       at app//org.apache.bookkeeper.mledger.impl.EntryCacheImpl.asyncReadEntry0(EntryCacheImpl.java:293)
       at app//org.apache.bookkeeper.mledger.impl.EntryCacheImpl.asyncReadEntry(EntryCacheImpl.java:251)
       at app//org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncReadEntry(ManagedLedgerImpl.java:1935)
       at app//org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.internalReadFromLedger(ManagedLedgerImpl.java:1907)
       at app//org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.asyncReadEntries(ManagedLedgerImpl.java:1707)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.asyncReadEntries(ManagedCursorImpl.java:659)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.asyncReadEntries(ManagedCursorImpl.java:642)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.asyncReadEntriesOrWait(ManagedCursorImpl.java:793)
       at app//org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers.readMoreEntries(PersistentDispatcherMultipleConsumers.java:273)
         - locked org.apache.pulsar.broker.service.persistent.PersistentStickyKeyDispatcherMultipleConsumers@5592e5cc
       at app//org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers.consumerFlow(PersistentDispatcherMultipleConsumers.java:222)
         - locked org.apache.pulsar.broker.service.persistent.PersistentStickyKeyDispatcherMultipleConsumers@5592e5cc
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription.consumerFlow(PersistentSubscription.java:369)
       at app//org.apache.pulsar.broker.service.Consumer.flowPermits(Consumer.java:645)
       at app//org.apache.pulsar.broker.service.ServerCnx.handleFlow(ServerCnx.java:1466)
       at app//org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:181)
   
   "pulsar-io-4-8" Id=170 in BLOCKED on lock=org.apache.pulsar.broker.service.persistent.PersistentStickyKeyDispatcherMultipleConsumers@5592e5cc
        owned by pulsar-io-4-6 Id=168
       at app//org.apache.pulsar.broker.service.persistent.PersistentStickyKeyDispatcherMultipleConsumers.markDeletePositionMoveForward(PersistentStickyKeyDispatcherMultipleConsumers.java:395)
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription.notifyTheMarkDeletePositionMoveForwardIfNeeded(PersistentSubscription.java:552)
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription.access$500(PersistentSubscription.java:86)
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription$3.deleteComplete(PersistentSubscription.java:539)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$19.markDeleteComplete(ManagedCursorImpl.java:2122)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$MarkDeleteEntry.triggerComplete(ManagedCursorImpl.java:234)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.internalMarkDelete(ManagedCursorImpl.java:1847)
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.internalAsyncMarkDelete(ManagedCursorImpl.java:1810)
         - locked java.util.ArrayDeque@4efce20f
       at app//org.apache.bookkeeper.mledger.impl.ManagedCursorImpl.asyncDelete(ManagedCursorImpl.java:2119)
       at app//org.apache.pulsar.broker.service.persistent.PersistentSubscription.acknowledgeMessage(PersistentSubscription.java:394)
       at app//org.apache.pulsar.broker.service.Consumer.individualAckNormal(Consumer.java:442)
       at app//org.apache.pulsar.broker.service.Consumer.messageAcked(Consumer.java:392)
       at app//org.apache.pulsar.broker.service.ServerCnx.handleAck(ServerCnx.java:1437)
       at app//org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:144)
   
   ```
   
   ### Modifications
   
   - when skipping updating mark delete position, execute callback with executor to prevent deadlock


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on a diff in pull request #15971: [ML] When skipping updating mark delete position, execute callback with executor to prevent deadlock

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #15971:
URL: https://github.com/apache/pulsar/pull/15971#discussion_r892045582


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java:
##########
@@ -1896,7 +1896,8 @@ void internalMarkDelete(final MarkDeleteEntry mdEntry) {
                 log.info("Skipping updating mark delete position to {}. The persisted mark delete position {} "
                         + "is later.", mdEntry.newPosition, persistentMarkDeletePosition);
             }
-            mdEntry.triggerComplete();
+            // run with executor to prevent deadlock
+            ledger.getExecutor().executeOrdered(ledger.getName(), safeRun(() -> mdEntry.triggerComplete()));

Review Comment:
   Yes, it happened in a different thread that will not hold the `pendingMarkDeleteOps` lock, so will not encounter the deadlock.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] eolivelli merged pull request #15971: [ML] When skipping updating mark delete position, execute callback with executor to prevent deadlock

Posted by GitBox <gi...@apache.org>.
eolivelli merged PR #15971:
URL: https://github.com/apache/pulsar/pull/15971


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] nicoloboschi commented on pull request #15971: [ML] When skipping updating mark delete position, execute callback with executor to prevent deadlock

Posted by GitBox <gi...@apache.org>.
nicoloboschi commented on PR #15971:
URL: https://github.com/apache/pulsar/pull/15971#issuecomment-1149822208

   /pulsarbot rerun-failure-checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on pull request #15971: [ML] When skipping updating mark delete position, execute callback with executor to prevent deadlock

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15971:
URL: https://github.com/apache/pulsar/pull/15971#issuecomment-1149565500

   @lhotari:Thanks for providing doc info!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] eolivelli commented on a diff in pull request #15971: [ML] When skipping updating mark delete position, execute callback with executor to prevent deadlock

Posted by GitBox <gi...@apache.org>.
eolivelli commented on code in PR #15971:
URL: https://github.com/apache/pulsar/pull/15971#discussion_r892017626


##########
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java:
##########
@@ -1896,7 +1896,8 @@ void internalMarkDelete(final MarkDeleteEntry mdEntry) {
                 log.info("Skipping updating mark delete position to {}. The persisted mark delete position {} "
                         + "is later.", mdEntry.newPosition, persistentMarkDeletePosition);
             }
-            mdEntry.triggerComplete();
+            // run with executor to prevent deadlock
+            ledger.getExecutor().executeOrdered(ledger.getName(), safeRun(() -> mdEntry.triggerComplete()));

Review Comment:
   I think that this behaviour is consistent with what happens a few lines below in the operationComplete() callback passed to `persistPositionToLedger`:` mdEntry.triggerComplete();` is not executed on this thread immediately but later



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on pull request #15971: [ML] When skipping updating mark delete position, execute callback with executor to prevent deadlock

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15971:
URL: https://github.com/apache/pulsar/pull/15971#issuecomment-1149558351

   @lhotari:Thanks for your contribution. For this PR, do we need to update docs?
   (The [PR template contains info about doc](https://github.com/apache/pulsar/blob/master/.github/PULL_REQUEST_TEMPLATE.md#documentation), which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org