You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/28 05:39:27 UTC

[GitHub] [pulsar] michaeljmarshall opened a new pull request, #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

michaeljmarshall opened a new pull request, #18237:
URL: https://github.com/apache/pulsar/pull/18237

   Fixes: https://github.com/apache/pulsar/issues/18236
   
   ### Motivation
   
   See https://github.com/apache/pulsar/issues/18236 for details on the problematic behavior.
   
   The fundamental problem is that the cursor persists the `readPosition` instead of persisting the `markDeletePosition` in the `ManagedCursor#internalResetCursor` method. This method is only called on specific occasions, the broker persists the correct `markDeletePosition` in zookeeper, and the bookkeeper's cursor data is only used when the broker doesn't update the cursor in zookeeper, so this bug is rare.
   
   I found this bug at first by producing to a newly created topic and subscription where the subscription was set to "latest". In that case, when the broker evaluates the following code, the `position` evaluates to `ledgerId:0` instead of `ledgerId:-1`. It seems that this bug affects all reset cursor calls except "earliest" because of the way "earliest" always goes to entryId -1.
   
   https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1186-L1191
   
   After setting the position incorrectly for `latest` and for regular resets, the method persists `newPosition` (which is defined by `position`) as the `markDeletePosition`:
   
   https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1281
   
   As a result, the cursor's persisted data is incorrect for those two cases.
   
   Interestingly, we do not see this bug much because a cleanly closed cursor ledger persists its `markDeletePosition` in [Zookeeper](https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L2525), and when the broker recovers the cursor, it just uses the zookeeper data:
   
   https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L458
   
   Note that the only reason the zk metadata is correct is because we update the `markDeletePosition` to a new value in a callback run _after_ persisting a different `markDeletePosition`:
   
   https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1217
   
   Note also that this bug does not occur unless you produce a message to the topic after triggering the `resetCursor` logic. If you do not produce a message, you'll see a log line like `Current position 4:0 is ahead of last position 4:-1`, which was introduced by https://github.com/apache/pulsar/pull/2673. It's possible that PR partially found this bug.
   
   Finally, this bug may be related to https://github.com/apache/pulsar/pull/15031 https://github.com/apache/pulsar/pull/15067.
   
   ### Modifications
   
   * Update `ManagedCursor#internalResetCursor` so that it persists the `markDeletePosition` instead of the `readPosition` to the cursor's bookkeeper ledger. The rest of the changes make sure that the correct variables are shared around.
   * Improve logging to make it clearer that `resetCursor` updates the `readPosition`
   * Fix test assertion ordering for semi-related test (this test failed as I iterated over the solution)
   
   ### Verifying this change
   
   This PR includes two new tests and all current tests continue to pass.
   
   ### Does this pull request potentially affect one of the following parts:
   
   The only possible "breaking" change is that this PR asserts that the `resetCursor` logic resets the read position (not the mark delete position). I think this is very sensible, but I'll need verification. Note that we already follow this logic in the way that we update the `markDeletePosition` in memory and in zookeeper. This change just updates what gets written to the cursor's ledger.
   
   ### Documentation
   
   - [x] `doc`
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/michaeljmarshall/pulsar/pull/7


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Technoboy- merged pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

Posted by GitBox <gi...@apache.org>.
Technoboy- merged PR #18237:
URL: https://github.com/apache/pulsar/pull/18237


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] AnonHxy commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

Posted by GitBox <gi...@apache.org>.
AnonHxy commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1294604763

   It seems that this PR some the same problem https://github.com/apache/pulsar/pull/17668  cc @HQebupt 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1294641018

   Sorry we missed your PR before @AnonHxy 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] michaeljmarshall commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1320805798

   Thanks for cherry picking it @congbobo184!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] congbobo184 commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1314660373

   could you please cherry-pick this PR to branch-2.9? thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codecov-commenter commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1295227764

   # [Codecov](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#18237](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4c61048) into [master](https://codecov.io/gh/apache/pulsar/commit/6c65ca0d8a80bfaaa4d5869e0cea485f5c94369b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6c65ca0) will **increase** coverage by `3.69%`.
   > The diff coverage is `36.62%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pulsar/pull/18237/graphs/tree.svg?width=650&height=150&src=pr&token=acYqCpsK9J&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #18237      +/-   ##
   ============================================
   + Coverage     34.91%   38.60%   +3.69%     
   - Complexity     5707     8220    +2513     
   ============================================
     Files           607      683      +76     
     Lines         53396    67289   +13893     
     Branches       5712     7218    +1506     
   ============================================
   + Hits          18644    25979    +7335     
   - Misses        32119    38321    +6202     
   - Partials       2633     2989     +356     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `38.60% <36.62%> (+3.69%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...keeper/mledger/impl/ReadOnlyManagedLedgerImpl.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFuYWdlZC1sZWRnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2Jvb2trZWVwZXIvbWxlZGdlci9pbXBsL1JlYWRPbmx5TWFuYWdlZExlZGdlckltcGwuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...pache/bookkeeper/mledger/offload/OffloadUtils.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFuYWdlZC1sZWRnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2Jvb2trZWVwZXIvbWxlZGdlci9vZmZsb2FkL09mZmxvYWRVdGlscy5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [.../main/java/org/apache/pulsar/PulsarStandalone.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL1B1bHNhclN0YW5kYWxvbmUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../apache/pulsar/broker/admin/impl/ClustersBase.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9hZG1pbi9pbXBsL0NsdXN0ZXJzQmFzZS5qYXZh) | `76.66% <0.00%> (+67.62%)` | :arrow_up: |
   | [...pache/pulsar/broker/admin/impl/NamespacesBase.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9hZG1pbi9pbXBsL05hbWVzcGFjZXNCYXNlLmphdmE=) | `63.32% <ø> (+48.79%)` | :arrow_up: |
   | [.../pulsar/broker/service/AbstractBaseDispatcher.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0Fic3RyYWN0QmFzZURpc3BhdGNoZXIuamF2YQ==) | `50.92% <0.00%> (+5.06%)` | :arrow_up: |
   | [...che/pulsar/broker/service/BacklogQuotaManager.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0JhY2tsb2dRdW90YU1hbmFnZXIuamF2YQ==) | `12.39% <0.00%> (+2.91%)` | :arrow_up: |
   | [.../pulsar/broker/service/BrokerServiceException.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0Jyb2tlclNlcnZpY2VFeGNlcHRpb24uamF2YQ==) | `38.88% <0.00%> (+13.88%)` | :arrow_up: |
   | [...ava/org/apache/pulsar/broker/service/Consumer.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0NvbnN1bWVyLmphdmE=) | `68.08% <0.00%> (+6.07%)` | :arrow_up: |
   | [...ava/org/apache/pulsar/broker/service/Producer.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL1Byb2R1Y2VyLmphdmE=) | `62.42% <0.00%> (+2.78%)` | :arrow_up: |
   | ... and [712 more](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Technoboy- commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss

Posted by GitBox <gi...@apache.org>.
Technoboy- commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1296461848

   > It seems that this PR some the same problem #17668 cc @HQebupt
   
   @HQebupt 
   Do you agree to use this pr and close #17668, as this pr has added the test to avoid the regression?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org