You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/28 05:39:27 UTC
[GitHub] [pulsar] michaeljmarshall opened a new pull request, #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
michaeljmarshall opened a new pull request, #18237:
URL: https://github.com/apache/pulsar/pull/18237
Fixes: https://github.com/apache/pulsar/issues/18236
### Motivation
See https://github.com/apache/pulsar/issues/18236 for details on the problematic behavior.
The fundamental problem is that the cursor persists the `readPosition` instead of persisting the `markDeletePosition` in the `ManagedCursor#internalResetCursor` method. This method is only called on specific occasions, the broker persists the correct `markDeletePosition` in zookeeper, and the bookkeeper's cursor data is only used when the broker doesn't update the cursor in zookeeper, so this bug is rare.
I found this bug at first by producing to a newly created topic and subscription where the subscription was set to "latest". In that case, when the broker evaluates the following code, the `position` evaluates to `ledgerId:0` instead of `ledgerId:-1`. It seems that this bug affects all reset cursor calls except "earliest" because of the way "earliest" always goes to entryId -1.
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1186-L1191
After setting the position incorrectly for `latest` and for regular resets, the method persists `newPosition` (which is defined by `position`) as the `markDeletePosition`:
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1281
As a result, the cursor's persisted data is incorrect for those two cases.
Interestingly, we do not see this bug much because a cleanly closed cursor ledger persists its `markDeletePosition` in [Zookeeper](https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L2525), and when the broker recovers the cursor, it just uses the zookeeper data:
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L458
Note that the only reason the zk metadata is correct is because we update the `markDeletePosition` to a new value in a callback run _after_ persisting a different `markDeletePosition`:
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1217
Note also that this bug does not occur unless you produce a message to the topic after triggering the `resetCursor` logic. If you do not produce a message, you'll see a log line like `Current position 4:0 is ahead of last position 4:-1`, which was introduced by https://github.com/apache/pulsar/pull/2673. It's possible that PR partially found this bug.
Finally, this bug may be related to https://github.com/apache/pulsar/pull/15031 https://github.com/apache/pulsar/pull/15067.
### Modifications
* Update `ManagedCursor#internalResetCursor` so that it persists the `markDeletePosition` instead of the `readPosition` to the cursor's bookkeeper ledger. The rest of the changes make sure that the correct variables are shared around.
* Improve logging to make it clearer that `resetCursor` updates the `readPosition`
* Fix test assertion ordering for semi-related test (this test failed as I iterated over the solution)
### Verifying this change
This PR includes two new tests and all current tests continue to pass.
### Does this pull request potentially affect one of the following parts:
The only possible "breaking" change is that this PR asserts that the `resetCursor` logic resets the read position (not the mark delete position). I think this is very sensible, but I'll need verification. Note that we already follow this logic in the way that we update the `markDeletePosition` in memory and in zookeeper. This change just updates what gets written to the cursor's ledger.
### Documentation
- [x] `doc`
### Matching PR in forked repository
PR in forked repository: https://github.com/michaeljmarshall/pulsar/pull/7
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] Technoboy- merged pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
Posted by GitBox <gi...@apache.org>.
Technoboy- merged PR #18237:
URL: https://github.com/apache/pulsar/pull/18237
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] AnonHxy commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
Posted by GitBox <gi...@apache.org>.
AnonHxy commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1294604763
It seems that this PR some the same problem https://github.com/apache/pulsar/pull/17668 cc @HQebupt
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] codelipenghui commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1294641018
Sorry we missed your PR before @AnonHxy
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] michaeljmarshall commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1320805798
Thanks for cherry picking it @congbobo184!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] congbobo184 commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
Posted by GitBox <gi...@apache.org>.
congbobo184 commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1314660373
could you please cherry-pick this PR to branch-2.9? thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] codecov-commenter commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1295227764
# [Codecov](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#18237](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4c61048) into [master](https://codecov.io/gh/apache/pulsar/commit/6c65ca0d8a80bfaaa4d5869e0cea485f5c94369b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6c65ca0) will **increase** coverage by `3.69%`.
> The diff coverage is `36.62%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/pulsar/pull/18237/graphs/tree.svg?width=650&height=150&src=pr&token=acYqCpsK9J&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #18237 +/- ##
============================================
+ Coverage 34.91% 38.60% +3.69%
- Complexity 5707 8220 +2513
============================================
Files 607 683 +76
Lines 53396 67289 +13893
Branches 5712 7218 +1506
============================================
+ Hits 18644 25979 +7335
- Misses 32119 38321 +6202
- Partials 2633 2989 +356
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | `38.60% <36.62%> (+3.69%)` | :arrow_up: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pulsar/pull/18237?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [...keeper/mledger/impl/ReadOnlyManagedLedgerImpl.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFuYWdlZC1sZWRnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2Jvb2trZWVwZXIvbWxlZGdlci9pbXBsL1JlYWRPbmx5TWFuYWdlZExlZGdlckltcGwuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [...pache/bookkeeper/mledger/offload/OffloadUtils.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-bWFuYWdlZC1sZWRnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2Jvb2trZWVwZXIvbWxlZGdlci9vZmZsb2FkL09mZmxvYWRVdGlscy5qYXZh) | `0.00% <0.00%> (ø)` | |
| [.../main/java/org/apache/pulsar/PulsarStandalone.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL1B1bHNhclN0YW5kYWxvbmUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
| [.../apache/pulsar/broker/admin/impl/ClustersBase.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9hZG1pbi9pbXBsL0NsdXN0ZXJzQmFzZS5qYXZh) | `76.66% <0.00%> (+67.62%)` | :arrow_up: |
| [...pache/pulsar/broker/admin/impl/NamespacesBase.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9hZG1pbi9pbXBsL05hbWVzcGFjZXNCYXNlLmphdmE=) | `63.32% <ø> (+48.79%)` | :arrow_up: |
| [.../pulsar/broker/service/AbstractBaseDispatcher.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0Fic3RyYWN0QmFzZURpc3BhdGNoZXIuamF2YQ==) | `50.92% <0.00%> (+5.06%)` | :arrow_up: |
| [...che/pulsar/broker/service/BacklogQuotaManager.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0JhY2tsb2dRdW90YU1hbmFnZXIuamF2YQ==) | `12.39% <0.00%> (+2.91%)` | :arrow_up: |
| [.../pulsar/broker/service/BrokerServiceException.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0Jyb2tlclNlcnZpY2VFeGNlcHRpb24uamF2YQ==) | `38.88% <0.00%> (+13.88%)` | :arrow_up: |
| [...ava/org/apache/pulsar/broker/service/Consumer.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL0NvbnN1bWVyLmphdmE=) | `68.08% <0.00%> (+6.07%)` | :arrow_up: |
| [...ava/org/apache/pulsar/broker/service/Producer.java](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cHVsc2FyLWJyb2tlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcHVsc2FyL2Jyb2tlci9zZXJ2aWNlL1Byb2R1Y2VyLmphdmE=) | `62.42% <0.00%> (+2.78%)` | :arrow_up: |
| ... and [712 more](https://codecov.io/gh/apache/pulsar/pull/18237/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [pulsar] Technoboy- commented on pull request #18237: [fix][ml] Persist correct markDeletePosition to prevent message loss
Posted by GitBox <gi...@apache.org>.
Technoboy- commented on PR #18237:
URL: https://github.com/apache/pulsar/pull/18237#issuecomment-1296461848
> It seems that this PR some the same problem #17668 cc @HQebupt
@HQebupt
Do you agree to use this pr and close #17668, as this pr has added the test to avoid the regression?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org