You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "ankitsultana (via GitHub)" <gi...@apache.org> on 2024/02/10 23:32:38 UTC
[PR] [Do Not Merge] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
ankitsultana opened a new pull request, #12395:
URL: https://github.com/apache/pinot/pull/12395
See issue: tbd
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] [Do Not Merge] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12395:
URL: https://github.com/apache/pinot/pull/12395#issuecomment-1937362625
## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12395?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
Attention: `1 lines` in your changes are missing coverage. Please review.
> Comparison is base [(`5c81c0a`)](https://app.codecov.io/gh/apache/pinot/commit/5c81c0a88a0a95e5bf150363bb3ed5b5203de795?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.56% compared to head [(`6ca5e37`)](https://app.codecov.io/gh/apache/pinot/pull/12395?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 46.70%.
> Report is 14 commits behind head on master.
| [Files](https://app.codecov.io/gh/apache/pinot/pull/12395?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
|---|---|---|
| [...t/ConcurrentMapPartitionUpsertMetadataManager.java](https://app.codecov.io/gh/apache/pinot/pull/12395?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91cHNlcnQvQ29uY3VycmVudE1hcFBhcnRpdGlvblVwc2VydE1ldGFkYXRhTWFuYWdlci5qYXZh) | 0.00% | [1 Missing :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12395?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
<details><summary>Additional details and impacted files</summary>
```diff
@@ Coverage Diff @@
## master #12395 +/- ##
=============================================
- Coverage 61.56% 46.70% -14.87%
- Complexity 201 915 +714
=============================================
Files 2426 1821 -605
Lines 132709 96282 -36427
Branches 20519 15573 -4946
=============================================
- Hits 81706 44964 -36742
- Misses 44994 48087 +3093
+ Partials 6009 3231 -2778
```
| [Flag](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [integration](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [integration2](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [java-11](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [java-21](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.70% <0.00%> (-14.87%)` | :arrow_down: |
| [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.70% <0.00%> (-14.85%)` | :arrow_down: |
| [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
| [temurin](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.70% <0.00%> (-14.87%)` | :arrow_down: |
| [unittests](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.70% <0.00%> (-14.87%)` | :arrow_down: |
| [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.70% <0.00%> (-0.03%)` | :arrow_down: |
| [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12395/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
</details>
[:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12395?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
:loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on code in PR #12395:
URL: https://github.com/apache/pinot/pull/12395#discussion_r1496367947
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -601,6 +604,46 @@ protected void handleOutOfOrderEvent(Object currentComparisonValue, Object recor
}
}
+ /**
+ * When we have to process a new segment, if there are comparison value ties for the same primary-key within the
+ * segment, then for Partial Upsert tables we need to make sure that the record location map is updated only
+ * for the latest version of the record. This is specifically a concern for Partial Upsert tables because Realtime
Review Comment:
Version is computed using comparison column values. In case of ties, the latest record is determined to be of the latest version.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "deemoliu (via GitHub)" <gi...@apache.org>.
deemoliu commented on code in PR #12395:
URL: https://github.com/apache/pinot/pull/12395#discussion_r1496151293
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -601,6 +604,46 @@ protected void handleOutOfOrderEvent(Object currentComparisonValue, Object recor
}
}
+ /**
+ * When we have to process a new segment, if there are comparison value ties for the same primary-key within the
+ * segment, then for Partial Upsert tables we need to make sure that the record location map is updated only
+ * for the latest version of the record. This is specifically a concern for Partial Upsert tables because Realtime
Review Comment:
where the version come from?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on code in PR #12395:
URL: https://github.com/apache/pinot/pull/12395#discussion_r1492976184
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java:
##########
@@ -69,6 +69,9 @@ protected void addOrReplaceSegment(ImmutableSegmentImpl segment, ThreadSafeMutab
String segmentName = segment.getSegmentName();
segment.enableUpsert(this, validDocIds, queryableDocIds);
+ if (_partialUpsertHandler != null) {
+ recordInfoIterator = resolveComparisonTies(recordInfoIterator, _hashFunction);
+ }
Review Comment:
@Jackie-Jiang : this small refactor suggestion looks good to me. Though I'd probably change the name a bit. Since by usual naming conventions, a method like `addOrReplaceSegment` would be the one calling `doAddOrReplaceSegment`.
lmk your thoughts.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #12395:
URL: https://github.com/apache/pinot/pull/12395
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "tibrewalpratik17 (via GitHub)" <gi...@apache.org>.
tibrewalpratik17 commented on code in PR #12395:
URL: https://github.com/apache/pinot/pull/12395#discussion_r1492896625
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java:
##########
@@ -69,6 +69,9 @@ protected void addOrReplaceSegment(ImmutableSegmentImpl segment, ThreadSafeMutab
String segmentName = segment.getSegmentName();
segment.enableUpsert(this, validDocIds, queryableDocIds);
+ if (_partialUpsertHandler != null) {
+ recordInfoIterator = resolveComparisonTies(recordInfoIterator, _hashFunction);
+ }
Review Comment:
We can move this to Base class too. But i see we will have to make add this logic in multiple methods. Ideally we should have a base method implementation doing this (something like `doAddOrReplaceSegment` and then calling `addOrReplaceSegment` here.
Maybe we can take the refactoring in a follow-up PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "tibrewalpratik17 (via GitHub)" <gi...@apache.org>.
tibrewalpratik17 commented on code in PR #12395:
URL: https://github.com/apache/pinot/pull/12395#discussion_r1492024212
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java:
##########
@@ -167,6 +172,45 @@ protected void addSegmentWithoutUpsert(ImmutableSegmentImpl segment, ThreadSafeM
}
}
+ /**
+ * When we have to process a new segment, if there are comparison value ties for the same primary-key within the
+ * segment, then for Partial Upsert tables we need to make sure that the record location map is updated only
+ * for the latest version of the record. This is specifically a concern for Partial Upsert tables because Realtime
+ * consumption can potentially end up reading the wrong version of a record, which will lead to permanent
+ * data-inconsistency.
+ *
+ * <p>
+ * This function returns an iterator that will de-dup
+ * records
+ * with the same primary-key. Moreover, for comparison ties, it will only keep the latest record. This iterator can
+ * then further be used to update the primary-key record location map safely.
+ * </p>
+ *
+ * @param recordInfoIterator iterator over the new segment
+ * @return iterator that returns de-duplicated records. To resolve ties for comparison column values, we prefer to
+ * return the latest record.
+ */
+ @VisibleForTesting
+ protected Iterator<RecordInfo> resolveComparisonTies(Iterator<RecordInfo> recordInfoIterator) {
+ Map<Object, RecordInfo> deDuplicatedRecordInfo = new HashMap<>();
+ while (recordInfoIterator.hasNext()) {
+ RecordInfo recordInfo = recordInfoIterator.next();
+ Comparable newComparisonValue = recordInfo.getComparisonValue();
+ deDuplicatedRecordInfo.compute(HashUtils.hashPrimaryKey(recordInfo.getPrimaryKey(), _hashFunction),
+ (key, existingRecordInfo) -> {
Review Comment:
can we rename `existingRecordInfo` --> `maxComparisonValueRecordInfo` for more clarity?
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java:
##########
@@ -69,6 +71,9 @@ protected void addOrReplaceSegment(ImmutableSegmentImpl segment, ThreadSafeMutab
String segmentName = segment.getSegmentName();
segment.enableUpsert(this, validDocIds, queryableDocIds);
+ if (_partialUpsertHandler != null) {
+ recordInfoIterator = resolveComparisonTies(recordInfoIterator);
+ }
Review Comment:
can we move this to the base class? we would want to do this in all scenarios.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
Re: [PR] Fix Bug in Handling Equal Comparison Column Values in Upsert [pinot]
Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on code in PR #12395:
URL: https://github.com/apache/pinot/pull/12395#discussion_r1496367947
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -601,6 +604,46 @@ protected void handleOutOfOrderEvent(Object currentComparisonValue, Object recor
}
}
+ /**
+ * When we have to process a new segment, if there are comparison value ties for the same primary-key within the
+ * segment, then for Partial Upsert tables we need to make sure that the record location map is updated only
+ * for the latest version of the record. This is specifically a concern for Partial Upsert tables because Realtime
Review Comment:
Version is computed using comparison column values. In case of ties, the latest record is determined to be of the latest version. (this PR doesn't change the existing definitions)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org