Posted to commits@pinot.apache.org by "ankitsultana (via GitHub)" <gi...@apache.org> on 2024/02/11 00:41:33 UTC

[I] [partial-upsert] Bug in Handling Comparison Column Ties When Replacing Segment [pinot]

ankitsultana opened a new issue, #12398:
URL: https://github.com/apache/pinot/issues/12398

   ### Issue Description
   
   (Assume no sorted column is set.)
   
   Say we have two records for a primary key in a consuming segment, `R0` and `R1` (in Kafka order), and both have the same comparison column value.
   
   After these events are ingested, the metadata manager map points to `R1` as the latest record (which is correct).
   
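   Then, during segment commit, when the ImmutableSegment replaces the MutableSegment, the upsert metadata manager map briefly points to the older record `R0` until the iterator reaches `R1`: the ImmutableSegment's records are visited in docId order, and because `R0` ties with `R1` on the comparison column, it temporarily takes over the map entry.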
   
   However, the new consuming segment may already have started consuming while `ConcurrentMapPartitionUpsertMetadataManager#addOrReplaceSegment` is running. If an event for the same primary key arrives while the map is briefly pointing to `R0`, it is merged with `R0` instead of `R1`, and the merged record will contain *wrong* data.
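   
   The sequence above can be illustrated with a small, single-threaded Java model. `ReplaceSegmentRace`, `Row`, `Location`, `replaceSegment` and `merge` are hypothetical names, not Pinot APIs, and the sketch assumes that during replacement a row from the new segment wins ties against the map's current entry. It replays the ImmutableSegment's rows in docId order, records which docId the map points to after each step, and then merges an incoming event against the state seen after the first step, standing in for the concurrently consuming segment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the race described in this issue; NOT Pinot's real classes.
// Assumption: while a segment is being replaced, a row from the new segment takes over the map
// entry whenever its comparison value is >= the current one (ties go to the row being added).
public class ReplaceSegmentRace {
  // One ingested row: primary key, comparison column value, and one partial-upsert column.
  record Row(String pk, long cmp, String city) {}

  // Map entry: the segment and docId that currently hold the latest row for a primary key.
  record Location(String segment, int docId, long cmp) {}

  // Replays the new segment's rows in docId order and returns the docId the map points to for
  // `pk` after each step, i.e. what a concurrently consuming segment could observe.
  static List<Integer> replaceSegment(Map<String, Location> map, String newSegment,
                                      List<Row> rows, String pk) {
    List<Integer> observed = new ArrayList<>();
    for (int docId = 0; docId < rows.size(); docId++) {
      Row row = rows.get(docId);
      Location current = map.get(row.pk());
      if (current == null || row.cmp() >= current.cmp()) {
        map.put(row.pk(), new Location(newSegment, docId, row.cmp()));
      }
      observed.add(map.get(pk).docId());
    }
    return observed;
  }

  // Partial-upsert merge sketch: a null column in the incoming event keeps the previous value.
  static Row merge(Row previous, Row incoming) {
    String city = incoming.city() != null ? incoming.city() : previous.city();
    return new Row(incoming.pk(), incoming.cmp(), city);
  }

  public static void main(String[] args) {
    // R0 and R1 share primary key "k" and comparison value 0; R1 came later in Kafka order.
    List<Row> rows = List.of(new Row("k", 0, "NYC"), new Row("k", 0, "SFO"));
    Map<String, Location> map = new HashMap<>();
    // After consumption the map correctly points at R1 (docId 1) in the mutable segment.
    map.put("k", new Location("mutable", 1, 0));

    List<Integer> observed = replaceSegment(map, "immutable", rows, "k");
    System.out.println("docIds observed during replace: " + observed);

    // Simulate an event (city not set) that lands right after the first step:
    // it merges against R0 instead of R1.
    Row previous = rows.get(observed.get(0));
    Row merged = merge(previous, new Row("k", 0, null));
    System.out.println("merged city: " + merged.city() + " (expected SFO)");
  }
}
```

   Running `main` reports observed docIds `[0, 1]` and a merged city of `NYC` where `SFO` was expected, which is the kind of wrong partial-upsert result described above.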
   
   Since this is a race condition, it can manifest differently across replicas and lead to further problems (to be discussed in issue TBD).
   
   ### How This Was Discovered
   
   I was debugging a different partial-upsert issue and had created a table in one of our clusters. I force-committed a few times within an hour to produce several segments, and was surprised to see that the segment replicas had diverged. This particular table sets the comparison column value to 0 for every record, so every pair of records with the same primary key is a tie.
   
   On taking a deeper look, I found these records:
   
   * Correct data: https://gist.github.com/ankitsultana/280f6fbcb704f8305359e002055a83b8
   * Incorrect data: https://gist.github.com/ankitsultana/4459d1dcd1ecc43bad7b4636b814a306
   
   ### Possible Fix
   
   One possible fix: for partial-upsert tables, compare docIds when the comparison column values match. However, this won't work when users have set a sorted column, because the ImmutableSegment is then sorted by that column and docId order no longer follows Kafka order. I have created #12397 to discuss the sorted column issue separately.
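   
   The docId tie-break proposed above can be sketched in Java as well. `DocIdTieBreak`, `Location`, `shouldReplace` and `replaceSegment` are hypothetical names rather than Pinot APIs, and this is not necessarily how #12395 resolved the issue. The sketch assumes no sorted column is set (so the ImmutableSegment keeps the MutableSegment's docId order) and only uses docIds to break a tie while the map entry still points into the segment being replaced, since docIds from different segments are not comparable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "compare docIds on ties" idea; NOT Pinot's actual code.
// Assumptions: no sorted column is set, so the immutable segment keeps the mutable segment's
// docId (Kafka) order, and the docId tie-break is applied only while the map entry still points
// into the segment being replaced, because docIds from other segments are not comparable.
public class DocIdTieBreak {
  // Map entry: the segment and docId that currently hold the latest row for a primary key.
  record Location(String segment, int docId, long cmp) {}

  // Decides whether the row at newDocId in the new segment should take over the map entry.
  static boolean shouldReplace(Location current, String oldSegment, int newDocId, long newCmp) {
    if (current == null) {
      return true;
    }
    int result = Long.compare(newCmp, current.cmp());
    if (result != 0) {
      return result > 0; // a strictly larger comparison value always wins
    }
    if (current.segment().equals(oldSegment)) {
      // Tie against the segment being replaced: only the same or a later docId may take over,
      // so R0 (docId 0) can no longer displace R1 (docId 1).
      return newDocId >= current.docId();
    }
    return true; // other ties: keep the assumed "row being added wins" rule
  }

  // Replays the new segment's comparison values (indexed by docId) for one primary key and
  // returns the docId the map points to after each step.
  static List<Integer> replaceSegment(Map<String, Location> map, String pk, String oldSegment,
                                      String newSegment, long[] cmpByDocId) {
    List<Integer> observed = new ArrayList<>();
    for (int docId = 0; docId < cmpByDocId.length; docId++) {
      if (shouldReplace(map.get(pk), oldSegment, docId, cmpByDocId[docId])) {
        map.put(pk, new Location(newSegment, docId, cmpByDocId[docId]));
      }
      observed.add(map.get(pk).docId());
    }
    return observed;
  }

  public static void main(String[] args) {
    Map<String, Location> map = new HashMap<>();
    // After consumption the map points at R1 (docId 1) in the mutable segment.
    map.put("k", new Location("mutable", 1, 0));
    // R0 and R1 both have comparison value 0.
    List<Integer> observed = replaceSegment(map, "k", "mutable", "immutable", new long[] {0, 0});
    System.out.println("docIds observed during replace: " + observed + ", final: " + map.get("k"));
  }
}
```

   With two tied rows, `replaceSegment` observes docIds `[1, 1]`: the map never points at `R0` and ends on `R1` in the immutable segment. If a sorted column were set, docId order would no longer follow Kafka order and this tie-break would pick the wrong record, which is the case raised separately in #12397.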


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] [partial-upsert] Bug in Handling Comparison Column Ties When Replacing Segment [pinot]

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on issue #12398:
URL: https://github.com/apache/pinot/issues/12398#issuecomment-1972554920

   Closing this, since it was resolved by #12395.




Re: [I] [partial-upsert] Bug in Handling Comparison Column Ties When Replacing Segment [pinot]

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana closed issue #12398: [partial-upsert] Bug in Handling Comparison Column Ties When Replacing Segment
URL: https://github.com/apache/pinot/issues/12398

