Posted to commits@pinot.apache.org by "ankitsultana (via GitHub)" <gi...@apache.org> on 2024/02/11 01:21:42 UTC

[I] [partial-upsert] Ensuring Data Consistency after Rebalance [pinot]

ankitsultana opened a new issue, #12400:
URL: https://github.com/apache/pinot/issues/12400

   ### Issue Description
   
   (I haven't spent time reproducing this. The following is based on my understanding.)
   
   Partial Upsert tables merge the incoming event with the existing latest version of the record.
   
   Say we have two replicas of consuming segments: S0 and S1.
   
   Say the segments with the previous sequence ID on those replicas are P0 and P1 (the predecessors of S0 and S1, respectively).
   
   During a rebalance, say P0 is moved to the target server before P1, and in the window between the two moves a record arrives at S0 whose merge needs the previous version stored in P0.
   
   If a segment commit happens before the consuming segments are moved, S0 and S1 will end up with different data.
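
   To make the scenario concrete, here is a toy simulation in Python. It is not Pinot code: the `Replica` class, the `merge` function, the column names, and the "non-null fields overwrite" merge rule are all hypothetical stand-ins. It also assumes that once P0 has been moved, the server hosting S0 can no longer read it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

Record = Dict[str, Optional[int]]


def merge(previous: Optional[Record], incoming: Record) -> Record:
    """Toy partial-upsert merge (assumed rule): non-null incoming
    fields overwrite, null fields keep the previous value."""
    if previous is None:
        # No previous version visible: the incoming event is stored as-is.
        return dict(incoming)
    merged = dict(previous)
    for column, value in incoming.items():
        if value is not None:
            merged[column] = value
    return merged


@dataclass
class Replica:
    """Hypothetical model of one server hosting a consuming segment."""
    name: str
    previous_loaded: bool                 # is P0/P1 still readable here?
    previous_segment: Dict[str, Record]   # contents of P0/P1
    consuming: Dict[str, Record] = field(default_factory=dict)

    def lookup(self, key: str) -> Optional[Record]:
        # The latest version may live in the consuming or previous segment.
        if key in self.consuming:
            return self.consuming[key]
        if self.previous_loaded:
            return self.previous_segment.get(key)
        return None

    def ingest(self, key: str, incoming: Record) -> None:
        self.consuming[key] = merge(self.lookup(key), incoming)


previous_rows = {"k1": {"clicks": 3, "views": 5}}

# Assumption: P0 has already been moved away, P1 has not.
s0 = Replica("S0", previous_loaded=False, previous_segment=dict(previous_rows))
s1 = Replica("S1", previous_loaded=True, previous_segment=dict(previous_rows))

# The same event reaches both replicas during the rebalance window.
event = {"clicks": None, "views": 7}
s0.ingest("k1", event)
s1.ingest("k1", event)

print(s0.consuming["k1"])
print(s1.consuming["k1"])
```

   In this model the two replicas store different rows for `k1`, because S0 merges against nothing while S1 merges against the row in P1. If a commit happens at this point, the divergence becomes permanent.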
   
   ### Discussion
   
   Are there any guardrails to prevent this?
   
   The main check I know of is `allSegmentsLoaded`, but it applies only to new consuming segments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org