Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/03/09 00:14:42 UTC

[GitHub] [pinot] Jackie-Jiang commented on issue #7849: Inconsistent Row Counts from Upsert Tables

Jackie-Jiang commented on issue #7849:
URL: https://github.com/apache/pinot/issues/7849#issuecomment-1062417937


   @tuor713 Within a single streaming partition, there is at most one consuming segment at any time. The small inconsistency arises because the consuming segment replaces a doc in a completed segment while the `validDocIds` of the two segments are not read at the same instant, e.g.:
   1. Query engine reads `validDocIds` from the completed segment (creating a copy)
   2. Consuming segment invalidates one doc in the completed segment (not visible to the query engine because a copy/snapshot has already been made), and marks the doc as valid in its own `validDocIds`
   3. Query engine reads `validDocIds` from the consuming segment
   4. The same doc is counted twice
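
   The interleaving above can be replayed deterministically in a few lines. This is only an illustrative sketch: `java.util.BitSet` stands in for the per-segment `validDocIds` bitmap, and the class and method names are hypothetical, not Pinot code.

```java
import java.util.BitSet;

// Hypothetical illustration only; BitSet stands in for a segment's validDocIds.
class UpsertDoubleCountSketch {

  // Replays steps 1-4 in order and returns the row count the query would see.
  static int countWithInterleavedUpsert() {
    BitSet completedValidDocIds = new BitSet();
    BitSet consumingValidDocIds = new BitSet();
    completedValidDocIds.set(0); // the record currently lives in the completed segment

    // Step 1: the query copies validDocIds from the completed segment.
    BitSet completedSnapshot = (BitSet) completedValidDocIds.clone();

    // Step 2: ingestion upserts the same primary key into the consuming segment.
    completedValidDocIds.clear(0); // not visible to the copy taken in step 1
    consumingValidDocIds.set(0);

    // Step 3: the query copies validDocIds from the consuming segment.
    BitSet consumingSnapshot = (BitSet) consumingValidDocIds.clone();

    // Step 4: both copies contain the record, so it is counted twice.
    return completedSnapshot.cardinality() + consumingSnapshot.cardinality();
  }

  public static void main(String[] args) {
    System.out.println(countWithInterleavedUpsert()); // prints 2 for one logical row
  }
}
```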
   
   To solve this problem, we need a global sync: take a snapshot of all queried segments while blocking ingestion (as shown in the fix above). The solution works, and we can avoid creating the extra `IndexSegment` snapshot objects by snapshotting the `validDocIds` within the `FilterPlanNode`, but it can cause starvation between query and ingestion. For high-QPS use cases, queries can block each other as well as ingestion.
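
   As a rough sketch of what such a global sync could look like (not the actual fix referenced above), a single lock can guard both the upsert and the snapshot of every queried segment. The class, its methods, and the use of `java.util.BitSet` are assumptions made for illustration. Because every query and every upsert takes the same lock, the contention described above is easy to see.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of a global sync; not Pinot's implementation.
class GlobalSyncSnapshotSketch {
  private final Object upsertLock = new Object();
  private final List<BitSet> validDocIdsPerSegment = new ArrayList<>();

  GlobalSyncSnapshotSketch(int numSegments) {
    for (int i = 0; i < numSegments; i++) {
      validDocIdsPerSegment.add(new BitSet());
    }
  }

  // Ingestion path: marks a newly ingested doc as valid.
  void addDoc(int segment, int docId) {
    synchronized (upsertLock) {
      validDocIdsPerSegment.get(segment).set(docId);
    }
  }

  // Ingestion path: invalidates the old doc and validates the new one atomically.
  void replaceDoc(int oldSegment, int oldDocId, int newSegment, int newDocId) {
    synchronized (upsertLock) {
      validDocIdsPerSegment.get(oldSegment).clear(oldDocId);
      validDocIdsPerSegment.get(newSegment).set(newDocId);
    }
  }

  // Query path: copies every segment's validDocIds while ingestion is blocked,
  // so no upsert can land between two of the copies.
  List<BitSet> snapshotAll() {
    synchronized (upsertLock) {
      List<BitSet> copies = new ArrayList<>();
      for (BitSet validDocIds : validDocIdsPerSegment) {
        copies.add((BitSet) validDocIds.clone());
      }
      return copies;
    }
  }

  // Counts the valid docs across all segments of one snapshot.
  static int count(List<BitSet> snapshot) {
    int total = 0;
    for (BitSet validDocIds : snapshot) {
      total += validDocIds.cardinality();
    }
    return total;
  }
}
```

   Each snapshot sees a record either in its old segment or in its new one, never in both, at the cost of serializing queries and ingestion on one lock.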
   
   We can make it configurable for use cases that require 100% consistency, but 100% consistency is usually not necessary for analytical purposes. Essentially, it is a trade-off between consistency and performance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
