Posted to commits@pinot.apache.org by "atris (via GitHub)" <gi...@apache.org> on 2023/05/01 02:54:59 UTC

[GitHub] [pinot] atris commented on pull request #10702: Support Deletes In Upserts

atris commented on PR #10702:
URL: https://github.com/apache/pinot/pull/10702#issuecomment-1529282705

   
   > This is not what I said. I said not to close this PR directly, but to see whether the problems can be solved; if @navina cannot get an implementation that fits your timeline, we can go with this one.
   
   Sure, I should have been more verbose; my bad.
   
   > Since Navina already has an implementation (#10703), and I feel it can solve all the existing problems at the cost of keeping 2 bitmaps per segment, I'm leaning towards that solution, as I don't see a good way to solve the snapshot issue with a single bitmap. The main problem with a single bitmap is that it breaks the contract that every key in the upsert metadata points to a valid doc in the segment, and if we break that contract a lot of things could go wrong.
   
   Could you please elaborate a bit more on where things could break if we had keys in upsert metadata but not in valid doc IDs?
   
   My understanding is that the two places where things will break are:
   
   1. Loading a segment, because the deleted doc IDs will be ignored during segment reload and hence will no longer be queryable.
   2. Deleting a segment, because the snapshot will ignore the deleted doc IDs.
   
   I do not see anything breaking at runtime, because the access pattern is that the upsert metadata dictates which keys are looked up in the valid doc IDs. Since the upsert metadata holds a superset of the doc IDs, no data should be lost. I think this is supported by the fact that this PR makes the change and no tests break.
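
To make the argument above concrete, here is a minimal sketch of the single-bitmap idea as I understand it. It is not Pinot code: the class `SingleBitmapDeleteSketch`, its fields, and its methods are hypothetical stand-ins, with a `HashMap` playing the role of the upsert metadata and a `java.util.BitSet` playing the role of the valid doc IDs. It assumes that a delete keeps the key in the metadata but never marks the delete record as valid, and that a reload rebuilds the metadata only from the valid doc IDs.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one segment; none of these names come from Pinot.
class SingleBitmapDeleteSketch {
    // Stand-in for the upsert metadata: primary key -> latest docId.
    final Map<String, Integer> keyToDocId = new HashMap<>();
    // Stand-in for the single valid doc IDs bitmap.
    final BitSet validDocIds = new BitSet();
    // Stand-in for reading the primary key of a doc from the segment.
    final Map<Integer, String> docIdToKey = new HashMap<>();

    // A regular upsert: the new doc becomes valid, the old one is invalidated.
    void upsert(String key, int docId) {
        docIdToKey.put(docId, key);
        Integer previous = keyToDocId.put(key, docId);
        if (previous != null) {
            validDocIds.clear(previous);
        }
        validDocIds.set(docId);
    }

    // A delete (assumption): the key stays in the metadata and points at the
    // delete record, but that record is never marked valid.
    void delete(String key, int docId) {
        docIdToKey.put(docId, key);
        Integer previous = keyToDocId.put(key, docId);
        if (previous != null) {
            validDocIds.clear(previous);
        }
    }

    // Runtime lookup: the metadata picks the docId, the bitmap decides visibility.
    boolean isQueryable(String key) {
        Integer docId = keyToDocId.get(key);
        return docId != null && validDocIds.get(docId);
    }

    // Reload (assumption): metadata is rebuilt only from the valid doc IDs,
    // so keys whose latest record is a delete are not restored.
    Map<String, Integer> reloadFromValidDocIds() {
        Map<String, Integer> rebuilt = new HashMap<>();
        for (int d = validDocIds.nextSetBit(0); d >= 0; d = validDocIds.nextSetBit(d + 1)) {
            rebuilt.put(docIdToKey.get(d), d);
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        SingleBitmapDeleteSketch segment = new SingleBitmapDeleteSketch();
        segment.upsert("a", 0);
        segment.upsert("b", 1);
        segment.delete("a", 2);
        System.out.println("a queryable at runtime: " + segment.isQueryable("a"));
        System.out.println("a tracked in metadata: " + segment.keyToDocId.containsKey("a"));
        System.out.println("a tracked after reload: "
            + segment.reloadFromValidDocIds().containsKey("a"));
    }
}
```

In this toy model the runtime path behaves as described: the deleted key is still tracked by the metadata and is simply filtered out by the bitmap. The difference only shows up after the rebuild, where the deleted key is no longer tracked at all, which corresponds to the reload and snapshot cases listed above.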
   
   Please share your thoughts.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

