Posted to commits@pinot.apache.org by "Jackie-Jiang (via GitHub)" <gi...@apache.org> on 2023/07/06 20:10:27 UTC

[GitHub] [pinot] Jackie-Jiang opened a new issue, #11045: Re-design dedup to not reuse upsert mechanism

Jackie-Jiang opened a new issue, #11045:
URL: https://github.com/apache/pinot/issues/11045

   Here are some of the main differences between dedup and upsert:
   - Dedup is applied while ingesting data from the stream (consuming segments only), so there is no need to track valid docs. Duplicate records are simply dropped.
   - A dedup window (a TTL on the metadata) is a must-have to bound the metadata size.
   - There is no need to track the record location in the dedup metadata, but we do want to track the timestamp to enforce the dedup window.
   
   One potential solution for the dedup window is to keep two rotating maps, each storing metadata for one dedup window. Once the old map is completely out of the dedup window, clear it and reuse it as the new map.
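
   The rotating-map idea could look roughly like the sketch below. This is only an illustration under stated assumptions, not Pinot's actual API: the class name `RotatingDedupWindow`, the method `shouldIngest`, the use of a `String` primary key, and the bucketing of timestamps by `timestampMs / windowMs` are all hypothetical. It also assumes that a record older than both tracked windows is ingested without being tracked; the issue does not specify that policy. With this layout a key is deduplicated for at least one and at most two windows.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a two-map rotating dedup window; not Pinot's actual API. */
class RotatingDedupWindow {
  private final long windowMs;
  // Each map holds primaryKey -> timestamp for exactly one window-sized time bucket.
  private Map<String, Long> current = new HashMap<>();
  private Map<String, Long> previous = new HashMap<>();
  private long currentBucket = Long.MIN_VALUE;

  RotatingDedupWindow(long windowMs) {
    if (windowMs <= 0) {
      throw new IllegalArgumentException("windowMs must be positive");
    }
    this.windowMs = windowMs;
  }

  /** Returns true if the record should be ingested, false if it is dropped as a duplicate. */
  boolean shouldIngest(String primaryKey, long timestampMs) {
    long bucket = Math.floorDiv(timestampMs, windowMs);
    if (bucket > currentBucket) {
      rotateTo(bucket);
    }
    if (bucket < currentBucket - 1) {
      // Assumption: records older than both tracked windows are ingested untracked.
      return true;
    }
    if (current.containsKey(primaryKey) || previous.containsKey(primaryKey)) {
      return false; // duplicate inside the tracked windows: drop it
    }
    // Late records that belong to the previous bucket are stored in the previous map.
    Map<String, Long> target = (bucket == currentBucket) ? current : previous;
    target.put(primaryKey, timestampMs);
    return true;
  }

  private void rotateTo(long bucket) {
    if (bucket == currentBucket + 1) {
      // The old map is now fully out of the window: clear it and reuse it as the new map.
      Map<String, Long> recycled = previous;
      recycled.clear();
      previous = current;
      current = recycled;
    } else {
      // Jumped more than one window ahead (or first record): nothing tracked is still valid.
      current.clear();
      previous.clear();
    }
    currentBucket = bucket;
  }

  /** Number of keys currently held across both maps. */
  int trackedKeys() {
    return current.size() + previous.size();
  }
}
```

   Because expiry happens by clearing a whole map, there is no per-key TTL scan, and the metadata size is bounded by the number of distinct keys seen in two windows. A consuming segment would call `shouldIngest(key, ts)` once per record and skip indexing when it returns false.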


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] atris commented on issue #11045: Re-design dedup to not reuse upsert mechanism

Posted by "atris (via GitHub)" <gi...@apache.org>.
atris commented on issue #11045:
URL: https://github.com/apache/pinot/issues/11045#issuecomment-1644464140

   I am currently investigating this ticket and working on an initial design document. I will update the ticket soon.

