Posted to commits@pinot.apache.org by "Jackie-Jiang (via GitHub)" <gi...@apache.org> on 2023/04/08 00:15:47 UTC

[GitHub] [pinot] Jackie-Jiang commented on pull request #10572: Use disk based key value store for deduplication

Jackie-Jiang commented on PR #10572:
URL: https://github.com/apache/pinot/pull/10572#issuecomment-1500737043

   Currently, dedup is handled the same way as upsert, as a short-term solution that is not production ready. We have made many bug fixes to the upsert implementation, but we have not actively maintained the dedup implementation.
   
   My suggestion is to redesign dedup from scratch, since it is not the same as upsert (no need to maintain valid docs, no need to track segments, etc.), and a TTL (dedup window) should be a must-have for dedup. With a proper TTL, the key size should be much smaller.
   
   After that, if we still need a disk-based KV store, we can introduce it as a plug-in. We don't want to introduce a RocksDB dependency in the default distribution.
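
To make the TTL idea concrete, here is a minimal sketch of a dedup map with a time window. It is not Pinot's implementation and is not taken from the PR. The class `TtlDedupSketch`, its methods, and the choice of a String primary key with epoch-millisecond event times are all assumptions made for illustration. It keeps only a key-to-time map (no valid-doc bitmaps, no segment tracking) and drops keys once they fall out of the window.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; class and method names are not Pinot APIs.
final class TtlDedupSketch {
  private final long ttlMs;
  // Primary key -> event time of the first record accepted for that key.
  private final Map<String, Long> firstSeenMs = new HashMap<>();
  // Largest event time observed so far; the window is measured back from it.
  // Assumes event times are ordinary epoch milliseconds.
  private long maxTimeMs = Long.MIN_VALUE;

  TtlDedupSketch(long ttlMs) {
    this.ttlMs = ttlMs;
  }

  // Returns true if the record should be kept, false if the same key
  // was already accepted within the dedup window.
  boolean accept(String key, long timeMs) {
    maxTimeMs = Math.max(maxTimeMs, timeMs);
    long watermark = maxTimeMs - ttlMs;
    Long previous = firstSeenMs.get(key);
    if (previous != null && previous >= watermark) {
      return false;
    }
    firstSeenMs.put(key, timeMs);
    return true;
  }

  // Drops keys that have fallen out of the window; returns how many were removed.
  int evictExpired() {
    if (firstSeenMs.isEmpty()) {
      return 0;
    }
    long watermark = maxTimeMs - ttlMs;
    int before = firstSeenMs.size();
    firstSeenMs.values().removeIf(t -> t < watermark);
    return before - firstSeenMs.size();
  }

  int size() {
    return firstSeenMs.size();
  }
}
```

With a window of 1000 ms, a key first seen at time 0 is rejected again at time 500, but once a record at time 2000 advances the window, that key can be evicted and the map stays bounded by the number of keys seen within the window.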


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

