Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/13 18:49:43 UTC

[GitHub] [incubator-pinot] pedro93 opened a new issue #6912: Segment compaction for upsert real-time tables

pedro93 opened a new issue #6912:
URL: https://github.com/apache/incubator-pinot/issues/6912


   Hello,
   
   This issue requests support for segment compaction on real-time upsert-enabled tables, which does not currently exist, as mentioned in a [slack thread](https://apache-pinot.slack.com/archives/CDRCA57FC/p1620826182368300). As a result, segments containing old, stale entries are kept on disk and are only deleted when the segment retention policy kicks in.
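
To make the request concrete, here is a minimal sketch of what compaction would mean logically: keep only the most recent record per primary key and drop the stale ones. This is not how Pinot implements anything; `Record` and `compact` are hypothetical names, and the sketch assumes that the record with the greatest timestamp for a key is the one to keep.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (primary key, event timestamp, payload).
Record = Tuple[str, int, dict]


def compact(records: Iterable[Record]) -> List[Record]:
    """Keep only the latest record per primary key (assumed upsert semantics)."""
    latest: Dict[str, Record] = {}
    for key, ts, payload in records:
        current = latest.get(key)
        # On equal timestamps the later-arriving record wins (an assumption).
        if current is None or ts >= current[1]:
            latest[key] = (key, ts, payload)
    # Return survivors ordered by timestamp for readability.
    return sorted(latest.values(), key=lambda record: record[1])


events = [
    ("user-1", 1, {"action": "updated profile"}),
    ("user-2", 2, {"action": "saw an article"}),
    ("user-1", 3, {"action": "updated preferences"}),
]
compacted = compact(events)
```

With the three events above, only two records survive: the older `user-1` entry is the stale data that currently stays on disk until retention removes the whole segment.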
   
   Giving a concrete example why this is useful:
    - Suppose you have a stream of events related to user activity (updated profile, saw an article, updated preferences, etc.).
    - You define a real-time table in Pinot where the primary key is the userId. Segment size is 500k rows and the stream is partitioned.
    - The set of users is roughly fixed (~50M).
    - You want to keep segments for a long time period (> 2 years).
    - Each day ~20% (10M) of the users generate some event which is consumed by Pinot.
   
   This will generate ~20 segments per day (10M events / 500k per segment). Over the course of 2 years that adds up to 14,600 segments (20 × 730 days), when in reality only 100 segments (50M users / 500k per segment) are needed to hold the most up-to-date information for each user.
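
The arithmetic behind these numbers can be sketched as follows. The sketch assumes one event per active user per day, one row per event, and that the 500k segment size is a row count; `segment_counts` is a hypothetical helper, not a Pinot API.

```python
import math


def segment_counts(users: int, daily_active_percent: int,
                   rows_per_segment: int, retention_days: int) -> dict:
    """Estimate segment counts with and without compaction (rough sketch)."""
    # Assumption: each active user produces exactly one row per day.
    events_per_day = users * daily_active_percent // 100
    segments_per_day = math.ceil(events_per_day / rows_per_segment)
    without_compaction = segments_per_day * retention_days
    # Fully compacted: a single up-to-date row per user.
    with_compaction = math.ceil(users / rows_per_segment)
    return {
        "segments_per_day": segments_per_day,
        "without_compaction": without_compaction,
        "with_compaction": with_compaction,
    }


counts = segment_counts(
    users=50_000_000,
    daily_active_percent=20,
    rows_per_segment=500_000,
    retention_days=2 * 365,
)
print(counts)
```

Under these assumptions the uncompacted table holds 146 times as many segments as a fully compacted one.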
   
   If the example or the issue is not clear, feel free to reach out.
   
   Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


