Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/11/10 01:23:54 UTC

[GitHub] [incubator-pinot] snleee edited a comment on issue #6189: Support Timestamp Pruning of segments in Broker

snleee edited a comment on issue #6189:
URL: https://github.com/apache/incubator-pinot/issues/6189#issuecomment-724387596


   @noahprince22 If we keep only the start timestamp, we cannot effectively prune segments because we don't know each segment's upper bound. Keeping only the start time may help for your use case, but it's not a generic solution: a query's time filter may have no end timestamp (e.g. `time > X`), and in that case start times alone cannot rule out any segment.
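   To make the point concrete, here is a minimal Python sketch of min/max time pruning. It assumes the broker caches a hypothetical `SegmentTimeMeta(name, start, end)` record per segment and that a query's time filter reduces to an optional `[q_start, q_end]` range; these names are illustrative only, not Pinot's actual broker code. With both bounds, a `time >= 250` filter prunes every segment that ends before 250; with start times only, the same filter prunes nothing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SegmentTimeMeta:
    """Hypothetical per-segment time metadata cached on the broker."""
    name: str
    start: int           # min timestamp in the segment (e.g. epoch millis)
    end: Optional[int]   # max timestamp; None if only the start is kept


def can_prune(seg: SegmentTimeMeta, q_start: Optional[int], q_end: Optional[int]) -> bool:
    """True if the segment cannot hold rows in [q_start, q_end] (None = unbounded side)."""
    # The segment starts after the query window ends: start time is enough.
    if q_end is not None and seg.start > q_end:
        return True
    # The segment ends before the query window begins: this needs the end time.
    if q_start is not None and seg.end is not None and seg.end < q_start:
        return True
    return False


def select_segments(segments: List[SegmentTimeMeta],
                    q_start: Optional[int], q_end: Optional[int]) -> List[str]:
    """Names of the segments that survive pruning and must be queried."""
    return [s.name for s in segments if not can_prune(s, q_start, q_end)]


with_end = [SegmentTimeMeta("s0", 0, 99), SegmentTimeMeta("s1", 100, 199),
            SegmentTimeMeta("s2", 200, 299)]
start_only = [SegmentTimeMeta(s.name, s.start, None) for s in with_end]

print(select_segments(with_end, 250, None))    # ['s2']              (time >= 250)
print(select_segments(start_only, 250, None))  # ['s0', 's1', 's2']  (nothing pruned)
print(select_segments(start_only, None, 150))  # ['s0', 's1']        (time <= 150)
```

   Start-only metadata still helps when the filter has an upper bound, which is why it can work for a specific use case without being a generic solution.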
   
   Simple math:
   
   Let's assume that we store roughly 100 bytes per segment (we need to store the segment name, start and end timestamps, and some other info).
   ```
   100 bytes/segment * 20 million segments = ~2 GB
   ```
   
   It does require GBs of memory; however, 20 million segments in a Pinot cluster is an extreme use case. If you set your segment size to something reasonable (200-300 MB per segment), you won't have 20 million segments (200 MB * 20 million segments = 4 PB). To support that many segments, we would probably need to read the metadata from disk instead of keeping everything in memory.
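   A quick way to sanity-check these numbers is a small Python sketch, assuming decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and the rough 100 bytes/segment estimate above. `segment_count` and `broker_metadata_bytes` are hypothetical helpers for illustration only.

```python
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15

BYTES_PER_SEGMENT = 100  # rough estimate: name, start/end timestamps, other info


def segment_count(total_data_bytes: int, segment_size_bytes: int) -> int:
    """Number of segments needed; ceiling division since the last one may be partial."""
    return -(-total_data_bytes // segment_size_bytes)


def broker_metadata_bytes(num_segments: int, bytes_per_segment: int = BYTES_PER_SEGMENT) -> int:
    """Approximate in-memory footprint of per-segment metadata on the broker."""
    return num_segments * bytes_per_segment


n = segment_count(4 * PB, 200 * MB)
print(n)                                   # 20000000
print(broker_metadata_bytes(n) / GB)       # 2.0

m = segment_count(10 * TB, 200 * MB)
print(m, broker_metadata_bytes(m) / MB)    # 50000 5.0
```

   So at 200 MB per segment, a 10 TB table needs about 50,000 segments and only ~5 MB of broker metadata at this estimate; the GB range is reached only at PB-scale data.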
   
   IMO, we can start with what @jtao15 suggested and see how it performs for your use case. What do you think?

