Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/24 07:46:48 UTC

[GitHub] [druid] FrankChen021 commented on issue #9755: why doesn't Druid delete small segments after compact task complete?

FrankChen021 commented on issue #9755:
URL: https://github.com/apache/druid/issues/9755#issuecomment-618859501


   Hi @jihoonson. With the two parameters you suggested, the segments left unused after compaction can be deleted from both the metadata store and deep storage. But until the coordinator issues the delete task, these unused segments might still cause the slow query problem and take up extra storage space.
   
   For example, suppose there is a retention rule that keeps 3 months of data in the default tier and a drop rule that drops everything older than 3 months forever. Because I want to keep the data in deep storage for another 3 months, and I don't want the coordinator to delete the unused segments marked by this drop rule, `druid.coordinator.kill.durationToRetain` will be set to about 6 months (for example `P180D`), which means all unused segments will be kept for roughly 6 months.
   
   In our case, more than 20 GB of data is ingested into Druid and more than 1200 segments are generated every day (the segment granularity is always 'PT1H'). Every day we also compact these segments, which produces about another 400 segments. In total, that gives (1200 + 400) * 30 * 6 = 288,000 segments over the 6-month period.
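
To make the estimate easier to re-run with other numbers, here is a minimal Python sketch of the same arithmetic. The function name and its parameters are made up for illustration; it only multiplies the daily segment counts by the number of retained days, using the same 30-day month as above.

```python
def segments_in_retention_window(raw_per_day: int,
                                 compacted_per_day: int,
                                 retain_days: int) -> int:
    """Rough count of segments kept around for `retain_days` days.

    Hypothetical helper: assumes a constant number of raw and compacted
    segments per day and ignores any growth in ingestion volume.
    """
    return (raw_per_day + compacted_per_day) * retain_days


# Numbers from this comment: 1200 hourly segments plus about 400
# compacted segments per day, kept for 6 months of 30 days each.
six_months = segments_in_retention_window(1200, 400, 30 * 6)
print(f"{six_months:,}")  # prints 288,000

# Doubling the retention period doubles the count.
twelve_months = segments_in_retention_window(1200, 400, 30 * 12)
print(f"{twelve_months:,}")  # prints 576,000
```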
   
   This number might be acceptable for now. But if we want to keep the data in deep storage for longer, or if the data stored in Druid grows rapidly day by day, the number of unused segments waiting for deletion after compaction will keep growing.
   
   Since there is already a way to let the coordinator kill unused segments, I think it's reasonable to also give users a way to delete unused segments after compaction. Considering the scenario you mentioned, where a user might want to revert data, would it be feasible to provide parameters that control this automatic deletion? For example, a parameter named `enabled` could control whether automatic deletion is turned on, and another parameter named `durationRetention` could control how long unused segments are kept after compaction when `enabled` is true.
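
As a rough illustration of how the two proposed parameters could interact, here is a small Python sketch. Nothing in it is existing Druid code or configuration: `AutoDeleteConfig`, `eligible_for_deletion`, and the 7-day default are assumptions made up for this example, and `duration_retention` simply stands in for the proposed `durationRetention`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AutoDeleteConfig:
    # Hypothetical settings mirroring the proposal; not real Druid options.
    enabled: bool = False
    duration_retention: timedelta = timedelta(days=7)


def eligible_for_deletion(config: AutoDeleteConfig,
                          compacted_at: datetime,
                          now: datetime) -> bool:
    """Return True if a segment made unused by compaction may be deleted.

    With auto deletion disabled nothing is ever eligible, so the current
    behaviour is unchanged. Otherwise the segment is kept for at least
    `duration_retention` after compaction, leaving a window to revert.
    """
    if not config.enabled:
        return False
    return now - compacted_at >= config.duration_retention


now = datetime(2020, 4, 24, tzinfo=timezone.utc)
config = AutoDeleteConfig(enabled=True, duration_retention=timedelta(days=7))

# Compacted 10 days ago: past the retention window, so it may be deleted.
print(eligible_for_deletion(config, now - timedelta(days=10), now))

# Compacted 2 days ago: still inside the window, so it is kept.
print(eligible_for_deletion(config, now - timedelta(days=2), now))
```

Keeping `enabled` off by default would preserve today's behaviour for users who rely on being able to revert data.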
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


