Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/12/01 13:38:44 UTC

[GitHub] [incubator-doris] weizuo93 opened a new issue #4997: [Proposal] Take 'deleted rows' into consideration when selecting a tablet for compaction task

weizuo93 opened a new issue #4997:
URL: https://github.com/apache/incubator-doris/issues/4997


   Rows deleted by a `delete operation` are not removed from disk until base compaction is performed on the relevant tablet. Logically deleted data not only occupies disk space but also degrades scan performance, so a tablet that contains many deleted rows should be compacted.
   
   Can we take 'rows_del_filtered' into consideration when selecting a tablet for a compaction task?
   
   For each tablet, we can record the number of rows filtered out during scans since the last base compaction, and use that count as one factor when selecting a tablet for a compaction task. The `tablet score` for compaction could be calculated like this:
   
     `tablet_score = k1 * tablet_scan_frequency + k2 * old_compaction_score  + k3 * rows_del_filtered`
   
   `k1`, `k2` and `k3` can be set dynamically through the HTTP interface `/api/update_config`.
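
   A rough sketch of the scoring formula above, for discussion only. This is not Doris code: `TabletStats`, `Weights`, `pick_tablet`, the field names and the default weight values are all hypothetical, chosen just to show how the three factors might combine when picking a tablet.

```python
from dataclasses import dataclass


@dataclass
class TabletStats:
    """Hypothetical per-tablet counters; names are illustrative, not Doris fields."""
    tablet_id: int
    scan_frequency: float        # how often the tablet has been scanned recently
    old_compaction_score: float  # score from the existing selection policy
    rows_del_filtered: int       # rows filtered by delete conditions since the last base compaction


@dataclass
class Weights:
    """k1, k2, k3 from the formula; the defaults are placeholders, not tuned values."""
    k1: float = 1.0
    k2: float = 1.0
    k3: float = 0.001


def tablet_score(t: TabletStats, w: Weights) -> float:
    # tablet_score = k1 * tablet_scan_frequency + k2 * old_compaction_score + k3 * rows_del_filtered
    return (w.k1 * t.scan_frequency
            + w.k2 * t.old_compaction_score
            + w.k3 * t.rows_del_filtered)


def pick_tablet(tablets, w):
    """Return the tablet with the highest score, or None if the list is empty."""
    return max(tablets, key=lambda t: tablet_score(t, w), default=None)
```

   With `k3 = 0`, selection falls back to the first two factors only, so the new term could be switched off without changing the rest of the policy.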
   
   Of course, rows in `DEL_PARTIAL_SATISFIED` blocks and rows in `DEL_SATISFIED` blocks affect scan performance differently, so the two can be treated separately.
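
   One way the two kinds of blocks might be treated separately is to split the deleted-rows term into two weighted counts. The sketch below is only an illustration: the function name, the parameter names and the weight values are hypothetical, and which kind of block should weigh more is an assumption that would need to be measured.

```python
def deleted_rows_term(rows_partial: int,
                      rows_satisfied: int,
                      k3_partial: float = 0.001,
                      k3_satisfied: float = 0.0002) -> float:
    """Deleted-rows contribution to the tablet score, split by block kind.

    rows_partial:   filtered rows counted in DEL_PARTIAL_SATISFIED blocks
    rows_satisfied: filtered rows counted in DEL_SATISFIED blocks
    The two weights are placeholders; here rows in partially satisfied
    blocks are assumed to cost more per row, which is only a guess.
    """
    return k3_partial * rows_partial + k3_satisfied * rows_satisfied
```

   The result would replace the single `k3 * rows_del_filtered` term in the score.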

