Posted to dev@doris.apache.org by yi xiuxiu <yi...@gmail.com> on 2022/05/31 07:31:56 UTC

[Discuss][DSIP] Support single replica compaction

Hi all,


I’d like to propose single replica compaction in Doris.


Currently, compaction performs the same operations on every replica of a
tablet, which leads to a lot of redundant resource consumption. I plan to
support single replica compaction to reduce CPU and IO usage in the
cluster: compaction runs on one replica, and the other replicas only need
to copy the resulting data from it.

I will introduce some random factors so that the different replicas of a
tablet start compaction at different times. Once one replica begins
compaction, the other replicas of that tablet just wait and then copy the
result files.
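
To make the idea concrete, here is a minimal sketch of the randomized start. It is not Doris code, only an illustration under stated assumptions: the names `pick_compaction_replica`, `round_no` and `max_delay_s` are hypothetical, the delays are computed in one place for readability (in a real cluster each replica would draw its own delay and would have to learn that a peer has started), and the random generator is seeded only so that the example is reproducible.

```python
import random


def pick_compaction_replica(tablet_id, replica_ids, round_no, max_delay_s=30.0):
    """Give every replica a random start delay and let the earliest one compact.

    Hypothetical helper for illustration only. Returns the replica that
    starts compaction, the replicas that wait and copy, and the delays drawn.
    """
    delays = {}
    for rid in replica_ids:
        # Seed per (tablet, replica, round) so the sketch is reproducible.
        rng = random.Random(f"{tablet_id}:{rid}:{round_no}")
        delays[rid] = rng.uniform(0.0, max_delay_s)

    # The replica with the smallest delay begins compaction first.
    leader = min(delays, key=delays.get)
    # All other replicas of the tablet wait and then copy the result files.
    followers = [rid for rid in replica_ids if rid != leader]
    return leader, followers, delays


if __name__ == "__main__":
    leader, followers, delays = pick_compaction_replica(10001, [1, 2, 3], round_no=0)
    print("compacts:", leader, "copies:", followers)
```

Because the delay changes with `round_no`, the replica that does the work would be expected to vary from one compaction round to the next rather than always being the same one.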


--
Best Regards.


By yixiutt

Re: [Discuss][DSIP] Support single replica compaction

Posted by weizuo <we...@apache.org>.
Hi, thanks for your proposal.
Single replica compaction may be a good idea for reducing CPU and memory usage.
I have considered this plan before and discussed it with Mingyu. During design and implementation, there are some
key factors we need to consider:


1. The amount of data merged by base compaction is relatively large, and cumulative compaction is executed frequently,
    so there may be a continuous, large volume of data transfer between BE nodes. Will network bandwidth become a bottleneck?
2. Query performance depends on the progress of version merging, and segment file synchronization between replicas may not
    be real-time, so replica selection for queries needs to be considered when generating the query execution plan.
3. We need to decide whether load balancing between the replicas of a tablet should be designed.
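
As an illustration of point 2, here is a rough sketch of how a planner could prefer the replica that has progressed furthest in compaction. This is not how Doris selects replicas; `ReplicaState`, `choose_query_replica` and the use of rowset count as a proxy for compaction progress are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    """Hypothetical per-replica metadata visible to the query planner."""
    backend_id: int
    max_version: int   # highest data version available on this replica
    rowset_count: int  # assumed proxy: fewer rowsets means more merging done


def choose_query_replica(replicas, required_version):
    """Pick a replica that has the required version and the fewest rowsets."""
    # Only replicas that already hold the version the query needs are eligible.
    candidates = [r for r in replicas if r.max_version >= required_version]
    if not candidates:
        raise ValueError("no replica has the required version")
    # Prefer the most compacted replica; break ties by backend id.
    return min(candidates, key=lambda r: (r.rowset_count, r.backend_id))


if __name__ == "__main__":
    replicas = [
        ReplicaState(backend_id=1, max_version=120, rowset_count=35),
        ReplicaState(backend_id=2, max_version=120, rowset_count=4),
        ReplicaState(backend_id=3, max_version=118, rowset_count=2),
    ]
    print(choose_query_replica(replicas, required_version=120).backend_id)
```

Always choosing the most compacted replica would concentrate queries on one BE node, which is where point 3 (load balancing between replicas) comes in.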


Looking forward to your detailed design.




Zuo Wei

Email: weizuo@apache.org




