Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2019/10/23 12:49:15 UTC

[GitHub] [incubator-doris] gaodayue commented on issue #2016: [Proposal] Limit the memory usage of Compaction

gaodayue commented on issue #2016: [Proposal] Limit the memory usage of Compaction
URL: https://github.com/apache/incubator-doris/issues/2016#issuecomment-545427345
 
 
   > What I want to do has no effect on the current load process. It will be done before we add the rowset to StorageEngine. If we find there are too many rowsets, we can try to merge some of them into a bigger rowset. Actually we can do it for all load operations, because it will improve our read performance.
   
   I think the motivation for compaction within a rowset is to reduce the number of overlapping segments and improve query performance. However, when the number of segments is large, a single round of compaction would consume too much memory. So we need to decide how many segments to compact at a time based on the estimated RowBlock size and the available memory.
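   A minimal sketch of that sizing decision, assuming each segment being merged keeps roughly one RowBlock resident at a time (the function and parameter names here are illustrative, not actual Doris identifiers):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: choose how many segments to merge in one
// compaction round so the readers' RowBlocks fit in a memory budget.
// Assumes one resident RowBlock per segment being merged.
int64_t segments_per_round(int64_t total_segments,
                           int64_t estimated_rowblock_bytes,
                           int64_t memory_budget_bytes) {
    if (estimated_rowblock_bytes <= 0) {
        return total_segments;  // no estimate available, merge everything
    }
    // How many RowBlocks fit in the budget.
    int64_t fit = memory_budget_bytes / estimated_rowblock_bytes;
    // Merge at least 2 segments (otherwise no progress), at most all.
    return std::max<int64_t>(2, std::min(fit, total_segments));
}
```

   With, say, a 10 MB budget and 1 MB RowBlocks, 100 overlapping segments would be compacted 10 at a time over multiple rounds rather than all at once.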
   
   My question is: the previous compaction strategy generates a rowset with a new version (like 0-6) to replace the input rowsets with overlapping versions, but when compacting within a rowset, you end up with two rowsets with the same version. Is that a problem? If so, how do you plan to solve it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org