Posted to reviews@kudu.apache.org by "KeDeng (Code Review)" <ge...@cloudera.org> on 2023/04/25 06:04:19 UTC

[kudu-CR] [tablet] GC ancient, fully deleted rowsets without live row count stats [tablet] GC ancient rowsets that are fully deleted without live row count stats

Hello Tidy Bot, Alexey Serbin, Yuqi Du, Yingchun Lai, Kudu Jenkins, Abhishek Chennaka, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19670

to look at the new patch set (#17).

Change subject: [tablet] GC ancient, fully deleted rowsets without live row count stats [tablet] GC ancient rowsets that are fully deleted without live row count stats
......................................................................

[tablet] GC ancient, fully deleted rowsets without live row count stats
[tablet] GC ancient rowsets that are fully deleted without live row count stats

We added a background op to GC ancient, fully deleted rowsets for
KUDU-1625 based on the live row count. That patch is very useful, but
it does not work for older versions (earlier than 1.10) that do not
support live row count stats. Moreover, when upgrading from a lower
version to a higher one, the live row count feature cannot be enabled
for already existing data.

To resolve this issue on Kudu clusters running older versions, I submitted
this patch. Its main purpose is to replace the use of the live row count.
However, lacking a more accurate counting method, this patch may release
only part of the storage space held by ancient, fully deleted rows.
This feature can therefore relieve storage pressure on older versions
only to a certain extent.

To enable this feature, set the flag
--enable_gc_deleted_rowsets_without_live_row_count and restart the tservers.

There is still room for improvement in this implementation: currently,
it ignores delete operations in the DMS. I will resolve this issue in a
follow-up patch.
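
The limitation above can be illustrated with a minimal sketch. It is not
the code in this patch: RowSetSummary, DeltaFileStats and
IsFullyDeletedWithoutLiveRowCount are hypothetical names, and the check
that the deletes are ancient is left out. It assumes that deleted rows are
counted only from delta files already flushed to disk, so deletes still
held in the DMS are not counted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; these are not Kudu's actual classes.
struct DeltaFileStats {
  int64_t delete_count;    // DELETE mutations recorded in one flushed delta file
  int64_t reinsert_count;  // REINSERT mutations recorded in the same file
};

struct RowSetSummary {
  int64_t base_row_count;                   // rows stored in the rowset's base data
  std::vector<DeltaFileStats> delta_files;  // flushed delta files only
  int64_t dms_delete_count;                 // deletes still in the DMS (ignored below)
};

// Returns true only if the flushed delta files alone account for deleting
// every row. Deletes in the DMS are deliberately not counted, so the result
// is conservative: a rowset may be fully deleted and still return false.
bool IsFullyDeletedWithoutLiveRowCount(const RowSetSummary& rs) {
  int64_t net_deletes = 0;
  for (const DeltaFileStats& f : rs.delta_files) {
    net_deletes += f.delete_count - f.reinsert_count;
  }
  return net_deletes >= rs.base_row_count;
}
```

Under these assumptions, a rowset with 100 rows, 60 deletes in a flushed
delta file and 40 deletes in the DMS is not reported as fully deleted
until the DMS is flushed, which is why only part of the space may be
released.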

I ran this on a real cluster, and the storage space of deleted rowsets that
was not previously freed was GCed as expected. I also added a unit test case
to verify the behavior.

Change-Id: Iacdff107b8b07cbd56f47f296a93f4bcfbf56b41
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset-test-base.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_history_gc-test.cc
8 files changed, 167 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/19670/17
-- 
To view, visit http://gerrit.cloudera.org:8080/19670
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iacdff107b8b07cbd56f47f296a93f4bcfbf56b41
Gerrit-Change-Number: 19670
Gerrit-PatchSet: 17
Gerrit-Owner: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>