Posted to issues@ignite.apache.org by "Ivan Bessonov (Jira)" <ji...@apache.org> on 2022/08/23 14:04:00 UTC

[jira] [Created] (IGNITE-17571) Implement GC for MV storages

Ivan Bessonov created IGNITE-17571:
--------------------------------------

             Summary: Implement GC for MV storages
                 Key: IGNITE-17571
                 URL: https://issues.apache.org/jira/browse/IGNITE-17571
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov


h3. Basics

The current implementation can only work with infinite storage space, because the storage operates in append-only mode (except for transaction rollbacks).

There's a value in the system called the "low watermark" (LW). It is guaranteed that no new RO transaction will be started with a read timestamp earlier than the LW. The existence of such a value allows us to safely delete all versions before it. We will not discuss the mechanics of acquiring this value here, assuming that it is simply given.
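The pruning rule implied by the low watermark can be sketched with a small, self-contained example (all names here are hypothetical, not Ignite APIs): every version strictly older than the newest version committed at or below the watermark is unreachable by any RO transaction and can be deleted, while that newest at-or-below version must stay because a reader at the watermark still sees it.

```java
import java.util.ArrayList;
import java.util.List;

public class GcSketch {
    /** A committed row version: commit timestamp plus payload (hypothetical). */
    record Version(long commitTs, String payload) {}

    /**
     * Keeps only versions that some reader can still see: everything newer
     * than the low watermark, plus the newest version committed at or below
     * it. The chain is ordered newest-first, as in an append-only store.
     */
    static List<Version> collectGarbage(List<Version> chainNewestFirst, long lowWatermark) {
        List<Version> kept = new ArrayList<>();
        boolean sawVisibleAtWatermark = false;
        for (Version v : chainNewestFirst) {
            if (v.commitTs() > lowWatermark) {
                kept.add(v); // still visible to some RO transaction after the LW
            } else if (!sawVisibleAtWatermark) {
                kept.add(v); // newest version at or below the LW: visible at the LW itself
                sawVisibleAtWatermark = true;
            }
            // else: strictly older than the LW-visible version -> garbage
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Version> chain = List.of(
                new Version(50, "v3"), new Version(30, "v2"), new Version(10, "v1"));
        // With LW = 40, "v3" and "v2" must stay; "v1" is garbage.
        System.out.println(collectGarbage(chain, 40));
    }
}
```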
h3. API

The originally intended design looked like the following:

 
{code:java}
cleanupCompletedVersions(
        @Nullable byte[] lower, boolean fromInclusive,
        @Nullable byte[] upper, boolean toInclusive,
        Timestamp timestamp){code}
Now, I don't think that this is necessarily a good design. The main problem with it is the existence of the bounds:
 * First of all, why not just have an inclusive lower bound and an exclusive upper bound, like almost all other methods with bounds?
 * Second, I believe this API was proposed under the assumption that row ids have a uniform distribution and that every "range" cleanup results in a roughly equal amount of work. This is simply not true in the current implementation.
RowId is a timestamp-based value that exists in a very narrow time slice, making most ranges empty and meaningless.
Besides, the way row ids are stored is up to the storage: there are no restrictions on byte order when physically storing RowId objects.

Given that "cleanup" is a background process, a simple update of the low watermark value would be enough. The underlying machinery will do its job.
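Under that view, the storage-facing trigger could be as small as a single watermark setter; a minimal sketch, where LowWatermarkHolder and updateLowWatermark are made-up names, not Ignite APIs:

```java
/**
 * Hypothetical minimal GC trigger: callers only publish a new low
 * watermark, and the storage's background machinery prunes old versions.
 */
public class LowWatermarkHolder {
    private volatile long lowWatermark;

    /** Advances the watermark monotonically; background GC picks it up. */
    public synchronized void updateLowWatermark(long newLw) {
        if (newLw < lowWatermark) {
            throw new IllegalArgumentException("Low watermark must not move backwards");
        }
        lowWatermark = newLw;
    }

    /** Current watermark, read by the background cleanup task. */
    public long lowWatermark() {
        return lowWatermark;
    }
}
```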
h3. Problems

There's one thing that worries me: indexes.

Because storages are unaware of table schemas, it's impossible to clean up indexes from inside the storage. This gets me thinking: the API should be flexible enough that index cleanup can be performed as an external operation over the storage. This will also reduce the amount of work needed for the implementation.

To be specific, it feels like the method would look like this:
{code:java}
RowId cleanup(Timestamp threshold, RowId startId, int numRows,
        BiConsumer<RowId, BinaryRow> indexCleaner);{code}
Some explanation is required:
 * threshold represents the same thing as before - the low watermark.
 * startId - the first row to iterate over in the current batch.
 * numRows - the number of rows that should be cleaned in the current batch. By this I mean individual versions, not whole version chains.
 * the cleaner closure must be used to clean indexes after every individual version removal. Right now this doesn't look optimal to me, but I struggle to find a good solution for efficient index cleanup.
 * the next row id is returned, which should be used as startId in the next call; "null" means the cleanup is complete. In that case it can be restarted from the beginning or simply postponed until a new low watermark value is available.

How to operate it:
 * numRows has two strategic purposes:
 ** to control the size of batches in "runConsistently" closures.
 ** to be able to run many cleanups in parallel while avoiding pool starvation. Every task is split into small chunks.
 * cleanup should start from the smallest possible row id. Unfortunately, we can't just use (0L, 0L) - that would be wrong. Maybe we should add something like "smallestRowId()" to the storage engine.
 * low watermark value can be changed in-between calls. This is fine and even preferable.
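Putting the points above together, one GC pass might be driven by a loop like the sketch below. MvStorage, smallestRowId() and the String-typed row ids are simplified stand-ins for illustration, not the actual Ignite interfaces:

```java
import java.util.function.BiConsumer;

public class GcDriver {
    /** Hypothetical stand-in for the storage's batched cleanup method. */
    interface MvStorage {
        /** Returns the next row id to resume from, or null when done. */
        String cleanup(long threshold, String startId, int numRows,
                BiConsumer<String, String> indexCleaner);

        /** Smallest possible row id, the starting point of a pass. */
        String smallestRowId();
    }

    /**
     * Drives a full GC pass in small batches, so that many partitions can
     * share a thread pool without starving it. Returns the batch count.
     */
    static int runGcPass(MvStorage storage, long lowWatermark, int batchSize) {
        int batches = 0;
        String next = storage.smallestRowId();
        while (next != null) {
            // Each call cleans at most batchSize individual versions and
            // reports where the next batch should resume.
            next = storage.cleanup(lowWatermark, next, batchSize,
                    (rowId, removedRow) -> { /* remove (removedRow -> rowId) from indexes */ });
            batches++;
        }
        return batches;
    }
}
```

Note that the low watermark passed to cleanup may legitimately change between batches of the same pass, as the text above allows.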

h3. Random notes

The passed closure should only be executed once the specific version has already been deleted. Otherwise the data in the row store would still match the indexes and the index entries would not be cleaned, which is wrong. There must be a test for this, of course.
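That ordering requirement (delete the version first, invoke the cleaner second) can be captured in a toy check like this one; the row store here is just a map, and all names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class OrderingSketch {
    /** A trivial stand-in for the row store, keyed by row id. */
    static final Map<String, String> rowStore = new HashMap<>();

    /**
     * Removes a version and only then invokes the index cleaner.
     * Returns what the cleaner observed: true if the row was already
     * gone from the row store when index cleanup ran.
     */
    static boolean cleanerSawRowAbsent(String rowId, String payload) {
        rowStore.put(rowId, payload);

        final boolean[] absent = {false};
        BiConsumer<String, String> indexCleaner =
                (id, removedRow) -> absent[0] = !rowStore.containsKey(id);

        String removed = rowStore.remove(rowId); // step 1: delete the version
        indexCleaner.accept(rowId, removed);     // step 2: clean the indexes

        return absent[0];
    }
}
```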

Unsurprisingly, integration of this method into the raft state machine is out of scope. I don't think that LW propagation is implemented at this point; I may be wrong. In any case, we'd better discuss it with the folks currently implementing transactions.

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)