Posted to issues@ignite.apache.org by "Kirill Tkalenko (Jira)" <ji...@apache.org> on 2023/02/24 09:59:00 UTC

[jira] [Resolved] (IGNITE-18113) Implement a rows queue for GC in MV storages

     [ https://issues.apache.org/jira/browse/IGNITE-18113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirill Tkalenko resolved IGNITE-18113.
--------------------------------------
    Resolution: Won't Do

No need to implement, each store contains its own garbage collection queue.

> Implement a rows queue for GC in MV storages
> --------------------------------------------
>
>                 Key: IGNITE-18113
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18113
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>
> Please refer to the Epic for the prerequisites. Please also refer to IGNITE-18031 and IGNITE-18033 for some thoughts on the topic.
> IGNITE-18031 is proposed for implementation on the assumption that it's easy to implement, after which tests can run without fear of disk overflow.
> But this may not be the case, and there's a chance it should actually be closed in favor of the current issue. It depends on how much time we're willing to spend on the implementation right now and how much simpler the naive approach actually turns out to be. We'll evaluate it later.
> EDIT: that idea is abandoned; the queue is the right way.
> h3. Algorithm idea description
> First of all, why is the naive approach bad? The reason is simple - it requires constant reads from the device. This leads to both unnecessary disk operations and unnecessary page buffer cache invalidation, regardless of the engine. All engines have a limited amount of memory to work with; generally speaking, it's significantly less than the amount of data on disk.
> Another approach is to have a queue. As already stated, it's very similar to the way TTL is implemented in Ignite 2.x. Every time a row is committed, we add a {{(commitTs, rowId)}} pair to the queue. Every time we want to evict something, we just iterate over all pairs up to a certain timestamp and delete those entries from both the queue and the row storage.
>  * Pros: faster deletion, better utilization of the resources
>  * Cons: slower insertions
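> The commit-side half of this trade-off can be sketched as follows (a minimal sketch under assumed names - {{GcQueue}}, {{Entry}} and {{onCommit}} are hypothetical, not Ignite's actual API). Since commit timestamps are mostly increasing, the insert into the ordered set is almost always an append:
> {code:java}
> import java.util.UUID;
> import java.util.concurrent.ConcurrentSkipListSet;
>
> class GcQueue {
>     // (commitTs, rowId) pair, ordered by timestamp first.
>     record Entry(long commitTs, UUID rowId) implements Comparable<Entry> {
>         @Override
>         public int compareTo(Entry o) {
>             int cmp = Long.compare(commitTs, o.commitTs);
>             return cmp != 0 ? cmp : rowId.compareTo(o.rowId);
>         }
>     }
>
>     final ConcurrentSkipListSet<Entry> entries = new ConcurrentSkipListSet<>();
>
>     // Called once per committed row version; the head of the set is
>     // always the oldest committed version eligible for cleanup.
>     void onCommit(long commitTs, UUID rowId) {
>         entries.add(new Entry(commitTs, rowId));
>     }
> }
> {code}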
> Now, why aren't slower insertions a bad thing? For me personally this is a tricky question to answer. The way I see it, writes are inherently slow; there's not much we can do about it.
> Meta information must be updated, indexes must be updated, and current outgoing rebalance snapshots must be taken into account. There's even a chance of write intent resolution taking place, which is weird, but we allow it. It's not necessarily instantaneous.
> Having a single additional insert, which is almost always an append (which may in theory decrease its run time, although that's hard to measure), won't hurt. Especially considering that we're going to perform partial queue cleanup in the same thread later (but after completing the raft done closure, I guess - we shouldn't block the leader from returning the result to the client).
> h3. Problems & solutions
> In order for the queue to be a queue, we should always delete the head once we've processed it. What's the point otherwise?
> If we implement the GC without thinking it through, we may run into problems. For example, imagine a row with a single, very old version that nobody ever changes. GC must not delete that version; a naive implementation would instead keep re-inserting it into the queue with a slightly larger timestamp. This is bad: such behavior guarantees constant useless GC work on an idle cluster, which is not what we want.
> The way to avoid this problem _is to never store the timestamp of the oldest version_ in the queue. With that invariant, the algorithm may become the following:
> {code:java}
> Pair<Timestamp, RowId> pair = queue.poll();
> if (pair == null || pair.getTimestamp() > lowWatermark) {
>   return; // queue is empty or exhausted, idle
> }
> // Just do it. Delete everything below the threshold.
> storage.deleteVersionsBelowTimestamp(pair.getRowId(), pair.getTimestamp());{code}
> There's an interesting situation with tombstones. This one's tricky. We have two options:
>  * GC deletes tombstones when they become the oldest known version in the chain
>  * GC never deletes tombstones
> If we go the first route, we have to remember that it also _creates_ a new oldest version that _must_ be deleted from the queue to preserve data structure consistency.
> If we go the second route, we may end up with stale tombstones that will never be deleted.
> The first option is better, because it actually works, but it's harder to implement.
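> The first option can be sketched like this (all names are hypothetical; the real storage API will differ). The key point is that deleting a tombstone promotes the next version to "oldest", so that version's queue entry must be removed as well:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> class VersionChainSketch {
>     // One row version; tombstone == true means a deletion marker.
>     record Version(long commitTs, boolean tombstone) {}
>
>     // Oldest version first.
>     final List<Version> chain = new ArrayList<>();
>
>     // Cleans versions below the watermark. Returns the commitTs whose
>     // queue entry must also be removed (because its version became the
>     // oldest after a tombstone was deleted), or -1 if there is none.
>     long vacuumBelow(long lowWatermark) {
>         // A version is garbage while the next-newer version is itself
>         // visible at the watermark.
>         while (chain.size() > 1 && chain.get(1).commitTs() <= lowWatermark) {
>             chain.remove(0);
>         }
>
>         Version oldest = chain.isEmpty() ? null : chain.get(0);
>
>         // A tombstone that became the oldest known version is garbage too.
>         if (oldest != null && oldest.tombstone() && oldest.commitTs() <= lowWatermark) {
>             chain.remove(0);
>             // Deleting it created a new oldest version; its queue entry
>             // must be removed to keep the structure consistent.
>             return chain.isEmpty() ? -1 : chain.get(0).commitTs();
>         }
>
>         return -1;
>     }
> }
> {code}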
> EDIT: poll() should also delete the data at the same time, or return null if the timestamp restriction isn't met.
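> A minimal sketch of such a combined poll-and-delete (hypothetical names again; a real implementation would also make the check-and-remove atomic with respect to concurrent polls):
> {code:java}
> import java.util.Map;
> import java.util.UUID;
> import java.util.concurrent.ConcurrentSkipListMap;
>
> class GcQueuePollSketch {
>     // Simplified to a single queue entry per commit timestamp.
>     private final ConcurrentSkipListMap<Long, UUID> queue = new ConcurrentSkipListMap<>();
>
>     void add(long commitTs, UUID rowId) {
>         queue.put(commitTs, rowId);
>     }
>
>     // Removes and returns the head entry iff it's at or below the
>     // watermark; otherwise leaves the queue intact and returns null.
>     Map.Entry<Long, UUID> pollForVacuum(long lowWatermark) {
>         Map.Entry<Long, UUID> head = queue.firstEntry();
>
>         if (head == null || head.getKey() > lowWatermark) {
>             return null; // queue is empty or the head is still too fresh
>         }
>
>         queue.remove(head.getKey());
>         return head;
>     }
> }
> {code}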



--
This message was sent by Atlassian Jira
(v8.20.10#820010)