Posted to issues@ignite.apache.org by "Alexey Goncharuk (Jira)" <ji...@apache.org> on 2019/10/07 08:30:00 UTC
[jira] [Created] (IGNITE-12263) Introduce native persistence compaction operation

Alexey Goncharuk created IGNITE-12263:
-----------------------------------------

             Summary: Introduce native persistence compaction operation
                 Key: IGNITE-12263
                 URL: https://issues.apache.org/jira/browse/IGNITE-12263
             Project: Ignite
          Issue Type: Improvement
            Reporter: Alexey Goncharuk


Currently, Ignite native persistence does not shrink storage files after key-value pairs are removed.
The causes of this behavior are:
 * The absence of a mechanism that allows Ignite to track the highest non-empty page position in a partition file
 * The absence of a mechanism that allows Ignite to select the page closest to the beginning of the file for writing
 * The absence of a mechanism that allows Ignite to move a key-value pair from page to page during defragmentation

As an initial change, I suggest introducing a new node startup mode that runs a defragmentation procedure, allowing the node to shrink its storage files. The procedure will not mutate the logical state of a partition, so a subsequent historical rebalance can quickly bring the node up to date. Since the procedure will run during node startup (during the final stages of recovery), there will be no concurrent load, so entries can be freely moved from page to page without any tricky synchronization.

If the procedure is applied during a whole-cluster restart, all nodes will be defragmented simultaneously, which gives a quicker, parallel defragmentation at the cost of downtime.

The procedure should accept an optional list of cache groups, so that defragmentation can be limited to an arbitrary selection of cache groups.

A rough outline of the actions taken during the run for each partition selected for defragmentation:
 * Partition pages are preloaded into memory if possible to avoid excessive page replacement. During the scan, the high-water mark (HWM) of the written data is detected (empty pages are skipped)
 * Page references in a free list are sorted so that the pages closest to the file start are picked first
 * The partition is scanned in reverse order, key-value pairs are moved closer to the file start, and the HWM is updated accordingly. This step is particularly open to various optimizations because different strategies will work well for different fragmentation patterns.
 * After the scan iteration is completed, the file size can be updated according to the HWM
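
The steps above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Ignite code: a partition is modeled as a Python list of pages, each page is a list of key-value pairs with a fixed capacity, and the names PAGE_CAPACITY and defragment are hypothetical. Real pages hold variable-size entries and are referenced from index structures, which this model ignores.

```python
import heapq

PAGE_CAPACITY = 4  # hypothetical: maximum number of entries per page


def defragment(pages):
    """Move entries toward the start of the partition and truncate it.

    `pages` is a list of pages, each page a list of (key, value) pairs.
    The list is modified in place. Returns the new HWM, i.e. the index
    of the highest non-empty page, or -1 if the partition is empty.
    """
    # Step 1: scan the partition and detect the HWM, skipping empty pages.
    hwm = -1
    for idx, page in enumerate(pages):
        if page:
            hwm = idx

    # Step 2: order pages with free space so that the page closest to
    # the file start is always picked first (a min-heap of page indexes).
    free = [i for i in range(hwm + 1) if len(pages[i]) < PAGE_CAPACITY]
    heapq.heapify(free)

    # Step 3: scan in reverse order and move entries closer to the start.
    src = hwm
    while free and src > free[0]:
        dst = free[0]
        if len(pages[dst]) == PAGE_CAPACITY:
            heapq.heappop(free)  # destination page is full now
        elif not pages[src]:
            src -= 1             # source page is drained, go one page lower
        else:
            pages[dst].append(pages[src].pop())

    # Step 4: update the HWM and shrink the "file" accordingly.
    while hwm >= 0 and not pages[hwm]:
        hwm -= 1
    del pages[hwm + 1:]
    return hwm


partition = [[("a", 1)], [], [("b", 2), ("c", 3)], [], [("d", 4)]]
new_hwm = defragment(partition)
print(new_hwm, partition)
```

In this model the set of key-value pairs is unchanged by the run; only their placement and the partition length change, which mirrors the requirement that the logical state of the partition is not mutated. The greedy "lowest free page first" choice is just one possible strategy for the reverse scan.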

As a further improvement, this partition defragmentation procedure can be later run in online mode, after proper cache update protocol changes are designed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)