Posted to issues@ignite.apache.org by "Sergey Chugunov (Jira)" <ji...@apache.org> on 2022/09/28 12:15:00 UTC

[jira] [Updated] (IGNITE-17084) Native rebalance for RocksDB partitions

     [ https://issues.apache.org/jira/browse/IGNITE-17084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Chugunov updated IGNITE-17084:
-------------------------------------
    Epic Link: IGNITE-17774  (was: IGNITE-16923)

> Native rebalance for RocksDB partitions
> ---------------------------------------
>
>                 Key: IGNITE-17084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17084
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>
> General idea of full rebalance is described in https://issues.apache.org/jira/browse/IGNITE-17083
> For persistent storages, there's an option to avoid copy-on-write rebalance algorithms if it's desired. Intuitively, it's a preferable option. Each storage chooses its own format.
> In this case, RocksDB allows consistent db iteration using its "Snapshot" feature. The idea is very simple:
>  * Take a RocksDB snapshot.
>  * Iterate through partition data.
>  * Iterate through indexes.
>  * Release the snapshot.
> There must be a common "infrastructure" or a framework to stream native rebalance snapshots. Data format should be as simple as possible.
> NOTE: of course, it has to be mentioned that this approach might lead to inefficient use of storage space. What I mean is that "previous" versions of values, in terms of RocksDB, must be kept on the device for as long as they're visible from any snapshot. It can be a problem in theory, but in practice full rebalance isn't expected to occur often, and even then we don't expect that users will rewrite the entire partition data in the span of a single rebalance.
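
The four steps and the space-usage note above can be modelled with a small sketch. This is a toy in-memory multi-version store, not the RocksDB API: the class ToySnapshotStore and all of its methods are hypothetical names made up for illustration. In RocksJava the corresponding calls would presumably be RocksDB#getSnapshot(), ReadOptions#setSnapshot(Snapshot), RocksDB#newIterator(ReadOptions) and RocksDB#releaseSnapshot(Snapshot). The sketch only shows why a scan through a snapshot is consistent and why old versions cannot be dropped while the snapshot is open.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Toy multi-version store. Hypothetical model only, NOT the RocksDB API. */
final class ToySnapshotStore {
    /** key -> (sequence number of the write -> value). */
    private final NavigableMap<String, NavigableMap<Long, String>> data = new TreeMap<>();
    /** Sequence numbers of open snapshots -> number of holders. */
    private final TreeMap<Long, Integer> openSnapshots = new TreeMap<>();
    private long seq = 0;

    void put(String key, String value) {
        seq++;
        data.computeIfAbsent(key, k -> new TreeMap<>()).put(seq, value);
    }

    /** Step 1: a snapshot is just the current sequence number, pinned until released. */
    long takeSnapshot() {
        openSnapshots.merge(seq, 1, Integer::sum);
        return seq;
    }

    /** Steps 2-3: returns every key with the value it had when the snapshot was taken. */
    List<Map.Entry<String, String>> scan(long snapshot) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<Long, String>> e : data.entrySet()) {
            // Newest version written at or before the snapshot, if any.
            Map.Entry<Long, String> visible = e.getValue().floorEntry(snapshot);
            if (visible != null) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), visible.getValue()));
            }
        }
        return out;
    }

    /** Step 4: unpin the snapshot so that compaction may drop the versions it kept alive. */
    void releaseSnapshot(long snapshot) {
        Integer holders = openSnapshots.get(snapshot);
        if (holders == null) {
            return;
        }
        if (holders == 1) {
            openSnapshots.remove(snapshot);
        } else {
            openSnapshots.put(snapshot, holders - 1);
        }
    }

    /** Drops versions that neither a latest read nor any open snapshot can see. */
    void compact() {
        for (NavigableMap<Long, String> versions : data.values()) {
            Set<Long> keep = new HashSet<>();
            keep.add(versions.lastKey());
            for (long s : openSnapshots.keySet()) {
                Long visible = versions.floorKey(s);
                if (visible != null) {
                    keep.add(visible);
                }
            }
            versions.keySet().retainAll(keep);
        }
    }

    /** Total number of value versions currently held in the store. */
    int storedVersions() {
        int total = 0;
        for (NavigableMap<Long, String> versions : data.values()) {
            total += versions.size();
        }
        return total;
    }
}
```

In this model, writes made after takeSnapshot() are invisible to scan(), and compact() has to keep one extra version per key overwritten during the rebalance until releaseSnapshot() is called. That is the extra space cost the NOTE refers to.
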
> h2. Possible problems
> Given that "raw" data is sent, including SQL indexes, all incomplete indexes will be sent incomplete. Maybe we should also send a build state for each index so that the receiving side could continue from the right place, not from the beginning.
> This problem will be resolved in the future. Currently we don't have indexes implemented.
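
One possible shape for the "build state" idea, as a minimal sketch only. Indexes are not implemented yet, so everything here is an assumption: the class ResumableIndexBuild, the BuildState type and the choice of "last indexed row key" as the state are hypothetical and do not reflect any existing Ignite 3 code. The point is just that the receiver can resume from the key the sender stopped at instead of rescanning the partition from the beginning.

```java
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical resumable index build. Names are illustrative, not Ignite 3 APIs. */
final class ResumableIndexBuild {
    /** State shipped along with an index: last row key already indexed, or null if none. */
    static final class BuildState {
        final String lastIndexedKey;

        BuildState(String lastIndexedKey) {
            this.lastIndexedKey = lastIndexedKey;
        }
    }

    /**
     * Indexes at most maxRows rows that come strictly after state.lastIndexedKey
     * and returns the updated state. The index maps "value|rowKey" to rowKey.
     */
    static BuildState continueBuild(
            NavigableMap<String, String> rows,
            NavigableMap<String, String> index,
            BuildState state,
            int maxRows) {
        NavigableMap<String, String> remaining = state.lastIndexedKey == null
                ? rows
                : rows.tailMap(state.lastIndexedKey, false);
        String last = state.lastIndexedKey;
        int done = 0;
        for (Map.Entry<String, String> row : remaining.entrySet()) {
            if (done == maxRows) {
                break;
            }
            index.put(row.getValue() + "|" + row.getKey(), row.getKey());
            last = row.getKey();
            done++;
        }
        return new BuildState(last);
    }
}
```

The sender would transfer the partially built index together with its BuildState, and the receiver would call continueBuild with that state on its own copy of the rows.
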



--
This message was sent by Atlassian Jira
(v8.20.10#820010)