Posted to issues@ignite.apache.org by "Ivan Bessonov (Jira)" <ji...@apache.org> on 2023/05/02 08:04:00 UTC

[jira] [Created] (IGNITE-19395) Reduce write amplification for RocksDB partition storage

Ivan Bessonov created IGNITE-19395:
--------------------------------------

             Summary: Reduce write amplification for RocksDB partition storage
                 Key: IGNITE-19395
                 URL: https://issues.apache.org/jira/browse/IGNITE-19395
             Project: Ignite
          Issue Type: Improvement
            Reporter: Ivan Bessonov


Currently, the "commit" operation in rocksdb storage looks like this:
{code:java}
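// Read the full payload stored under the write-intent key.
val data = db.read(writeIntentKey);
// Drop the write intent...
db.remove(writeIntentKey);
// ...and write the same payload a second time under the committed key.
db.write(committedKey, data);{code}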
This is wasteful: we end up writing every payload twice. An alternative is to add a level of indirection to the data:
{code:java}
// RowId index.
[ TableId?? | PartId | RowId | Timestamp ] -> [ DataId ]
[ TableId?? | PartId | RowId ] -> [ DataId | TxId | CommitTableId | CommitPartId ]

// Data.
[ DataId ] -> [ Payload ]{code}
{{DataId}} must be unique. I don't like the idea of an auto-incrementing key (we would always have to persist the latest value); there must be another way.

The main idea is that {{DataId}} doesn't change while committing the data, which means that it can be generated from RowId and TxId.
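
To illustrate why a stable DataId removes the second payload write, here is a minimal in-memory sketch of the proposed layout. Everything in it is hypothetical: the class, the method names, and the string keys are stand-ins for illustration, not Ignite's storage API, and the CommitTableId / CommitPartId fields are left out for brevity. Under those assumptions, committing only touches the small RowId index entries, and the payload is written once.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the proposed layout; not Ignite's storage API. */
class IndirectCommitSketch {
    /** RowId index: key -> entry. Keys are plain strings here for readability. */
    final Map<String, String[]> rowIdIndex = new HashMap<>();

    /** Data: DataId -> payload. */
    final Map<String, byte[]> data = new HashMap<>();

    /** Counts payload bytes written, to make write amplification visible. */
    long payloadBytesWritten = 0;

    /** Writes the payload once and registers a write intent pointing at it. */
    void addWriteIntent(int partId, long rowId, String txId, String dataId, byte[] payload) {
        data.put(dataId, payload);
        payloadBytesWritten += payload.length;
        // [ PartId | RowId ] -> [ DataId | TxId ]
        rowIdIndex.put(partId + "|" + rowId, new String[] {dataId, txId});
    }

    /** Commits by rewriting only the index entry; the payload is not touched. */
    void commit(int partId, long rowId, long commitTimestamp) {
        String[] intent = rowIdIndex.remove(partId + "|" + rowId);
        if (intent == null) {
            throw new IllegalStateException("No write intent for row " + rowId);
        }
        // [ PartId | RowId | Timestamp ] -> [ DataId ]
        rowIdIndex.put(partId + "|" + rowId + "|" + commitTimestamp, new String[] {intent[0]});
    }

    /** Resolves a committed version through the index, then reads the payload. */
    byte[] readCommitted(int partId, long rowId, long commitTimestamp) {
        String[] entry = rowIdIndex.get(partId + "|" + rowId + "|" + commitTimestamp);
        return entry == null ? null : data.get(entry[0]);
    }
}
```

In this model, {{payloadBytesWritten}} stays at the payload size after a commit, whereas the read / remove / write sequence above would count the payload twice.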

For example, {{RowId ++ beginTimestamp(TxId)}} seems like a unique value (with a mandatory partition ID prefix and probably a table ID prefix).
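
As a rough sketch of such a key, the snippet below concatenates a partition ID prefix, a RowId, and a transaction begin timestamp into one byte array. The field widths are assumptions made only for this example (a 2-byte partition ID, a 16-byte UUID-shaped RowId, an 8-byte timestamp), the helper name is hypothetical, the optional table ID prefix is omitted, and the caller is assumed to have already extracted the begin timestamp from TxId.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper; field widths are assumptions made for this example. */
class DataIdSketch {
    /** Assumed layout: [ PartId (2 bytes) | RowId (16 bytes) | beginTimestamp (8 bytes) ]. */
    static byte[] dataId(short partId, UUID rowId, long beginTimestamp) {
        return ByteBuffer.allocate(Short.BYTES + 2 * Long.BYTES + Long.BYTES)
                .putShort(partId)                          // mandatory partition ID prefix
                .putLong(rowId.getMostSignificantBits())   // RowId, high 8 bytes
                .putLong(rowId.getLeastSignificantBits())  // RowId, low 8 bytes
                .putLong(beginTimestamp)                   // begin timestamp taken from TxId
                .array();
    }
}
```

Because the value depends only on the row and the transaction, it can be recomputed at commit time instead of being persisted as a counter, and two transactions writing the same row get different keys as long as their begin timestamps differ.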


