Posted to issues@ignite.apache.org by "Ivan Bessonov (Jira)" <ji...@apache.org> on 2023/05/02 08:04:00 UTC
[jira] [Created] (IGNITE-19395) Reduce write amplification for RocksDB partition storage
Ivan Bessonov created IGNITE-19395:
--------------------------------------
Summary: Reduce write amplification for RocksDB partition storage
Key: IGNITE-19395
URL: https://issues.apache.org/jira/browse/IGNITE-19395
Project: Ignite
Issue Type: Improvement
Reporter: Ivan Bessonov
Currently, the "commit" operation in the RocksDB storage looks like this:
{code:java}
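// Read the uncommitted payload stored under the write intent key.
val data = db.read(writeIntentKey);
// Drop the write intent entry...
db.remove(writeIntentKey);
// ...and write the same payload a second time under the committed key.
db.write(committedKey, data);{code}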
This is wasteful: we end up writing everything twice. An alternative is to add a level of indirection to the data:
{code:java}
// RowId index.
[ TableId?? | PartId | RowId | Timestamp ] -> [ DataId ]
[ TableId?? | PartId | RowId ] -> [ DataId | TxId | CommitTableId | CommitPartId ]
// Data.
[ DataId ] -> [ Payload ]{code}
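To illustrate the difference, here is a rough in-memory sketch that compares how many value bytes each commit scheme writes. It is not Ignite or RocksDB code: the class {{IndirectCommitSketch}}, the string keys, and the {{d1|tx1}} index encoding are all made up for the example, and key bytes and compaction are ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a key-value store that counts written value bytes. */
class IndirectCommitSketch {
    private final Map<String, byte[]> store = new HashMap<>();
    private long bytesWritten;

    private void put(String key, byte[] value) {
        store.put(key, value);
        bytesWritten += value.length;
    }

    /** Current scheme: the payload lives under the write intent key and is copied on commit. */
    static long copyCost(int payloadSize) {
        IndirectCommitSketch db = new IndirectCommitSketch();
        db.put("intent/row1", new byte[payloadSize]);          // addWrite
        byte[] data = db.store.remove("intent/row1");          // commit: read + remove
        db.put("committed/row1@ts", data);                     // commit: payload written again
        return db.bytesWritten;
    }

    /** Proposed scheme: the payload is written once under DataId, only index entries change. */
    static long indirectCost(int payloadSize) {
        IndirectCommitSketch db = new IndirectCommitSketch();
        db.put("data/d1", new byte[payloadSize]);               // [ DataId ] -> [ Payload ]
        db.put("intent/row1", "d1|tx1".getBytes(StandardCharsets.UTF_8)); // [ RowId ] -> [ DataId | TxId ]
        String intent = new String(db.store.remove("intent/row1"), StandardCharsets.UTF_8);
        String dataId = intent.substring(0, intent.indexOf('|'));
        db.put("committed/row1@ts", dataId.getBytes(StandardCharsets.UTF_8)); // [ RowId | Timestamp ] -> [ DataId ]
        return db.bytesWritten;
    }

    public static void main(String[] args) {
        System.out.println("copy: " + copyCost(1000) + ", indirect: " + indirectCost(1000));
    }
}
```

With a 1000-byte payload the copying scheme writes 2000 value bytes, while the indirect scheme writes 1008 in this toy encoding. The saving grows with the payload size, because the index entries stay small.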
{{DataId}} must be unique. I don't like the idea of an auto-incrementing key (we would always have to persist the latest value); there must be another way.
The main idea is that {{DataId}} doesn't change while the data is being committed, which means it can be generated from {{RowId}} and {{TxId}}.
For example, {{RowId ++ beginTimestamp(TxId)}} seems like a unique value (with a mandatory partition ID prefix and probably a table ID prefix).
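A possible encoding of such a key could look like the sketch below. The field widths are assumptions made for illustration only: a 2-byte partition ID, a 16-byte {{RowId}} represented as a {{UUID}}, and an 8-byte begin timestamp that is assumed to be already extracted from {{TxId}}. The optional table ID prefix is left out, and {{DataIdSketch}} is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical DataId layout: [ PartId (2 bytes) | RowId (16 bytes) | beginTimestamp (8 bytes) ]. */
class DataIdSketch {
    static final int SIZE = Short.BYTES + 2 * Long.BYTES + Long.BYTES;

    /**
     * Builds the key from values that are known when the write intent is created,
     * so the same key stays valid after the commit.
     */
    static byte[] dataId(int partId, UUID rowId, long txBeginTimestamp) {
        // ByteBuffer is big-endian by default, so keys sort by partition first.
        return ByteBuffer.allocate(SIZE)
                .putShort((short) partId)
                .putLong(rowId.getMostSignificantBits())
                .putLong(rowId.getLeastSignificantBits())
                .putLong(txBeginTimestamp)
                .array();
    }
}
```

Two writes to the same row by different transactions get different keys as long as their begin timestamps differ, and no counter has to be persisted.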
--
This message was sent by Atlassian Jira
(v8.20.10#820010)