Posted to issues@ignite.apache.org by "Ivan Bessonov (Jira)" <ji...@apache.org> on 2023/05/02 09:31:00 UTC
[jira] [Updated] (IGNITE-19395) Reduce write amplification for RocksDB partition storage

     [ https://issues.apache.org/jira/browse/IGNITE-19395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-19395:
-----------------------------------
    Description: 
Currently, the "commit" operation in the RocksDB storage looks like this:
{code:java}
// Pseudocode: the payload is read back and then written again under the committed key.
val data = db.read(writeIntentKey);
db.remove(writeIntentKey);
db.write(committedKey, data);{code}
This is wasteful: we end up writing every payload twice. An alternative is to add a level of indirection to the data:
{code:java}
// RowId index.
[ TableId?? | PartId | RowId | Timestamp ] -> [ DataId ]
[ TableId?? | PartId | RowId ] -> [ DataId | TxId | CommitTableId | CommitPartId ]

// Data.
[ DataId ] -> [ Payload ]{code}
{{DataId}} must be unique. I don't like the idea of an auto-incrementing key (we would always have to persist the latest value); there must be another way.

The main idea is that {{DataId}} doesn't change while the data is being committed, which means it can be generated from {{RowId}} and {{TxId}}.
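
To illustrate why a stable {{DataId}} helps, here is a minimal in-memory sketch of the proposed layout. It is not the actual storage code: the class name, the string keys, the {{TreeMap}} stand-ins for RocksDB, and the {{payloadWrites}} counter are all assumptions made for illustration, and the TxId, CommitTableId and CommitPartId fields are left out of the write-intent value.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the proposed key layout; not the real RocksDB storage. */
final class IndirectStorageSketch {
    /** [ RowId ] -> [ DataId ] (TxId, CommitTableId and CommitPartId omitted for brevity). */
    final Map<String, String> writeIntents = new TreeMap<>();

    /** [ RowId | Timestamp ] -> [ DataId ]. */
    final Map<String, String> committed = new TreeMap<>();

    /** [ DataId ] -> [ Payload ]. */
    final Map<String, byte[]> data = new TreeMap<>();

    /** Counts how many times a payload was written, to make the amplification visible. */
    int payloadWrites;

    /** Stores the payload once and points the write intent at it. */
    void addWrite(String rowId, String dataId, byte[] payload) {
        data.put(dataId, payload);
        payloadWrites++;
        writeIntents.put(rowId, dataId);
    }

    /** Commit only moves the small index entry; the payload is neither read nor rewritten. */
    void commit(String rowId, long commitTimestamp) {
        String dataId = writeIntents.remove(rowId);
        if (dataId != null) {
            committed.put(rowId + "|" + commitTimestamp, dataId);
        }
    }

    /** Resolves a committed version through the index, then fetches the payload. */
    byte[] readCommitted(String rowId, long commitTimestamp) {
        String dataId = committed.get(rowId + "|" + commitTimestamp);
        return dataId == null ? null : data.get(dataId);
    }
}
```

In this sketch a write followed by a commit touches the payload once and the index twice, instead of writing the payload twice. The price is an extra lookup on every read.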

For example, {{RowId ++ beginTimestamp(TxId)}} seems like a unique value (with a mandatory partition ID prefix and probably a table ID prefix).
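
A possible encoding of such a key, as a rough sketch only. The field widths are assumptions, not something the issue specifies: a 2-byte partition ID, a 16-byte RowId modeled as a {{UUID}}, and an 8-byte begin timestamp passed in directly rather than derived from a TxId. The optional table ID prefix is omitted, and the {{DataIds}} class is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper that builds a DataId as: partition ID ++ RowId ++ transaction begin timestamp. */
final class DataIds {
    /** 2 bytes of partition ID + 16 bytes of RowId + 8 bytes of timestamp (assumed widths). */
    static final int DATA_ID_SIZE = Short.BYTES + 2 * Long.BYTES + Long.BYTES;

    /** The same inputs always yield the same bytes, so the ID survives the commit unchanged. */
    static byte[] dataId(int partitionId, UUID rowId, long txBeginTimestamp) {
        return ByteBuffer.allocate(DATA_ID_SIZE) // ByteBuffer is big-endian by default
                .putShort((short) partitionId)
                .putLong(rowId.getMostSignificantBits())
                .putLong(rowId.getLeastSignificantBits())
                .putLong(txBeginTimestamp)
                .array();
    }
}
```

Because the value is derived from data that is already known when the write intent is created, no counter has to be persisted, which is the objection raised against the auto-incrementing key.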



> Reduce write amplification for RocksDB partition storage
> --------------------------------------------------------
>
>                 Key: IGNITE-19395
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19395
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3


