Posted to issues@ignite.apache.org by "Ivan Bessonov (Jira)" <ji...@apache.org> on 2022/08/09 10:39:00 UTC

[jira] [Updated] (IGNITE-17076) Unify RowId format for different storages

     [ https://issues.apache.org/jira/browse/IGNITE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Bessonov updated IGNITE-17076:
-----------------------------------
    Description: 
The current MV store bridge API has a serious flaw that stems from a misunderstanding. Its "insert" method generates the RowId by itself. This is wrong, because the same row can end up with a different id on the replica storage, which breaks any logic that relies on row ids being consistent across replicas.

Every replicated write command that inserts a new value should produce the same row ids on every replica. There are several ways to achieve this:
 * Use timestamps as identifiers. This is not very convenient, because we would have to attach the partition id on top of the timestamp: it is mandatory to know the partition of the row.
 * Use a more complicated structure, for example a tuple of (raftCommitIndex, partitionId, batchCounter), where
 ** raftCommitIndex is the index of the write command that performs the insertion.
 ** partitionId is an integer identifier of the partition. It could be 4 bytes, considering that there are plans to support more than 65000 partitions per table.
 ** batchCounter differentiates insertions made in a single write command. We can limit it to 2 bytes to save a little space, if necessary.
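
To make the second option concrete, here is a minimal sketch of how such a tuple could be laid out in 14 bytes (8 + 4 + 2). The class name TupleRowId, its methods, and the big-endian layout are assumptions made for illustration; none of this is the actual Ignite API.

```java
import java.nio.ByteBuffer;

/** Hypothetical row id built from (raftCommitIndex, partitionId, batchCounter). */
final class TupleRowId {
    /** 8 bytes of commit index + 4 bytes of partition id + 2 bytes of batch counter. */
    static final int SIZE_BYTES = Long.BYTES + Integer.BYTES + Short.BYTES;

    final long raftCommitIndex;
    final int partitionId;
    final int batchCounter; // kept as int in memory, stored as an unsigned 2-byte value

    TupleRowId(long raftCommitIndex, int partitionId, int batchCounter) {
        if (batchCounter < 0 || batchCounter > 0xFFFF) {
            throw new IllegalArgumentException("batchCounter must fit in 2 bytes: " + batchCounter);
        }
        this.raftCommitIndex = raftCommitIndex;
        this.partitionId = partitionId;
        this.batchCounter = batchCounter;
    }

    /** Serializes the id; ByteBuffer is big-endian by default. */
    byte[] toBytes() {
        return ByteBuffer.allocate(SIZE_BYTES)
            .putLong(raftCommitIndex)
            .putInt(partitionId)
            .putShort((short) batchCounter)
            .array();
    }

    /** Restores the id from the layout produced by toBytes(). */
    static TupleRowId fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long index = buf.getLong();
        int partition = buf.getInt();
        int counter = Short.toUnsignedInt(buf.getShort());
        return new TupleRowId(index, partition, counter);
    }
}
```

Because every field is derived from the replicated command itself, two replicas applying the same command would build byte-identical ids without coordinating with each other.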

I prefer the second option, but it may be revised during the implementation.

Of course, the "insert" method should be removed from the bridge API, and tests have to be updated. Since storage tests have no RAFT group, we can generate row ids artificially there; it's not a big deal.

EDIT: the second option makes it difficult to use row ids in the action request processor in cases when data is inserted. So hybrid clock + partition id is a better option.
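
A rough sketch of the hybrid clock + partition id idea follows. It assumes that the id is generated once, before the write command is replicated, and then travels inside the command so that every replica stores the same value. HybridRowId and HybridClockSketch are hypothetical names, and the clock is a simplified stand-in, not Ignite's hybrid clock.

```java
import java.util.function.LongSupplier;

/** Hypothetical row id: partition id plus a hybrid timestamp (physical time + logical counter). */
final class HybridRowId {
    final int partitionId;
    final long physical;
    final int logical;

    HybridRowId(int partitionId, long physical, int logical) {
        this.partitionId = partitionId;
        this.physical = physical;
        this.logical = logical;
    }
}

/** Simplified hybrid clock used only for this illustration. */
final class HybridClockSketch {
    private final LongSupplier physicalClock;
    private long lastPhysical = Long.MIN_VALUE;
    private int logical;

    HybridClockSketch(LongSupplier physicalClock) {
        this.physicalClock = physicalClock;
    }

    /** Returns an id that is unique on this node even if the physical clock does not advance. */
    synchronized HybridRowId nextRowId(int partitionId) {
        long now = physicalClock.getAsLong();
        if (now > lastPhysical) {
            // Physical time moved forward: restart the logical counter.
            lastPhysical = now;
            logical = 0;
        } else {
            // Same (or earlier) physical reading: disambiguate with the logical counter.
            logical++;
        }
        return new HybridRowId(partitionId, lastPhysical, logical);
    }
}
```

In this sketch the caller would obtain the id with something like new HybridClockSketch(System::currentTimeMillis).nextRowId(partitionId) and put it into the command; replicas would only read it back, never generate their own.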

EDIT 2: removing the "insert" method from the API is out of scope for now.

> Unify RowId format for different storages
> -----------------------------------------
>
>                 Key: IGNITE-17076
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17076
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>


