Posted to issues@ignite.apache.org by "Alexander Lapin (Jira)" <ji...@apache.org> on 2022/08/10 15:10:00 UTC

[jira] [Commented] (IGNITE-17258) Implement ReplicaListener

    [ https://issues.apache.org/jira/browse/IGNITE-17258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578031#comment-17578031 ] 

Alexander Lapin commented on IGNITE-17258:
------------------------------------------

[~v.pyatkov] LGTM to feature branch.

> Implement ReplicaListener
> -------------------------
>
>                 Key: IGNITE-17258
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17258
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3, transaction3_rw
>
> For general context, please check IGNITE-17252. To specify request-specific handling logic that maps a particular actionRequest to the corresponding set of operations, mapping rules must be introduced, similar to those used within raft listeners. In other words, a sort of state machine for the replica is required.
> As the tx design document notes, the common flow for major tx requests is the following:
> {code:java}
> On receiving OpRequest
> 1. Check primary replica lease. Return the failure if not valid.
> 2. Try to acquire a shared or exclusive lock, depending on the op type.
> 3. If failed to acquire the lock due to a conflict, return the failure.
> 4. When the lock is acquired, return an OpResponse to a coordinator with a value, if op type is read or read-write. The OpResponse structure is:
>     opCode:int // 0 - ok, !0 - error code
>     result:Array
>     readLeases:Map<partitionId, LeaseInterval>
>     timestamp:HLC
> 5. Replicate a write intent asynchronously if op type is write or read-write
>     As soon as the write intent is replicated, send WriteAckResponse to the coordinator. The WriteAckResponse structure is:
>     opCode:int // 0 - ok, !0 - error code
>     opId:int
>     timestamp:HLC
> 6. Return the replication ack response to a coordinator. {code}
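
The flow quoted above can be sketched in plain Java. This is only an illustration under stated assumptions, not the ReplicaListener implementation: the class names, the error codes `LEASE_INVALID` and `LOCK_CONFLICT`, and the in-memory lock and store maps are all hypothetical. The response omits `readLeases` and the HLC timestamp, every request is treated as coming from a different transaction, and locks are never released.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class OpRequestFlowSketch {
    enum OpType { READ, WRITE, READ_WRITE }
    enum LockMode { SHARED, EXCLUSIVE }

    static final int OK = 0;
    static final int LEASE_INVALID = 1; // hypothetical error code
    static final int LOCK_CONFLICT = 2; // hypothetical error code

    /** Simplified OpResponse: readLeases and the HLC timestamp are omitted. */
    static final class OpResponse {
        final int opCode;
        final String result;
        final CompletableFuture<Integer> writeAck; // null for reads and failures

        OpResponse(int opCode, String result, CompletableFuture<Integer> writeAck) {
            this.opCode = opCode;
            this.result = result;
            this.writeAck = writeAck;
        }
    }

    /** Toy replica: each request is assumed to belong to a different transaction. */
    static final class Replica {
        final Map<String, LockMode> locks = new ConcurrentHashMap<>();
        final Map<String, String> store = new ConcurrentHashMap<>();
        volatile boolean leaseValid = true;

        OpResponse handle(OpType type, String key, String newValue) {
            // 1. Check the primary replica lease.
            if (!leaseValid) {
                return new OpResponse(LEASE_INVALID, null, null);
            }
            // 2. Shared lock for reads, exclusive lock for write and read-write.
            LockMode wanted = type == OpType.READ ? LockMode.SHARED : LockMode.EXCLUSIVE;
            LockMode held = locks.putIfAbsent(key, wanted);
            // 3. Only shared/shared is compatible; anything else is a conflict.
            if (held != null && !(held == LockMode.SHARED && wanted == LockMode.SHARED)) {
                return new OpResponse(LOCK_CONFLICT, null, null);
            }
            // 4. Return a value for read and read-write operations.
            String result = type == OpType.WRITE ? null : store.get(key);
            // 5. Replicate the write intent asynchronously; the future plays
            //    the role of the WriteAckResponse.
            CompletableFuture<Integer> ack = null;
            if (type != OpType.READ) {
                ack = CompletableFuture.supplyAsync(() -> {
                    store.put(key, newValue); // stands in for real replication
                    return OK;
                });
            }
            return new OpResponse(OK, result, ack);
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica();
        OpResponse write = replica.handle(OpType.WRITE, "k1", "v1");
        System.out.println("write opCode=" + write.opCode + ", ack=" + write.writeAck.join());
        OpResponse read = replica.handle(OpType.READ, "k1", null);
        System.out.println("read on a locked key opCode=" + read.opCode);
    }
}
```

The OpResponse is returned before replication finishes, and the acknowledgement arrives separately through the future, which mirrors the split between steps 4 and 5.
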
> These steps should be managed from within ReplicaListener, because the concrete set of locks to acquire depends on the operation type:
> {code:java}
> The required locks on the row store are the following:
> 1. Tuple get(RowId rowId, UUID txId)
>     IS_commit(table) S_commit(rowId)
> 2. Tuple get(RowId rowId, @Nullable Timestamp timestamp)
>     No locks. Null timestamp is used to read the latest committed value for a single get.
> 3. Tuple getForUpdate(RowId rowId, UUID txId)
>     IX_commit(table) X_commit(rowId)
> 4. RowId insert(Tuple row, UUID txId)
>     IX_commit(table)
> 5. boolean update(RowId rowId, Tuple newRow, UUID txId)
>     IX_commit(table) X_commit(rowId)
> 6. Tuple remove(RowId rowId, UUID txId)
>     IX_commit(table) X_commit(rowId)
> 7. void commitWrite(RowId rowId, Timestamp timestamp, UUID txId)
> 8. void abortWrite(RowId rowId, UUID txId)
> 9. Iterator<Tuple> scan(Predicate<Tuple> filter, UUID txId)
>     S_commit(table) - if a predicate can produce phantom reads, IS_commit(table) - otherwise
> 10. Iterator<Tuple> scan(Predicate<Tuple> filter, Timestamp timestamp)
>     No locks
> 11. <T> Iterator<T> invoke(Predicate<Tuple> filter, InvokeClosure<T> clo, UUID txId)
>     SIX_commit(table) - if a predicate can produce phantom reads, IX_commit(table) - otherwise. X_commit on each updated row. {code}
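
The lock table quoted above can be expressed as a lookup from operation type to the required table and row locks. This is a minimal sketch, assuming hypothetical names (`Op`, `Locks`, `requiredLocks`) that do not come from the Ignite code base. The `_commit` suffix of the lock names is dropped for brevity, and `commitWrite` and `abortWrite` are mapped to "no locks" only because the quoted list names none for them.

```java
class RowStoreLockSketch {
    /** Lock modes from the quoted table, without the "_commit" suffix. */
    enum LockMode { IS, IX, S, SIX, X }

    /** Hypothetical operation tags, one per entry of the quoted list. */
    enum Op {
        GET_TX, GET_TIMESTAMP, GET_FOR_UPDATE, INSERT, UPDATE, REMOVE,
        COMMIT_WRITE, ABORT_WRITE, SCAN_TX, SCAN_TIMESTAMP, INVOKE
    }

    static final class Locks {
        final LockMode table; // null means no table lock
        final LockMode row;   // null means no row lock

        Locks(LockMode table, LockMode row) {
            this.table = table;
            this.row = row;
        }

        @Override
        public String toString() {
            return "table=" + (table == null ? "none" : table)
                + ", row=" + (row == null ? "none" : row);
        }
    }

    static Locks requiredLocks(Op op, boolean phantomReadsPossible) {
        switch (op) {
            case GET_TX:
                return new Locks(LockMode.IS, LockMode.S);
            case GET_FOR_UPDATE:
            case UPDATE:
            case REMOVE:
                return new Locks(LockMode.IX, LockMode.X);
            case INSERT:
                return new Locks(LockMode.IX, null);
            case SCAN_TX:
                return new Locks(phantomReadsPossible ? LockMode.S : LockMode.IS, null);
            case INVOKE:
                // The row lock applies to each row the closure updates.
                return new Locks(phantomReadsPossible ? LockMode.SIX : LockMode.IX, LockMode.X);
            case GET_TIMESTAMP:
            case SCAN_TIMESTAMP:
            case COMMIT_WRITE: // no locks listed in the quoted table
            case ABORT_WRITE:  // no locks listed in the quoted table
                return new Locks(null, null);
            default:
                throw new IllegalArgumentException("Unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        for (Op op : Op.values()) {
            System.out.println(op + " -> " + requiredLocks(op, true));
        }
    }
}
```

Keeping the mapping in one function makes it easy to see that the lock set is a property of the operation type, which is the argument for handling it inside the listener.
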
> Please check the tx design for the full set of required actions for lock management, e.g. index-based locks.
> Besides that, some actions, such as commit/abort transaction (replicateTxnState), have dedicated handling logic.
> *!* This ticket should be validated with SE_team to check whether they are fine with the proposed index managing actors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)