Posted to issues@ignite.apache.org by "Ivan Pavlukhin (JIRA)" <ji...@apache.org> on 2019/01/31 08:49:00 UTC

[jira] [Updated] (IGNITE-10219) MVCC: TX: Backup node update may fail after lost tx rollback.

     [ https://issues.apache.org/jira/browse/IGNITE-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Pavlukhin updated IGNITE-10219:
------------------------------------
    Summary: MVCC: TX: Backup node update may fail after lost tx rollback.  (was: MVCC: TX: Backup node update may fails after lost tx rollback.)

> MVCC: TX: Backup node update may fail after lost tx rollback.
> -------------------------------------------------------------
>
>                 Key: IGNITE-10219
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10219
>             Project: Ignite
>          Issue Type: Bug
>          Components: mvcc
>            Reporter: Andrew Mashenkov
>            Assignee: Andrew Mashenkov
>            Priority: Major
>              Labels: mvcc_stabilization_stage_1, transactions
>             Fix For: 2.8
>
>         Attachments: CacheMvccTxFailoverTest.java
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The use case:
>  # Start a Tx and update an entry.
>  # Kill the backup node.
>  # Roll back the Tx (the backup misses this tx state change due to the outage).
>  # Start the backup node.
>  # An update of the same entry may fail with an unknown tx state for the latest entry version.
> The backup won't rebalance the partition for the key, because a rollback doesn't increment the partition counter, and it can't find an active transaction for the latest entry version, because TxLog contains neither a commit nor a rollback record.
> Also, the Tx can't be detected as rolled back, because the mvcc coordinator version hasn't changed during the backup node outage.
>  
> Possible solutions are:
>  # Increment the mvcc coordinator version on every node join event; this needs to be carefully tested.
>  # Scan the cache to clean up such entries on node startup (right after recovery from WAL), which is inefficient.
>  # Or maybe log Tx start in TxLog and roll back all active Tx on node startup.
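
The third option can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ignite code: the class TxLogSketch, its methods, and the TxState enum are all hypothetical names, and the "log" is a plain in-memory map standing in for a persistent TxLog. It replays the use case above on a single backup node and shows that, without a start record, the tx state stays unknown after restart, while with a start record the startup pass can mark the lost tx as rolled back.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a backup node's TxLog. Hypothetical; not Ignite's implementation. */
class TxLogSketch {
    enum TxState { ACTIVE, COMMITTED, ROLLED_BACK }

    // Stands in for a persistent log that survives a node restart.
    private final Map<Long, TxState> log = new HashMap<>();

    // When true, tx start is recorded (the third proposed option).
    private final boolean logTxStart;

    TxLogSketch(boolean logTxStart) { this.logTxStart = logTxStart; }

    void onTxStart(long txId) { if (logTxStart) log.put(txId, TxState.ACTIVE); }
    void onCommit(long txId) { log.put(txId, TxState.COMMITTED); }
    void onRollback(long txId) { log.put(txId, TxState.ROLLED_BACK); }

    /** Startup pass: every tx still recorded as ACTIVE is treated as lost and rolled back. */
    void recoverOnStartup() {
        log.replaceAll((id, s) -> s == TxState.ACTIVE ? TxState.ROLLED_BACK : s);
    }

    /** Returns null when the log has no record for the tx (unknown state). */
    TxState state(long txId) { return log.get(txId); }

    /** Replays the use case and returns what the backup knows about the tx afterwards. */
    static TxState simulate(boolean logTxStart) {
        TxLogSketch backup = new TxLogSketch(logTxStart);
        long txId = 1L;
        backup.onTxStart(txId);    // step 1: tx starts and updates an entry
        // steps 2-3: the backup is down, so onRollback(txId) is never delivered
        backup.recoverOnStartup(); // step 4: the backup starts again
        return backup.state(txId); // step 5: the next update looks up the tx state
    }

    public static void main(String[] args) {
        System.out.println("without start record: " + simulate(false));
        System.out.println("with start record: " + simulate(true));
    }
}
```

In this model simulate(false) returns null (the unknown tx state from the use case) and simulate(true) returns ROLLED_BACK. The sketch deliberately ignores transactions that are still genuinely active on other nodes at restart; a real fix would have to handle them.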



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)