Posted to issues@ignite.apache.org by "Denis Chudov (Jira)" <ji...@apache.org> on 2023/03/29 13:36:00 UTC

[jira] [Commented] (IGNITE-19043) ItRaftCommandLeftInLogUntilRestartTest: PageMemoryHashIndexStorage lacks data after cluster restart

    [ https://issues.apache.org/jira/browse/IGNITE-19043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706389#comment-17706389 ] 

Denis Chudov commented on IGNITE-19043:
---------------------------------------

[~alapin]  LGTM

> ItRaftCommandLeftInLogUntilRestartTest: PageMemoryHashIndexStorage lacks data after cluster restart
> ---------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-19043
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19043
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2
>
>
> After enabling it, ItRaftCommandLeftInLogUntilRestartTest failed with
> {code:java}
> org.opentest4j.AssertionFailedError: expected: not <null> {code}
> while trying to retrieve previously added data after a cluster restart. It seems that this is because there's no corresponding data in the PK index.
> It is worth mentioning that the given test is originally about raft log re-application on node restart. So, I commented out all partitionUpdateInhibitor usages in order to check whether the failure is related to re-application or to the indexes themselves; the problem is reproducible without the re-application logic.
> It might be related to the migration of the storage defaults from RocksDB to page memory. Further investigation is required.
> h3. Implementation notes
> After the investigation, it turned out that the failure is caused by raft log re-application being skipped within PartitionListener#handleUpdateCommand and PartitionListener#handleUpdateAllCommand because of the following logic:
> {code:java}
>         TxMeta txMeta = txStateStorage.get(cmd.txId());
>         if (txMeta != null && (txMeta.txState() == COMMITED || txMeta.txState() == ABORTED)) {
>             storage.runConsistently(() -> {
>                 storage.lastApplied(commandIndex, commandTerm);
>                 return null;
>             });
>         }
> {code}
> The full scenario is as follows:
> 1. tx1.put populates the raft log and mvPartitionStorage with the corresponding log record and data.
> 2. tx1.commit also populates the raft log with a raft record and finishes the transaction within txnStateStorage, along with a cleanup in mvPartitionStorage.
> 3. The RocksDB-based txnStateStorage flushes its state to disk, while the page-memory-based mvPartitionStorage doesn't.
> 4. After node restart, raft replays the log, both the put and the commit commands; however, on the commit partition we skip put re-application because of the aforementioned
> {code:java}
> if (txMeta != null && (txMeta.txState() == COMMITED || txMeta.txState() == ABORTED)){code}
> To clarify: the transaction is considered committed because txnStateStorage flushes its state before stopping.
>  
> So, in order to fix the given issue, it's enough to simply remove the skip logic.
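The scenario above can be sketched with a toy in-memory model. This is not the actual Ignite code: the maps, the enum, and the skipFinished flag are simplified stand-ins introduced only to illustrate why skipping re-application for finished transactions loses data when the partition storage was not flushed before restart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: after restart, txStateStorage still knows tx1 is
// COMMITED (it was flushed to disk), but the page-memory based partition
// storage lost the row. If replay skips update commands for finished
// transactions, the row is never restored.
public class ReplaySketch {
    enum TxState { PENDING, COMMITED, ABORTED }

    // Stand-ins for txnStateStorage and mvPartitionStorage.
    static Map<String, TxState> txStateStorage = new HashMap<>();
    static Map<String, String> mvPartitionStorage = new HashMap<>();

    // Replay of an update command after restart; skipFinished toggles the
    // skip logic that the fix removes.
    static void handleUpdateCommand(String txId, String key, String value, boolean skipFinished) {
        TxState state = txStateStorage.get(txId);
        if (skipFinished && state != null
                && (state == TxState.COMMITED || state == TxState.ABORTED)) {
            // Old behavior: only lastApplied would be advanced; the write is lost.
            return;
        }
        // Fixed behavior: always re-apply the write during replay.
        mvPartitionStorage.put(key, value);
    }

    public static void main(String[] args) {
        // State right after restart: tx state survived, partition data did not.
        txStateStorage.put("tx1", TxState.COMMITED);

        handleUpdateCommand("tx1", "k", "v", true);
        System.out.println("with skip logic:    " + mvPartitionStorage.get("k"));

        handleUpdateCommand("tx1", "k", "v", false);
        System.out.println("without skip logic: " + mvPartitionStorage.get("k"));
    }
}
```

With the skip logic the replayed put is dropped and the lookup returns null (the assertion failure seen in the test); without it the row is restored.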



--
This message was sent by Atlassian Jira
(v8.20.10#820010)