Posted to issues@ignite.apache.org by "Denis Chudov (Jira)" <ji...@apache.org> on 2023/03/29 13:36:00 UTC
[jira] [Commented] (IGNITE-19043) ItRaftCommandLeftInLogUntilRestartTest: PageMemoryHashIndexStorage lacks data after cluster restart
[ https://issues.apache.org/jira/browse/IGNITE-19043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706389#comment-17706389 ]
Denis Chudov commented on IGNITE-19043:
---------------------------------------
[~alapin] LGTM
> ItRaftCommandLeftInLogUntilRestartTest: PageMemoryHashIndexStorage lacks data after cluster restart
> ---------------------------------------------------------------------------------------------------
>
> Key: IGNITE-19043
> URL: https://issues.apache.org/jira/browse/IGNITE-19043
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexander Lapin
> Assignee: Alexander Lapin
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> After being enabled, ItRaftCommandLeftInLogUntilRestartTest failed with
> {code:java}
> org.opentest4j.AssertionFailedError: expected: not <null> {code}
> while trying to retrieve previously added data after cluster restart. It seems that this is because there is no corresponding data in the PK index.
> It is worth mentioning that the test was originally about raft log re-application on node restart. So, I've commented out all partitionUpdateInhibitor usages in order to check whether the problem is related to re-application or to the indexes themselves; the problem is reproducible without the re-application logic.
> It might be related to the migration of the defaults from RocksDB to page memory. Further investigation is required.
> h3. Implementation notes
> The investigation showed that the failure occurs because raft log re-application is skipped within PartitionListener#handleUpdateCommand and PartitionListener#handleUpdateAllCommand due to the following logic:
> {code:java}
> TxMeta txMeta = txStateStorage.get(cmd.txId());
> if (txMeta != null && (txMeta.txState() == COMMITED || txMeta.txState() == ABORTED)) {
>     storage.runConsistently(() -> {
>         storage.lastApplied(commandIndex, commandTerm);
>         return null;
>     });
> }
> {code}
> The full scenario is as follows:
> 1. tx1.put populates the raft log and mvPartitionStorage with the corresponding log record and data.
> 2. tx1.commit also populates the raft log with a log record and finishes the transaction within txnStateStorage, along with cleanup in mvPartitionStorage.
> 3. The RocksDB-based txnStateStorage flushes its state to disk, whereas the page-memory-based storage doesn't.
> 4. After node restart, raft replays the log, both the put and commit commands; however, on the commit partition we skip put re-application because of the aforementioned
> {code:java}
> if (txMeta != null && (txMeta.txState() == COMMITED || txMeta.txState() == ABORTED)){code}
> To clarify, the transaction is considered committed because txnStateStorage flushes its state before the node stops.
>
> So, in order to fix the given issue, it's enough to just remove the skip logic.
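
The scenario described in the issue can be illustrated with a minimal, self-contained sketch. This is not Ignite code: ReplaySketch, Command and valueAfterRestart are hypothetical names, plain maps stand in for txnStateStorage and mvPartitionStorage, and the sketch assumes (as one reading of step 3) that the transaction state survives the restart while the partition data does not.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the replay scenario; none of these types are Ignite APIs.
class ReplaySketch {
    enum TxState { PENDING, COMMITED, ABORTED }

    /** Simplified raft log entry: either a put or a commit of a transaction. */
    static final class Command {
        final String txId;
        final String key;
        final String value;
        final boolean commit;

        Command(String txId, String key, String value, boolean commit) {
            this.txId = txId;
            this.key = key;
            this.value = value;
            this.commit = commit;
        }
    }

    /** Replays the log after a simulated restart and returns the value stored under "k". */
    static String valueAfterRestart(boolean skipFinishedTx) {
        // Raft log written before the restart: tx1.put followed by tx1.commit.
        List<Command> log = Arrays.asList(
                new Command("tx1", "k", "v", false),
                new Command("tx1", null, null, true));

        // Assumption: the tx state was flushed before stop, so it survives the restart.
        Map<String, TxState> txStateStorage = new HashMap<>();
        txStateStorage.put("tx1", TxState.COMMITED);

        // Assumption: the partition data was not flushed, so it is empty after the restart.
        Map<String, String> partitionStorage = new HashMap<>();

        for (Command cmd : log) {
            if (cmd.commit) {
                txStateStorage.put(cmd.txId, TxState.COMMITED);
                continue;
            }
            TxState state = txStateStorage.get(cmd.txId);
            boolean finished = state == TxState.COMMITED || state == TxState.ABORTED;
            if (skipFinishedTx && finished) {
                // Mirrors the skip logic: the put is not re-applied for a finished tx.
                continue;
            }
            partitionStorage.put(cmd.key, cmd.value);
        }
        return partitionStorage.get("k");
    }

    public static void main(String[] args) {
        System.out.println("with skip: " + valueAfterRestart(true));     // with skip: null
        System.out.println("without skip: " + valueAfterRestart(false)); // without skip: v
    }
}
```

With the skip enabled the lookup returns null, which corresponds to the "expected: not <null>" assertion failure; without the skip the put is re-applied and the value is found again.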
--
This message was sent by Atlassian Jira
(v8.20.10#820010)