Posted to dev@ignite.apache.org by "Ivan Rakov (JIRA)" <ji...@apache.org> on 2018/05/18 10:46:00 UTC

[jira] [Created] (IGNITE-8529) Implement testing framework for checking delta records consistency

Ivan Rakov created IGNITE-8529:
----------------------------------

             Summary: Implement testing framework for checking delta records consistency
                 Key: IGNITE-8529
                 URL: https://issues.apache.org/jira/browse/IGNITE-8529
             Project: Ignite
          Issue Type: New Feature
          Components: persistence
            Reporter: Ivan Rakov


We use sharp checkpointing of page memory in persistent mode. This implies that we write two types of records to the write-ahead log: logical (e.g. data records) and physical (page snapshots + binary delta records). Physical records are applied only when a node crashes or stops during an ongoing checkpoint. We have the following invariant: checkpoint #(n-1) + all subsequent physical records = checkpoint #n.
If the correctness of physical records is broken, an Ignite node may recover with an incorrect page memory state, which in turn can cause unexpected delayed errors. However, the consistency of physical records is poorly tested: only a small part of our autotests perform node restarts, and even fewer of them stop a node while a checkpoint is in progress.
We should implement an abstract test that:
1. Forces a checkpoint and freezes the memory state at the moment of the checkpoint.
2. Performs the necessary test load.
3. Forces a checkpoint again, replays the WAL, and checks that the page store as of the previous checkpoint, with all physical records applied, exactly equals the current checkpoint state.
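The three steps above can be sketched with a toy model. This is only an illustration under stated assumptions: the page store is modelled as a map from page id to a byte array, and DeltaConsistencyCheck, PhysicalRecord, PageSnapshot and DeltaRecord are hypothetical names, not Ignite classes. The real framework would read records from the actual WAL instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the checkpoint invariant; none of these types are Ignite classes. */
final class DeltaConsistencyCheck {
    /** Hypothetical physical WAL record that mutates a single page in place. */
    interface PhysicalRecord {
        long pageId();
        void applyTo(byte[] page);
    }

    /** Full page image: overwrites the page from offset 0. */
    static final class PageSnapshot implements PhysicalRecord {
        private final long pageId;
        private final byte[] image;

        PageSnapshot(long pageId, byte[] image) {
            this.pageId = pageId;
            this.image = image.clone();
        }

        @Override public long pageId() { return pageId; }

        @Override public void applyTo(byte[] page) {
            System.arraycopy(image, 0, page, 0, image.length);
        }
    }

    /** Binary delta: overwrites a byte range inside the page. */
    static final class DeltaRecord implements PhysicalRecord {
        private final long pageId;
        private final int offset;
        private final byte[] bytes;

        DeltaRecord(long pageId, int offset, byte[] bytes) {
            this.pageId = pageId;
            this.offset = offset;
            this.bytes = bytes.clone();
        }

        @Override public long pageId() { return pageId; }

        @Override public void applyTo(byte[] page) {
            System.arraycopy(bytes, 0, page, offset, bytes.length);
        }
    }

    /** Replays physical records, in order, over a deep copy of the frozen store. */
    static Map<Long, byte[]> replay(Map<Long, byte[]> frozen, List<PhysicalRecord> wal, int pageSize) {
        Map<Long, byte[]> restored = new HashMap<>();
        for (Map.Entry<Long, byte[]> e : frozen.entrySet())
            restored.put(e.getKey(), e.getValue().clone());

        // Pages allocated after the frozen checkpoint start out zero-filled.
        for (PhysicalRecord rec : wal)
            rec.applyTo(restored.computeIfAbsent(rec.pageId(), id -> new byte[pageSize]));

        return restored;
    }

    /** Returns the sorted ids of pages whose restored and current contents differ. */
    static List<Long> mismatchedPages(Map<Long, byte[]> restored, Map<Long, byte[]> current) {
        Set<Long> ids = new TreeSet<>(restored.keySet());
        ids.addAll(current.keySet());

        List<Long> bad = new ArrayList<>();
        for (Long id : ids) {
            // Arrays.equals treats a page missing on one side (null) as a mismatch.
            if (!Arrays.equals(restored.get(id), current.get(id)))
                bad.add(id);
        }
        return bad;
    }
}
```

A test built on this model would pass when mismatchedPages returns an empty list, and otherwise report the offending page ids.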
Besides checking correctness, the test framework should do the following:
1. Gather statistics (such as a histogram) on the types of written physical records. This will help us know which types of physical records are covered by the test.
2. Visualize the expected and actual page state (with all physical records applied) if an incorrect page state is detected.
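Both reporting features could look roughly like the sketch below. It is an assumption about the shape of the output, not a proposed design: PhysicalRecordReport is a hypothetical helper, record types are passed in as plain strings, and the "visualization" is a simple per-byte text diff.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reporting helper; not part of Ignite. */
final class PhysicalRecordReport {
    /** Counts records per type name; a TreeMap keeps the report order stable. */
    static Map<String, Integer> histogram(List<String> recordTypes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String type : recordTypes)
            counts.merge(type, 1, Integer::sum);
        return counts;
    }

    /** Lists every byte offset where the expected and actual page contents differ. */
    static String diff(byte[] expected, byte[] actual) {
        StringBuilder sb = new StringBuilder();
        int len = Math.max(expected.length, actual.length);

        for (int i = 0; i < len; i++) {
            // "--" marks an offset that is past the end of the shorter page.
            String exp = i < expected.length ? String.format("%02x", expected[i]) : "--";
            String act = i < actual.length ? String.format("%02x", actual[i]) : "--";

            if (!exp.equals(act))
                sb.append(String.format("0x%04x: expected %s, actual %s", i, exp, act)).append('\n');
        }
        return sb.toString();
    }
}
```

The histogram shows at a glance which physical record types a given test load never produced, and an empty diff string means the two page images are identical.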
Regarding implementation, I suppose we can use the checkpoint listener mechanism to freeze the page memory state at the moment of a checkpoint.
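As a rough sketch of that idea, the listener could take a deep copy of page memory when the checkpoint begins. The CheckpointListener interface and its onCheckpointBegin signature below are hypothetical stand-ins for whatever hook the checkpointer actually exposes, and page memory is again modelled as a simple map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of freezing page memory from a checkpoint callback. */
final class FreezeOnCheckpoint {
    /** Assumed callback shape; the real listener interface may differ. */
    interface CheckpointListener {
        void onCheckpointBegin(Map<Long, byte[]> pageMemory);
    }

    /** Keeps a deep copy of the pages as they were at the latest checkpoint begin. */
    static final class FreezingListener implements CheckpointListener {
        private Map<Long, byte[]> frozen = new HashMap<>();

        @Override public void onCheckpointBegin(Map<Long, byte[]> pageMemory) {
            Map<Long, byte[]> copy = new HashMap<>();
            // Clone each page so later writes to live memory do not leak into the copy.
            for (Map.Entry<Long, byte[]> e : pageMemory.entrySet())
                copy.put(e.getKey(), e.getValue().clone());
            frozen = copy;
        }

        Map<Long, byte[]> frozen() {
            return frozen;
        }
    }
}
```

Copying every page is only practical for small test data regions, which seems acceptable for a test-only framework.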



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)