Posted to issues@ignite.apache.org by "Pavel Kovalenko (JIRA)" <ji...@apache.org> on 2018/04/03 09:30:00 UTC

[jira] [Updated] (IGNITE-8122) Partition state restored from WAL may be lost if no checkpoints are done

     [ https://issues.apache.org/jira/browse/IGNITE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Kovalenko updated IGNITE-8122:
------------------------------------
    Description: 
Problem:
1) Start several nodes with persistence enabled.
2) Make sure that all partitions of 'ignite-sys-cache' have state OWN on all nodes and that the corresponding PartitionMetaStateRecord is logged to the WAL.
3) Stop all nodes, start them again, and activate the cluster. The checkpoint for 'ignite-sys-cache' is empty because the cache contained no data.
4) The state of all partitions is restored to OWN from the WAL (GridCacheDatabaseSharedManager#restoreState), but it is not recorded to page memory, because there were no checkpoints and no data in the cache. The store manager has no allocated pages (including the meta page) for such partitions.
5) When the exchange is done, we try to restore the partition states (initPartitionsWhenAffinityReady) on all nodes. Because page memory is empty, the states of all partitions fall back to the default, MOVING.
6) All nodes start to rebalance partitions from each other, and this process becomes unpredictable because we are trying to rebalance from MOVING partitions.
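The failure mode in steps 4-6 can be sketched as a small simplified model. All class, method, and state names below are illustrative stand-ins, not the actual Ignite internals: WAL-restored states are held in one map, page memory in another, and the post-exchange resolution reads only page memory, defaulting to MOVING when a partition has no page-memory record.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the described recovery flow.
// Not Ignite code: it only illustrates how an OWN state recovered
// from the WAL is lost when no checkpoint ever wrote it to pages.
public class PartitionRestoreSketch {
    enum PartitionState { OWNING, MOVING }

    // Step 4: states recovered from WAL records on restart.
    static Map<Integer, PartitionState> restoreFromWal() {
        Map<Integer, PartitionState> walStates = new HashMap<>();
        walStates.put(0, PartitionState.OWNING);
        walStates.put(1, PartitionState.OWNING);
        return walStates;
    }

    // Steps 3-4: page memory after restart is empty, because the cache
    // held no data and no checkpoint ever allocated partition pages.
    static Map<Integer, PartitionState> pageMemory = new HashMap<>();

    // Step 5: on exchange done, state is resolved from page memory only;
    // a partition with no page-memory record defaults to MOVING, so the
    // OWNING state recovered from the WAL is silently discarded.
    static PartitionState resolveState(int partId) {
        return pageMemory.getOrDefault(partId, PartitionState.MOVING);
    }

    public static void main(String[] args) {
        Map<Integer, PartitionState> walStates = restoreFromWal();
        for (int partId : walStates.keySet()) {
            // Every partition resolves to MOVING despite the WAL saying OWNING.
            System.out.println("partition " + partId
                    + ": WAL=" + walStates.get(partId)
                    + " resolved=" + resolveState(partId));
        }
    }
}
```

With every node resolving every partition to MOVING, each node sees no stable owner to rebalance from, which is the unpredictable step 6 behavior.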

  was:
Problem:
1) Start several nodes with persistence enabled.
2) Make sure that all partitions of 'ignite-sys-cache' have state OWN on all nodes and that the corresponding PartitionMetaStateRecord is logged to the WAL.
3) Stop all nodes, start them again, and activate the cluster. The checkpoint for 'ignite-sys-cache' is empty because the cache contained no data.
4) The state of all partitions is restored to OWN from the WAL (GridCacheDatabaseSharedManager#restoreState), but it is not recorded to page memory, because there were no checkpoints and no data in the cache. The store manager is not properly initialized for such partitions.
5) When the exchange is done, we try to restore the partition states (initPartitionsWhenAffinityReady) on all nodes. Because page memory is empty, the states of all partitions fall back to the default, MOVING.
6) All nodes start to rebalance partitions from each other, and this process becomes unpredictable because we are trying to rebalance from MOVING partitions.


> Partition state restored from WAL may be lost if no checkpoints are done
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-8122
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8122
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.4
>            Reporter: Pavel Kovalenko
>            Assignee: Pavel Kovalenko
>            Priority: Minor
>             Fix For: 2.5
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)