Posted to issues@ignite.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/06 14:47:00 UTC

[jira] [Commented] (IGNITE-8167) Recovery after crash sometimes leads to starting from beginning absolute wal segment index

    [ https://issues.apache.org/jira/browse/IGNITE-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428409#comment-16428409 ] 

ASF GitHub Bot commented on IGNITE-8167:
----------------------------------------

GitHub user amelius0712 opened a pull request:

    https://github.com/apache/ignite/pull/3771

    IGNITE-8167: Fix inconsistent last record pointer in case of recovery from corrupted WAL

    Let's look at this piece of code from GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory
    
    `
                // First, bring memory to the last consistent checkpoint state if needed.
                // This method should return a pointer to the last valid record in the WAL.
                WALPointer restore = restoreMemory(status);
    
                cctx.wal().resumeLogging(restore);
    `
    If `restore == null`, logging resumes from absolute WAL index 0.
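
The failure mode described above can be illustrated with a small standalone sketch. It does not use Ignite's classes and is not the change made in this pull request: `Pointer`, `resumeIndexUnguarded`, and `resumeIndexGuarded` are hypothetical names, and the fallback to a checkpoint pointer is an assumption made only to show what a null guard could look like.

```java
final class WalResumeSketch {
    /** Hypothetical stand-in for a WAL pointer; only the absolute segment index is modeled. */
    static final class Pointer {
        final long segmentIdx;

        Pointer(long segmentIdx) {
            this.segmentIdx = segmentIdx;
        }
    }

    /** Models the reported behavior: a null pointer silently means "resume from absolute index 0". */
    static long resumeIndexUnguarded(Pointer restore) {
        return restore == null ? 0L : restore.segmentIdx;
    }

    /**
     * Guarded variant (an assumption, not the actual fix): prefer the restored pointer,
     * otherwise fall back to a known-good pointer, and fail loudly if neither exists.
     */
    static long resumeIndexGuarded(Pointer restore, Pointer fallback) {
        if (restore != null)
            return restore.segmentIdx;

        if (fallback != null)
            return fallback.segmentIdx;

        throw new IllegalStateException("No valid WAL pointer to resume from");
    }

    public static void main(String[] args) {
        Pointer checkpoint = new Pointer(42L);

        // Restore found no valid record, so the unguarded path restarts at segment 0.
        System.out.println(resumeIndexUnguarded(null));

        // The guarded path resumes from the fallback pointer's segment instead.
        System.out.println(resumeIndexGuarded(null, checkpoint));
    }
}
```

Under these assumptions, the unguarded path yields segment 0 when no valid record is found, while the guarded path resumes from the fallback segment or throws instead of silently restarting the index.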

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Synesis-LLC/ignite ignite-8167

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/3771.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3771
    
----
commit 40b9c8e227783f8d90fff5f2db4688e63be3dd37
Author: Pavel Sapezhko <pa...@...>
Date:   2018-04-06T14:36:23Z

    IGNITE-8167: Fix inconsistent last record pointer in case of recovery from corrupted WAL

----


> Recovery after crash sometimes leads to starting from beginning absolute wal segment index
> ------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-8167
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8167
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.4
>         Environment: Doesn't matter. We saw this behavior in a k8s deployment as well as in a local deployment, with any WAL mode.
>            Reporter: Pavel Sapezhko
>            Priority: Major
>             Fix For: 2.5
>
>
> When we try to restore after a crash using the WAL, we sometimes find corrupted WAL records, which can lead to logging restarting from the beginning absolute WAL index. The WAL archiver thread then breaks with an assertion error (while the Ignite instance keeps working; I think we need to discuss whether we really want that), and as a result, on the next restart we can see a "Wal history is too short" message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)