Posted to issues@ignite.apache.org by "Alex Plehanov (Jira)" <ji...@apache.org> on 2023/05/03 16:26:00 UTC
[jira] [Updated] (IGNITE-11252) Docs: Index corruption recovery procedure

     [ https://issues.apache.org/jira/browse/IGNITE-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Plehanov updated IGNITE-11252:
-----------------------------------
    Fix Version/s: 2.16
                       (was: 2.15)

> Docs: Index corruption recovery procedure
> -----------------------------------------
>
>                 Key: IGNITE-11252
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11252
>             Project: Ignite
>          Issue Type: Task
>          Components: documentation
>    Affects Versions: 2.7
>            Reporter: Denis A. Magda
>            Assignee: Prachi Garg
>            Priority: Critical
>             Fix For: 2.16
>
>
> We need to document a recovery procedure for index corruption. Refer to this thread for details and examples of the exception written to the logs when the issue occurs:
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-index-corruption-issue-gt-unrecoverable-cluster-td39869.html
> # Recovering from index corruption
> ## Applicable if
> The index of a cache is known to be corrupted, but the main data (partition files and WAL) is intact. Show code snippets of possible examples; these can be found via the references shared in the dev list discussion.
> ## Steps to recover
> 1. Stop the node
> 2. Delete index.bin of the affected caches (path is db/<consistent_id>/cache-<cache_name>/index.bin)
> 3. Start the node
> - Note: At this point the node is active in the cluster but has no indexes.
> It still serves SQL queries, but their performance can be low.
> Avoid running SQL queries on large tables until the rebuild finishes.
> 4. Wait for the message “Finished indexes rebuilding for cache <cache_name>” in the Ignite log
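
The file-system part of the index recovery steps above could be scripted roughly as follows. This is only a minimal sketch: it assumes the db/<consistent_id>/cache-<cache_name>/index.bin path is relative to a directory passed in as `work_dir`, and that the log line contains both the quoted message text and the cache name. The function names are hypothetical helpers, not part of Ignite, and the node must already be stopped before anything is deleted.

```python
from pathlib import Path


def index_file(work_dir, consistent_id, cache_name):
    """Build the index.bin path from the layout quoted in step 2.

    Assumption: the db/ directory sits directly under work_dir.
    """
    return Path(work_dir) / "db" / consistent_id / f"cache-{cache_name}" / "index.bin"


def delete_index_files(work_dir, consistent_id, cache_names):
    """Delete index.bin for each affected cache and return the removed paths.

    Caches without an index.bin file are skipped silently.
    """
    removed = []
    for name in cache_names:
        path = index_file(work_dir, consistent_id, name)
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


def rebuild_finished(log_text, cache_name):
    """Return True if a log line reports a finished index rebuild for the cache.

    Assumption: the line contains the message text and the cache name;
    the exact log format may differ between Ignite versions.
    """
    marker = "Finished indexes rebuilding for cache"
    return any(marker in line and cache_name in line for line in log_text.splitlines())
```

A call such as `delete_index_files("/opt/ignite/work", "node1", ["person"])` uses placeholder values. The substring match in `rebuild_finished` is deliberately loose, so very short cache names can produce false positives.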
> # Recovering from persistent storage corruption
> ## Applicable if
> Part of the persistent storage (partition files, checkpoint markers, or WAL) is corrupted
> and there is no other way to recover it, but healthy copies of all data exist on other nodes.
> ## Steps to recover
> 1. Stop the node
> 2. Delete all persistence files of the node (it is best to clear the Ignite working directory, the storage directory, and the WAL and WAL archive directories)
> 3. Make sure consistentId is explicitly set in the configuration of the node
> - If it isn’t, look up the generated consistentId using control.sh and set it explicitly in the config or via IGNITE_CONSISTENT_ID (2.8+ only)
> 4. Start the node
> 5. Wait for the <Finished rebalancing cache> message for every cache
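
Steps 2 and 5 of the storage recovery procedure above might be automated along these lines. This is a sketch under stated assumptions: the caller supplies the actual working, storage, WAL and WAL archive directories from the node's configuration, the node is already stopped, and a rebalancing log line contains both the quoted message text and the cache name. Both function names are hypothetical helpers, not Ignite APIs.

```python
import shutil
from pathlib import Path


def clear_directories(directories):
    """Remove everything inside each existing directory, keeping the directory itself.

    Returns the directories that were cleared; missing ones are skipped.
    """
    cleared = []
    for directory in map(Path, directories):
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        cleared.append(directory)
    return cleared


def caches_still_rebalancing(log_text, cache_names):
    """Return the caches for which no finished-rebalancing line was found.

    Assumption: a finished line contains the message text and the cache name;
    the exact log format may differ between Ignite versions.
    """
    marker = "Finished rebalancing cache"
    finished = {
        name
        for line in log_text.splitlines() if marker in line
        for name in cache_names if name in line
    }
    return [name for name in cache_names if name not in finished]
```

An empty list from `caches_still_rebalancing` suggests that every listed cache has reported completion. Because the deletion is irreversible, it is worth confirming beforehand that healthy copies of the data exist on other nodes, as the applicability condition requires.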



--
This message was sent by Atlassian Jira
(v8.20.10#820010)