You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Sergey Chugunov (Jira)" <ji...@apache.org> on 2023/03/07 12:46:00 UTC

[jira] [Commented] (IGNITE-18715) B+Tree corruption error caused Ignite cluster crash and not able restart

    [ https://issues.apache.org/jira/browse/IGNITE-18715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697408#comment-17697408 ] 

Sergey Chugunov commented on IGNITE-18715:
------------------------------------------

[~nifeng2xing] , do you have a reproducer for this problem? From the stack trace provided in the ticket it is clear that corruption has happened during index manipulations: org.apache.ignite.internal.cache.{*}query.index.sorted.inline.InlineIndexTree{*}.

However it is not enough to narrow down the exact scenario, a reproducer is needed or at least persistence artifacts - partition files, index.bin file, WAL segments - that were on disk after node crash.

At the same time it should be possible to fix the issue by deleting index.bin file for the affected cache - in that case node should be able to start again. It will need to rebuild all secondary indexes that are stored in index.bin file, but restoring normal operations for that node should be possible in theory.

> B+Tree corruption error caused Ignite cluster crash and not able restart
> ------------------------------------------------------------------------
>
>                 Key: IGNITE-18715
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18715
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.14
>            Reporter: Isaac Zhu
>            Priority: Blocker
>
> With version 2.14, see this error during doing cache remove & put. And after this happens, the cluster can't be restarted, all data get lost:
> [00:48:21,922][SEVERE][sys-stripe-8-#9][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [groupId=828437433, pageIds=[217017202749409008], cacheId=-595580467, cacheName=SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS, indexName=URT_TESTCASE_RESULTS_VCS_STATUS_IDX, groupName=nav_mem_part, msg=Runtime failure on search row: Row@7b8be26c[ key: SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7_KEY [idHash=1221100184, hash=701595465, TEST_CASE_ID=610062, GROUP_ID=497], val: SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7 [idHash=1823360128, hash=-882680090, NEW=null, STATUS=MARK_FOR_DELETION, DIFF_FILE_ID=610, EXEC_TIME=67, TIMED_OUT=null, RECORD_TIME=2023-02-05 20:26:30.234, FILE_MOD_TIME=2023-02-02 01:35:59.0, SINCE=2022-10-22 00:00:00.0, JIRA=, ERROR_TOOL=null, ERROR_CODE=null, BUILD_DATE=2023-02-01 20:00:00.0, PRODUCER_START_TIME=2023-02-03 00:10:41.0] ][ MARK_FOR_DELETION, 497, 610062 ]]]] class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [groupId=828437433, pageIds=[217017202749409008], cacheId=-595580467, cacheName=SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS, indexName=URT_TESTCASE_RESULTS_VCS_STATUS_IDX, groupName=nav_mem_part, msg=Runtime failure on search row: Row@7b8be26c[ key: SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7_KEY [idHash=1221100184, hash=701595465, TEST_CASE_ID=610062, GROUP_ID=497], val: SQL_PUBLIC_URT_TESTCASE_RESULTS_VCS_070eda86_0aab_4da3_900c_8d3baf08b3a7 [idHash=1823360128, hash=-882680090, NEW=null, STATUS=MARK_FOR_DELETION, DIFF_FILE_ID=610, EXEC_TIME=67, TIMED_OUT=null, RECORD_TIME=2023-02-05 20:26:30.234, FILE_MOD_TIME=2023-02-02 01:35:59.0, SINCE=2022-10-22 00:00:00.0, JIRA=, ERROR_TOOL=null, ERROR_CODE=null, BUILD_DATE=2023-02-01 20:00:00.0, PRODUCER_START_TIME=2023-02-03 00:10:41.0] ][ MARK_FOR_DELETION, 497, 610062 ]] at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.corruptedTreeException(InlineIndexTree.java:561) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2310) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removex(BPlusTree.java:2079) at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexImpl.remove(InlineIndexImpl.java:377) at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexImpl.onUpdate(InlineIndexImpl.java:330) at org.apache.ignite.internal.cache.query.index.IndexProcessor.updateIndex(IndexProcessor.java:465) at org.apache.ignite.internal.cache.query.index.IndexProcessor.updateIndexes(IndexProcessor.java:308) at org.apache.ignite.internal.cache.query.index.IndexProcessor.store(IndexProcessor.java:156) at org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:2741) at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:420) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2629) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:2611) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.update(IgniteCacheOffheapManagerImpl.java:2510) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.update(GridCacheOffheapManager.java:2600) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.update(IgniteCacheOffheapManagerImpl.java:440) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyUpdate(GridCacheDatabaseSharedManager.java:2987) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$applyLogicalUpdates$29(GridCacheDatabaseSharedManager.java:2775) at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.lambda$stripedApply$28(GridCacheDatabaseSharedManager.java:2455) at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 19 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345) at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165) at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136) at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.createIndexRow(InlineIndexTree.java:360) at org.apache.ignite.internal.cache.query.index.sorted.inline.io.AbstractInlineLeafIO.getLookupRow(AbstractInlineLeafIO.java:129) at org.apache.ignite.internal.cache.query.index.sorted.inline.io.AbstractInlineLeafIO.getLookupRow(AbstractInlineLeafIO.java:37) at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.getRow(InlineIndexTree.java:403) at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.getRow(InlineIndexTree.java:72) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:5693) at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.compare(InlineIndexTree.java:309) at org.apache.ignite.internal.cache.query.index.sorted.inline.InlineIndexTree.compare(InlineIndexTree.java:72) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5680) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:5600) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1100(BPlusTree.java:162) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:369) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:6216) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run(BPlusTree.java:349) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:6202) at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:174) at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:415) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.read(BPlusTree.java:6403) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2345) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:2364) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:2272) ... 19 more Caused by: java.lang.IllegalStateException: Item not found: 19 at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:488) at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:596) at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:638) at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380) at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316) ... 45 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)