Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2021/08/04 11:42:00 UTC

[jira] [Commented] (IGNITE-15227) Improve diagnostic capabilities of persistence corruptions

    [ https://issues.apache.org/jira/browse/IGNITE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393057#comment-17393057 ] 

Ignite TC Bot commented on IGNITE-15227:
----------------------------------------

{panel:title=Branch: [pull/9292/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/9292/head] Base: [master] : New Tests (2)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}PDS 4{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=6115673]]
* {color:#013220}IgnitePdsTestSuite4: PagesPossibleCorruptionDiagnosticTest.testCorruptedNodeFailsOnStart - PASSED{color}
* {color:#013220}IgnitePdsTestSuite4: PagesPossibleCorruptionDiagnosticTest.testDiagnosticCollectedOnCorruptedPageList - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6115700&buildTypeId=IgniteTests24Java8_RunAll]

> Improve diagnostic capabilities of persistence corruptions
> ----------------------------------------------------------
>
>                 Key: IGNITE-15227
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15227
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Denis Chudov
>            Assignee: Denis Chudov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are some diagnostic problems:
>  * assertions inside PagesList can surface as CorruptedTreeException, which is misleading because the failure originates in the pages list rather than in the B+Tree. Example:
> {code:java}
> 2020-11-30 20:17:27.170[ERROR]sys-stripe-29-#30%DPL_GRID%DplGridNodeName%[org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924, val2=72372732968376779]], groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey, msg=Runtime failure on search row: SearchRow [key=KeyCacheObject [hasValBytes=true], hash=513719283, cacheId=-295471981]]]]
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-782612924, val2=72372732968376779]], groupName=CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey, msg=Runtime failure on search row: SearchRow [key=KeyCacheObject [hasValBytes=true], hash=513719283, cacheId=-295471981]]
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6117)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1937)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1670)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1653)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2519)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4312)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4289)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1555)
>         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:756)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:794)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:605)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:477)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:534)
>         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1092)
>         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:968)
>         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:923)
>         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:132)
>         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:229)
>         at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:227)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>         at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1722)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1329)
>         at org.apache.ignite.internal.managers.communication.GridIoManager.access$4600(GridIoManager.java:158)
>         at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1214)
>         at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:54)
>         at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.AssertionError: Incorrectly recycled pageId in reuse bucket: ff011e9e000012f7
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.takeEmptyPage(PagesList.java:1358)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:517)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:74)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeList.insertDataRow(CacheFreeList.java:35)
>         at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:112)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1720)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:2494)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5876)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:5813)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:4000)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(BPlusTree.java:3894)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2020)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
> {code}
>  * corruption of partition meta also leads to a mismatched exception type in the pages list, e.g.:
> {code:java}
> 2021-01-29 05:48:41.644[ERROR][db-checkpoint-thread-#307%DPL_GRID%DplGridNodeName%][org.apache.ignite.Ignite] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=java.lang.AssertionError: Missing tails [bucket=250, tails=null, metaPage=000120ca00002798]]]
> java.lang.AssertionError: Missing tails [bucket=250, tails=null, metaPage=000120ca00002798]
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.updateTail(PagesList.java:624)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.mergeNoNext(PagesList.java:1628)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.removeDataPage(PagesList.java:1577)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:318)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList$RemoveRowHandler.run(AbstractFreeList.java:273)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:292)
>         at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:273)
>         at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.removeDataRowByLink(AbstractFreeList.java:633)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:367)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.lambda$syncMetadata$2(GridCacheOffheapManager.java:288)
>         at org.apache.ignite.internal.util.IgniteUtils.lambda$wrapIgniteFuture$3(IgniteUtils.java:11665)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> reproducer: [https://github.com/gridgain/apache-ignite/blob/2603e9a01bc1f6033b760ef02ebaba9a8069b84b/modules/core/src/test/java/org/apache/ignite/Reproducer12005.java]
> All such exceptions should be passed to DiagnosticProcessor and should contain the ids of the pages that are possibly corrupted, so that those pages can be analyzed in PDS.
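
The proposal in the last line of the description could look roughly like the following minimal Java sketch. It is only an illustration of the idea (convert a failed pages-list assertion into a typed exception that carries the suspect page ids, then hand it to a diagnostic component). The names PossiblyCorruptedPagesException, RecordingDiagnostics and PagesListSketch are hypothetical and are not Ignite classes; the real DiagnosticProcessor API is not shown in this message and is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception type: carries the ids of pages suspected to be corrupted. */
class PossiblyCorruptedPagesException extends RuntimeException {
    private final int groupId;
    private final long[] pageIds;

    PossiblyCorruptedPagesException(String msg, Throwable cause, int groupId, long... pageIds) {
        super(msg, cause);
        this.groupId = groupId;
        this.pageIds = pageIds.clone();
    }

    int groupId() { return groupId; }

    long[] pageIds() { return pageIds.clone(); }
}

/** Hypothetical stand-in for a diagnostic component; it only records what it is given. */
class RecordingDiagnostics {
    final List<String> records = new ArrayList<>();

    void onFailure(PossiblyCorruptedPagesException e) {
        StringBuilder sb = new StringBuilder("grp=" + e.groupId() + ", pages=");

        // Page ids are printed as 16 hex digits, the same form used in the log messages above.
        for (long id : e.pageIds())
            sb.append(String.format("%016x ", id));

        records.add(sb.toString().trim());
    }
}

class PagesListSketch {
    /** Runs a pages-list operation and converts a failed assertion into a typed exception. */
    static void guarded(int groupId, long pageId, Runnable op, RecordingDiagnostics diag) {
        try {
            op.run();
        }
        catch (AssertionError err) {
            PossiblyCorruptedPagesException e = new PossiblyCorruptedPagesException(
                "Pages list operation failed", err, groupId, pageId);

            // Report first, then rethrow so the caller's failure handling still runs.
            diag.onFailure(e);

            throw e;
        }
    }
}
```

With this shape, the caller sees an exception type that names the pages list as the source of the failure, and the diagnostic component receives the page ids without having to parse an assertion message.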


