Posted to issues@ignite.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/11 17:52:00 UTC

[jira] [Commented] (IGNITE-8320) Page corruption during the rebalancing cache.

    [ https://issues.apache.org/jira/browse/IGNITE-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472351#comment-16472351 ] 

ASF GitHub Bot commented on IGNITE-8320:
----------------------------------------

GitHub user Jokser opened a pull request:

    https://github.com/apache/ignite/pull/3985

    IGNITE-8320 Corrupted indexes fix

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-8320-reproduce

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/3985.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3985
    
----
commit ffb362929decc431b325bccc8c612a049f85063f
Author: Pavel Kovalenko <jo...@...>
Date:   2018-05-11T15:32:32Z

    IGNITE-8320 Reproducer.

commit 37765277286a18198255bcbc2286073706ef6048
Author: Pavel Kovalenko <jo...@...>
Date:   2018-05-11T15:33:42Z

    IGNITE-8320 Reproducer.

commit d1d265ae98ab79d6d80a667e4a844ea86f724e32
Author: Pavel Kovalenko <jo...@...>
Date:   2018-05-11T15:34:47Z

    IGNITE-8320 Docs fix.

commit 234b1f8fcf24d849227e5e73e26fb81e0768cf21
Author: Pavel Kovalenko <jo...@...>
Date:   2018-05-11T15:36:01Z

    IGNITE-8320 Docs fix.

commit 951d67e93677358470416a5faabe238b6e2bb21a
Author: Pavel Kovalenko <jo...@...>
Date:   2018-05-11T16:46:15Z

    IGNITE-8320 Fix WIP.

commit a1acab629dfce81e904bdc6fac92458b60a7ac48
Author: Pavel Kovalenko <jo...@...>
Date:   2018-05-11T17:51:00Z

    IGNITE-8320 Fix WIP.

----


> Page corruption during the rebalancing cache.
> ---------------------------------------------
>
>                 Key: IGNITE-8320
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8320
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>    Affects Versions: 2.4
>            Reporter: Vyacheslav Koptilin
>            Assignee: Pavel Kovalenko
>            Priority: Major
>             Fix For: 2.6
>
>
> Cache rebalance may result in page memory corruption.
> {noformat}
> [2018-04-18T14:33:23,260][ERROR][sys-#54][GridCacheIoManager] Failed processing message [senderId=95f06c25-e6bb-48f7-a3e5-4c05fc1c49be, msg=GridDhtPartitionSupplyMessage [rebalanceId=37, topVer=AffinityTopologyVersion [topVer=53, minorTopVer=1], missed=null, clean=null, msgSize=525350, estimatedKeysCnt=1690216, size=2, parts=[1, 2], super=GridCacheGroupIdMessage [grpId=-1831596270]]]
>  org.apache.ignite.IgniteException: Runtime failure on row: Row@33b6805c[ key: xxxx [idHash=773709078, hash=-630455542, ...], val: xxxx [idHash=1309051286, hash=-1321165334, ver: GridCacheVersion [topVer=135435024, order=1523963943331, nodeOrder=4] ]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2102) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putx(BPlusTree.java:2049) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:247) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:454) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.store(IgniteH2Indexing.java:653) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.GridQueryProcessor.store(GridQueryProcessor.java:1866) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.store(GridCacheQueryManager.java:407) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishUpdate(IgniteCacheOffheapManagerImpl.java:1391) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1255) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1451) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:352) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3527) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2735) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.preloadEntry(GridDhtPartitionDemander.java:823) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:704) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:347) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:365) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:355) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1054) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:99) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1603) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:126) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2751) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1515) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:126) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1484) [ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
>  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
>  Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
>  at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:61) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.H2Tree.createRowFromLink(H2Tree.java:149) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.io.H2LeafIO.getLookupRow(H2LeafIO.java:67) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.io.H2LeafIO.getLookupRow(H2LeafIO.java:33) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:167) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:46) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4436) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:209) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:46) ~[ignite-indexing-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4423) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4343) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:82) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:270) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4770) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4755) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:320) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2317) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2329) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.putDown(BPlusTree.java:2329) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doPut(BPlusTree.java:2069) ~[ignite-core-2.4.4.b1.jar:2.4.4.b1]
>  ... 30 more
> {noformat}
> Possible cause and reproducer:
> 1) Start partition eviction.
> 2) Force-kill the node (kill -9) after the partition file has been truncated.
> 3) Start the node again and iterate over the index.
> The main problem is that file truncation is not synchronized with the actual checkpoint. As a result, after crash recovery the index tree can still contain links to data pages that were already removed during file truncation.
> One possible solution is to mark such partition files for deletion and safely truncate them on the next checkpoint.
> This mechanism can be resurrected from the ignite-2.0.2.b1 branch.
> See 
> {noformat}
> org/gridgain/grid/internal/processors/cache/database/GridCacheDatabaseSharedManager.java:3059
> org.gridgain.grid.cache.db.GridCacheOffheapManager#destroyCacheDataStore
> {noformat}
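
The "mark for deletion, truncate on the next checkpoint" idea from the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request or from the ignite-2.0.2.b1 branch: the class DeferredTruncationSketch and its methods markForTruncation and onCheckpointFinished are hypothetical names, and the sketch assumes the caller invokes onCheckpointFinished only after the checkpoint has flushed the index pages.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch (not Ignite code): defers partition file truncation until a checkpoint. */
final class DeferredTruncationSketch {
    /** Partition files that eviction has marked for deletion but that are not truncated yet. */
    private final Set<Path> pendingTruncation = ConcurrentHashMap.newKeySet();

    /** Called by eviction: records the intent only, the file itself stays untouched. */
    void markForTruncation(Path partFile) {
        pendingTruncation.add(partFile);
    }

    /**
     * Assumed to be called once a checkpoint has finished, i.e. when the persisted index
     * no longer references pages of the marked partitions.
     *
     * @return Files that were truncated during this call.
     */
    List<Path> onCheckpointFinished() throws IOException {
        List<Path> truncated = new ArrayList<>();

        // The key set of ConcurrentHashMap tolerates removal while iterating.
        for (Path p : pendingTruncation) {
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
                ch.truncate(0);  // Drop all data pages of the evicted partition.
                ch.force(true);  // Make the truncation durable before reporting it.
            }

            pendingTruncation.remove(p);
            truncated.add(p);
        }

        return truncated;
    }
}
```

With this ordering, a kill -9 between eviction and the next checkpoint leaves the partition file intact, so links in the index tree that survive crash recovery still point to readable pages. A real fix would also need to persist the "marked for deletion" state so that it survives a restart, which this sketch does not attempt.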



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)