
Posted to user@ignite.apache.org by Ray <ra...@cisco.com> on 2018/11/29 04:57:42 UTC

RE: Failed to get page IO instance (page content is corrupted) after one node failed when trying to reboot.

This issue happened again.

Here's the summary.
I'm running a three-node Ignite 2.6 cluster with the following configuration:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="segmentationPolicy" value="RESTART_JVM"/>
        <property name="peerClassLoadingEnabled" value="true"/>
        <property name="failureDetectionTimeout" value="60000"/>
        <property name="dataStorageConfiguration">
            <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
                <property name="storagePath" value="/data/ignite/persistence"/>
                <property name="walPath" value="/wal"/>
                <property name="walArchivePath" value="/wal/archive"/>
                <property name="defaultDataRegionConfiguration">
                    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                        <property name="name" value="default_Region"/>
                        <property name="initialSize" value="#{100L * 1024 * 1024 * 1024}"/>
                        <property name="maxSize" value="#{400L * 1024 * 1024 * 1024}"/>
                        <property name="persistenceEnabled" value="true"/>
                        <property name="checkpointPageBufferSize" value="#{8L * 1024 * 1024 * 1024}"/>
                    </bean>
                </property>
                <property name="walMode" value="BACKGROUND"/>
                <property name="walFlushFrequency" value="5000"/>
                <property name="checkpointFrequency" value="600000"/>
            </bean>
        </property>
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="localPort" value="49500"/>
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <value>node1:49500</value>
                                <value>node2:49500</value>
                                <value>node3:49500</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
            </bean>
        </property>
    </bean>
</beans>

I have a few caches set up with TTL, with persistence enabled.
I'm mentioning this because I checked this thread
http://apache-ignite-users.70518.x6.nabble.com/And-again-Failed-to-get-page-IO-instance-page-content-is-corrupted-td20095.html#a22037
and a few tickets mentioned in it:
https://issues.apache.org/jira/browse/IGNITE-8659
https://issues.apache.org/jira/browse/IGNITE-5874
I've ignored the other issues because they're already fixed in 2.6.
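
For anyone trying to reproduce this, a TTL cache on the persistent default region can be declared roughly as follows. This is only a sketch: the cache name `ttlCache` and the 1-hour `CreatedExpiryPolicy` are hypothetical placeholders, not the actual caches from this cluster, and it assumes the default data region has `persistenceEnabled=true` as in the configuration above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Assumes a persistent default data region, as in the configuration above. -->
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <!-- Hypothetical cache name -->
                    <property name="name" value="ttlCache"/>
                    <!-- Entries expire 1 hour after creation (placeholder duration) -->
                    <property name="expiryPolicyFactory">
                        <bean class="javax.cache.expiry.CreatedExpiryPolicy"
                              factory-method="factoryOf">
                            <constructor-arg>
                                <bean class="javax.cache.expiry.Duration">
                                    <constructor-arg value="HOURS"/>
                                    <constructor-arg value="1"/>
                                </bean>
                            </constructor-arg>
                        </bean>
                    </property>
                    <!-- Expired entries are removed eagerly in the background -->
                    <property name="eagerTtl" value="true"/>
                </bean>
            </list>
        </property>
    </bean>
</beans>
```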


Node1 went down because of a long GC pause.
When I tried to restart the Ignite service on Node1, the "Still waiting for
initial partition map exchange" warning kept being logged for more than 2 hours:
[WARN ][main][GridCachePartitionExchangeManager] Still waiting for initial
partition map exchange [fut=GridDhtPartitionsExchangeFuture
[firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=9d66b750-09a3-4f0e-afa9-7cf24847ee6a, addrs=[10.252.4.60, 127.0.0.1],
sockAddrs=[rpsj1ign001.webex.com/10.252.4.60:49500, /127.0.0.1:49500],
discPort=49500, order=11813, intOrder=5909, lastExchangeTime=1543451981558,
loc=true, ver=2.6.0#20180709-sha1:5faffcee, isClient=false], topVer=11813,
nodeId8=9d66b750, msg=null, type=NODE_JOINED, tstamp=1543451943071],
crd=TcpDiscoveryNode [id=f14c8e36-9a20-4668-b52e-0de64c743700,
addrs=[10.252.10.20, 127.0.0.1],
sockAddrs=[rpsj1ign003.webex.com/10.252.10.20:49500, /127.0.0.1:49500],
discPort=49500, order=2310, intOrder=1158, lastExchangeTime=1543451942304,
loc=false, ver=2.6.0#20180709-sha1:5faffcee, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
[topVer=11813, minorTopVer=0], discoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=9d66b750-09a3-4f0e-afa9-7cf24847ee6a,
addrs=[10.252.4.60, 127.0.0.1],
sockAddrs=[rpsj1ign001.webex.com/10.252.4.60:49500, /127.0.0.1:49500],
discPort=49500, order=11813, intOrder=5909, lastExchangeTime=1543451981558,
loc=true, ver=2.6.0#20180709-sha1:5faffcee, isClient=false], topVer=11813,
nodeId8=9d66b750, msg=null, type=NODE_JOINED, tstamp=1543451943071],
nodeId=9d66b750, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter
[ignoreInterrupts=false, state=INIT, res=null, hash=830022440], init=false,
lastVer=null, partReleaseFut=PartitionReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0],
futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion
[topVer=11813, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0], futures=[]],
DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=11813,
minorTopVer=0], futures=[]], LocalTxReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0], futures=[]],
AllTxReleaseFuture [topVer=AffinityTopologyVersion [topVer=11813,
minorTopVer=0], futures=[RemoteTxReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0],
futures=[]]]]]], exchActions=ExchangeActions [startCaches=null,
stopCaches=null, startGrps=[], stopGrps=[], resetParts=null,
stateChangeRequest=null], affChangeMsg=null, initTs=1543451943112,
centralizedAff=false, forceAffReassignment=false, changeGlobalStateE=null,
done=false, state=SRV, evtLatch=0,
remaining=[0126e998-0c18-452f-8f3b-b6dd4b2ae84c,
f14c8e36-9a20-4668-b52e-0de64c743700], super=GridFutureAdapter
[ignoreInterrupts=false, state=INIT, res=null, hash=773110813]]]

So I tried restarting the Ignite service on node2 and node3.
Only node2 managed to rejoin the cluster; node3 printed "Still waiting for
initial partition map exchange" for more than 30 minutes.

So I stopped all three nodes and restarted the Ignite service on each of them.
Then Node1 failed with "Failed to get page IO instance (page content is
corrupted)":

[ERROR][exchange-worker-#162][] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)]]
java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
        at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83) ~[ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95) ~[ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.init(PagesList.java:175) ~[ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.<init>(AbstractFreeList.java:370) ~[ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.persistence.freelist.CacheFreeListImpl.<init>(CacheFreeListImpl.java:47) ~[ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore$1.<init>(GridCacheOffheapManager.java:1203) ~[ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:1203) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.updateCounter(GridCacheOffheapManager.java:1420) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.updateCounter(GridDhtLocalPartition.java:942) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.<init>(GridDhtLocalPartition.java:222) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.getOrCreatePartition(GridDhtPartitionTopologyImpl.java:812) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.initPartitions(GridDhtPartitionTopologyImpl.java:368) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.beforeExchange(GridDhtPartitionTopologyImpl.java:543) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1141) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:712) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2419) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2299) [ignite-core-2.6.0.jar:2.6.0]
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.6.0.jar:2.6.0]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2018-11-29T03:53:25,629][ERROR][exchange-worker-#162][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)]]
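
The handler named in this log, `StopNodeOrHaltFailureHandler`, appears to be the one Ignite 2.6 falls back to when no `failureHandler` is set, which is the case in the configuration above. Declaring it explicitly would look roughly like the sketch below; it should not change behaviour, it only makes the "stop the node or halt the JVM on a critical error" policy visible in the config. The bean id reuses `grid.cfg`; all other properties are omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <!-- Other properties as in the configuration above (omitted here). -->

        <!-- Same handler the log reports: on a CRITICAL_ERROR the node is stopped or the JVM is halted. -->
        <property name="failureHandler">
            <bean class="org.apache.ignite.failure.StopNodeOrHaltFailureHandler"/>
        </property>
    </bean>
</beans>
```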

Here are the full log files:
node1.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node1.zip>
node2.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node2.zip>
node3.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1346/node3.zip>

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Failed to get page IO instance (page content is corrupted) after one node failed when trying to reboot.

Posted by "Ray Liu (rayliu)" <ra...@cisco.com>.

Here's my analysis.

It looks like I ran into this bug:
https://issues.apache.org/jira/browse/IGNITE-8659
because in the log file ignite-9d66b750-first-restart-node1.log, I see:
[2018-11-29T03:01:39,135][INFO ][exchange-worker-#162][GridCachePartitionExchangeManager] Rebalancing started [top=AffinityTopologyVersion [topVer=11834, minorTopVer=0], evt=NODE_JOINED, node=6018393e-a88c-40f5-8d77-d136d5226741]
[2018-11-29T03:01:39,136][INFO ][exchange-worker-#162][GridDhtPartitionDemander] Starting rebalancing [grp=SQL_PUBLIC_WBXSITEACCOUNT, mode=ASYNC, fromNode=6018393e-a88c-40f5-8d77-d136d5226741, partitionsCount=345, topology=AffinityTopologyVersion [topVer=11834, minorTopVer=0], rebalanceId=47]

But why did rebalancing start two hours after the node started?
Is it because the partition map exchange (PME) was stuck for two hours?

It also looks like the PME got stuck again when rebalancing started (this is when I restarted node2 and node3),
because in the same log file I see:
[2018-11-29T03:01:59,443][WARN ][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await partitions release latch within timeout: ServerLatch [permits=2, pendingAcks=[6018393e-a88c-40f5-8d77-d136d5226741, 75a180ea-78de-4d63-8bd5-291557bd58f4], super=CompletableLatch [id=exchange, topVer=AffinityTopologyVersion [topVer=11835, minorTopVer=0]]]

According to this document, https://apacheignite.readme.io/docs/rebalancing#section-rebalance-modes, rebalancing is asynchronous by default.
So what is blocking the PME this time?
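
For reference, the default described on that page corresponds to `rebalanceMode=ASYNC` on each `CacheConfiguration`. Spelled out explicitly, it would look roughly like the sketch below (the cache name `exampleCache` is a hypothetical placeholder). As far as I understand, this setting only decides whether cache operations wait for rebalancing to complete; it is not a setting of the partition map exchange itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="cacheConfiguration">
            <list>
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <!-- Hypothetical cache name -->
                    <property name="name" value="exampleCache"/>
                    <!-- ASYNC is the default: the cache stays usable while data is loaded in the background -->
                    <property name="rebalanceMode" value="ASYNC"/>
                </bean>
            </list>
        </property>
    </bean>
</beans>
```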

So basically I have three questions:
1. Why can't node1 join the cluster when restarted ("Still waiting for initial partition map exchange" for two hours)?
Is it because node2 and node3 ingested new data while node1 was down?

2. Why is node3 blocked by "Unable to await partitions release latch within timeout" when restarted?

3. Is https://issues.apache.org/jira/browse/IGNITE-8659 the solution?

Andrew, could you please take a look?
I think this is a critical problem, because the only way to get node1 working again is to delete its data and WAL folders.
Needless to say, that causes data loss.

Thanks
