Posted to issues@ignite.apache.org by "Ignite TC Bot (Jira)" <ji...@apache.org> on 2021/08/23 16:02:00 UTC

[jira] [Commented] (IGNITE-15295) Server node that has an empty checkpoint file-XXX-START.bin does not start

    [ https://issues.apache.org/jira/browse/IGNITE-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403254#comment-17403254 ] 

Ignite TC Bot commented on IGNITE-15295:
----------------------------------------

{panel:title=Branch: [pull/9325/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/9325/head] Base: [master] : New Tests (1)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}PDS 2{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=6131505]]
* {color:#013220}IgnitePdsTestSuite2: CheckpointMarkerReadingErrorOnStartTest.test - PASSED{color}

{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=6131534&buildTypeId=IgniteTests24Java8_RunAll]

> Server node that has an empty checkpoint file-XXX-START.bin does not start
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-15295
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15295
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Denis Chudov
>            Assignee: Denis Chudov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When starting a server node that has an empty checkpoint file-XXX-START.bin this node does not start.
> {code:java}
> 2021-06-08 16:00:33.383[ERROR][Thread-19][o.a.i.i.IgniteKernal%DPL_GRID%DplGridNodeName] Exception during start processors, node will be stopped and close connections
> java.nio.BufferUnderflowException: null
>         at java.nio.Buffer.nextGetIndex(Buffer.java:532)
>         at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:417)
>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readPointer(CheckpointMarkersStorage.java:301)
>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readCheckpointStatus(CheckpointMarkersStorage.java:218)
>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointManager.readCheckpointStatus(CheckpointManager.java:265)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointStatus(GridCacheDatabaseSharedManager.java:1642)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:584)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:2999)
>         at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1205)
>         at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2105)
>         at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1768)
>         at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1147)
>         at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:667)
>         at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:593)
>         at org.apache.ignite.Ignition.start(Ignition.java:319)
>         at com.sbt.ignite.factory.IgniteFactory.getOrStartIgnite(IgniteFactory.java:139)
>         at com.sbt.ignite.factory.IgniteFactory.getOrStartIgnite(IgniteFactory.java:91)
>         at com.sbt.ignite.manager.IgniteLifecycleManagerImpl.startIgnite(IgniteLifecycleManagerImpl.java:82)
>         at com.sbt.ignite.manager.IgniteLifecycleManagerImpl.init(IgniteLifecycleManagerImpl.java:73)
>         at com.sbt.dpl.gridgain.container.DPLManagerLifecycleManager.initIgniteServiceHolder(DPLManagerLifecycleManager.java:170)
>         at com.sbt.dpl.gridgain.container.DPLManagerLifecycleManager.dplContextInit(DPLManagerLifecycleManager.java:145)
>         at com.sbt.dpl.gridgain.container.ContainerDPLFactory.<init>(ContainerDPLFactory.java:80)
>         at com.sbt.dpl.gridgain.springsupport.SpringDPLFactory.init(SpringDPLFactory.java:74)
> {code}
> The checkpoint marker is always fully written to a temp file first, and then this file is renamed (see
> {noformat}
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#writeCheckpointEntry(java.nio.ByteBuffer, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntryType, boolean){noformat}
> )
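For readers unfamiliar with the write-then-rename pattern described above, here is a minimal sketch of the general technique in plain Java NIO. It is not the Ignite implementation of writeCheckpointEntry; the class name, method name, ".tmp" suffix, and the 16-byte payload in the demo are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Ignite code: write everything to a temp file, then rename.
class MarkerWriteSketch {
    /** Writes the whole buffer to a sibling "<name>.tmp" file, forces it to disk, then renames it to the target. */
    static void writeMarker(Path target, ByteBuffer data) throws IOException {
        // The ".tmp" suffix is an assumption for this sketch.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");

        try (FileChannel ch = FileChannel.open(tmp,
            StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING,
            StandardOpenOption.WRITE)) {
            // A single write() call may be partial, so loop until the buffer is drained.
            while (data.hasRemaining())
                ch.write(data);

            // Flush content and metadata before the file becomes visible under its final name.
            ch.force(true);
        }

        // Readers see either no file or a complete one, provided the file system supports atomic moves.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cp-sketch");
        Path marker = dir.resolve("marker-START.bin");

        // Arbitrary 16-byte payload, for demonstration only.
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(42L).putInt(7).putInt(128);
        buf.flip();

        writeMarker(marker, buf);

        System.out.println("Marker size in bytes: " + Files.size(marker));
    }
}
```

With this pattern an empty file under the final name should not normally appear, which is consistent with the suspicion that the file was changed after it was written.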
> So the root cause of this error is unclear, unless the file was changed somehow. We need extended information if such an error happens again, but in this case we have nothing to analyze (the LFS was cleared by the customer right after the error happened).
> At the same time, we can’t guarantee correct operation when checkpoint markers are inconsistent. We can’t just ignore them if they are broken, and we can’t simply recover from the previous checkpoint either.
> But it seems reasonable to catch all reading-related exceptions in org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#readPointer.
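A rough sketch of what catching the read failure could look like, so that the error names the file and its size instead of surfacing a bare BufferUnderflowException. This is not the actual patch from pull/9325; the class, the Pointer type, and the 16-byte layout (8-byte index, 4-byte offset, 4-byte length) are assumptions for illustration, and the real readPointer signature may differ.

```java
import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Ignite code: wrap low-level read errors with diagnostic context.
class MarkerReadSketch {
    /** Assumed pointer layout for this sketch: long index, int offset, int length. */
    static final class Pointer {
        final long idx;
        final int off;
        final int len;

        Pointer(long idx, int off, int len) {
            this.idx = idx;
            this.off = off;
            this.len = len;
        }
    }

    /** Reads a pointer from the marker file, reporting file name and size if the content is too short. */
    static Pointer readPointer(Path marker) throws IOException {
        byte[] bytes = Files.readAllBytes(marker);
        ByteBuffer buf = ByteBuffer.wrap(bytes);

        try {
            // Arguments are evaluated left to right: index, then offset, then length.
            return new Pointer(buf.getLong(), buf.getInt(), buf.getInt());
        }
        catch (BufferUnderflowException e) {
            // Keep the original exception as the cause and add what is needed for later analysis.
            throw new IOException("Failed to read checkpoint pointer from marker file [file="
                + marker.toAbsolutePath() + ", size=" + bytes.length + "]", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // An empty marker file reproduces the situation from the report.
        Path empty = Files.createTempFile("sketch", "-START.bin");

        try {
            readPointer(empty);
        }
        catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The node would still refuse to start on a broken marker, which matches the concern above about not ignoring inconsistent checkpoint markers; the only change is that the failure carries enough detail to investigate.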



--
This message was sent by Atlassian Jira
(v8.3.4#803005)