Posted to issues@ignite.apache.org by "Denis Chudov (Jira)" <ji...@apache.org> on 2021/08/12 12:18:00 UTC

[jira] [Updated] (IGNITE-15295) Server node that has an empty checkpoint file-XXX-START.bin does not start

     [ https://issues.apache.org/jira/browse/IGNITE-15295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Chudov updated IGNITE-15295:
----------------------------------
    Ignite Flags:   (was: Docs Required,Release Notes Required)

> Server node that has an empty checkpoint file-XXX-START.bin does not start
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-15295
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15295
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Denis Chudov
>            Assignee: Denis Chudov
>            Priority: Major
>
> A server node that has an empty checkpoint file-XXX-START.bin fails to start:
> {code:java}
> 2021-06-08 16:00:33.383[ERROR][Thread-19][o.a.i.i.IgniteKernal%DPL_GRID%DplGridNodeName] Exception during start processors, node will be stopped and close connections
> java.nio.BufferUnderflowException: null
>         at java.nio.Buffer.nextGetIndex(Buffer.java:532)
>         at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:417)
>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readPointer(CheckpointMarkersStorage.java:301)
>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage.readCheckpointStatus(CheckpointMarkersStorage.java:218)
>         at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointManager.readCheckpointStatus(CheckpointManager.java:265)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointStatus(GridCacheDatabaseSharedManager.java:1642)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:584)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:2999)
>         at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1205)
>         at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2105)
>         at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1768)
>         at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1147)
>         at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:667)
>         at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:593)
>         at org.apache.ignite.Ignition.start(Ignition.java:319)
>         at com.sbt.ignite.factory.IgniteFactory.getOrStartIgnite(IgniteFactory.java:139)
>         at com.sbt.ignite.factory.IgniteFactory.getOrStartIgnite(IgniteFactory.java:91)
>         at com.sbt.ignite.manager.IgniteLifecycleManagerImpl.startIgnite(IgniteLifecycleManagerImpl.java:82)
>         at com.sbt.ignite.manager.IgniteLifecycleManagerImpl.init(IgniteLifecycleManagerImpl.java:73)
>         at com.sbt.dpl.gridgain.container.DPLManagerLifecycleManager.initIgniteServiceHolder(DPLManagerLifecycleManager.java:170)
>         at com.sbt.dpl.gridgain.container.DPLManagerLifecycleManager.dplContextInit(DPLManagerLifecycleManager.java:145)
>         at com.sbt.dpl.gridgain.container.ContainerDPLFactory.<init>(ContainerDPLFactory.java:80)
>         at com.sbt.dpl.gridgain.springsupport.SpringDPLFactory.init(SpringDPLFactory.java:74)
> {code}
> The checkpoint marker is always written in full to a temp file first, and only then is that file renamed (see
> {noformat}
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#writeCheckpointEntry(java.nio.ByteBuffer, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntry, org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointEntryType, boolean){noformat}
> )
> So the root cause of this error is not clear, unless the file was modified in some way. We will need extended information if this error happens again, but in this case we have nothing to analyze (the customer cleared the LFS right after the error happened).
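For readers unfamiliar with the write-to-temp-then-rename pattern mentioned above, here is a minimal sketch of the general technique. It is not the Ignite implementation: the class name `AtomicMarkerWriter`, the method `writeMarker`, and the `.tmp` suffix are all hypothetical, and the sketch assumes the file system supports atomic moves.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not Ignite code): publishes a marker file via a temp file and a rename. */
final class AtomicMarkerWriter {
    private AtomicMarkerWriter() {
    }

    /** Writes {@code payload} to "fileName.tmp" in {@code dir}, syncs it, then renames it to {@code fileName}. */
    static Path writeMarker(Path dir, String fileName, ByteBuffer payload) throws IOException {
        Path tmp = dir.resolve(fileName + ".tmp"); // assumed temp-file naming
        Path target = dir.resolve(fileName);

        try (FileChannel ch = FileChannel.open(tmp,
            StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING,
            StandardOpenOption.WRITE)) {
            // A single write() call may be partial, so loop until the buffer is drained.
            while (payload.hasRemaining())
                ch.write(payload);

            // Flush content and metadata to the storage device before the rename.
            ch.force(true);
        }

        // Readers should see either no target file or a complete one, never a partial one.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);

        return target;
    }
}
```

With this pattern a crash during the write should leave at most a stray temp file behind, which is why an empty file under the final name is unexpected.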
> At the same time, we can’t guarantee correct operation when checkpoint markers are inconsistent. We can’t simply ignore broken markers, and recovering from the previous checkpoint is not that simple either.
> Still, it seems reasonable to catch all read-related exceptions in org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointMarkersStorage#readPointer.
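A rough sketch of what catching read-related exceptions could look like follows. This is not the actual `CheckpointMarkersStorage#readPointer` code: the class `MarkerReadSketch`, the exception type `MarkerReadException`, and the 16-byte layout (one long followed by two ints) are assumptions made for illustration. Only the leading `getLong` call is confirmed by the stack trace.

```java
import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch (not Ignite code): reads a pointer from a marker file and reports details on failure. */
final class MarkerReadSketch {
    private MarkerReadSketch() {
    }

    /** Hypothetical exception type that carries the file name and size for later analysis. */
    static final class MarkerReadException extends IOException {
        MarkerReadException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /**
     * Assumed layout: long index, int offset, int length (16 bytes in total).
     * Returns the three values widened to long.
     */
    static long[] readPointer(Path marker) throws MarkerReadException {
        long size = -1; // stays -1 if even the size lookup fails

        try {
            size = Files.size(marker);

            ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(marker));

            // An empty or truncated file raises BufferUnderflowException here.
            return new long[] {buf.getLong(), buf.getInt(), buf.getInt()};
        }
        catch (IOException | BufferUnderflowException e) {
            // Wrap the low-level error so the log shows which file is broken and how large it is.
            throw new MarkerReadException("Failed to read checkpoint pointer from marker file [file="
                + marker.toAbsolutePath() + ", size=" + size + ']', e);
        }
    }
}
```

The node would still fail to start on a broken marker in this sketch, but the error message would name the file and its size instead of surfacing a bare `BufferUnderflowException`.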


