Posted to issues@ignite.apache.org by "Anton Kalashnikov (Jira)" <ji...@apache.org> on 2020/10/09 10:03:00 UTC
[jira] [Commented] (IGNITE-13565) Potential further bugs with DurableBackgroundTasks.

    [ https://issues.apache.org/jira/browse/IGNITE-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210785#comment-17210785 ] 

Anton Kalashnikov commented on IGNITE-13565:
--------------------------------------------

In my opinion, this is not a potential bug; it is already a bug. It looks like data corruption occurs when a DurableBackgroundTask has finished but its status has not been updated in the metastore. Finishing the DurableBackgroundTask and changing its status in the metastore are not a single atomic operation, so nobody can guarantee that the node will not fail between these two actions. Perhaps we need some atomic operation for detecting that the DurableBackgroundTask has finished (maybe we should write something to the WAL).
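To make the window between the two actions concrete, here is a minimal, self-contained sketch. It does not use any Ignite API: the class DurableTaskSketch, the map standing in for the metastore, the set standing in for a durable completion record (for example, something written to the WAL), and the task name "drop-idx1" are all hypothetical. It also assumes that the completion record can be written in the same atomic step as the last piece of the task's work, which is exactly the part that would have to be designed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DurableTaskSketch {
    /** Hypothetical stand-in for the metastore: task name -> status. */
    static final Map<String, String> metastore = new HashMap<>();

    /** Hypothetical stand-in for a durable completion record (e.g. a WAL entry). */
    static final Set<String> completionLog = new HashSet<>();

    /** How many times the task body has actually been executed. */
    static int executions;

    static void reset() {
        metastore.clear();
        completionLog.clear();
        executions = 0;
    }

    /** Runs the task; optionally simulates a node failure before the status update. */
    static void runTask(String task, boolean crashBeforeStatusUpdate) {
        // Assumption: the work and its completion record are written in one atomic step.
        executions++;
        completionLog.add(task);

        if (crashBeforeStatusUpdate)
            return; // Node "fails" here: the metastore never learns the task finished.

        metastore.put(task, "COMPLETED");
    }

    /** Restart logic: decides whether the task has to be executed again. */
    static void recover(String task, boolean consultLog) {
        if ("COMPLETED".equals(metastore.get(task)))
            return;

        if (consultLog && completionLog.contains(task)) {
            // The task did finish; only the status update was lost, so repair it.
            metastore.put(task, "COMPLETED");
            return;
        }

        runTask(task, false); // Without the record, the finished task runs a second time.
    }

    /** Crash between the two actions, then restart; returns the number of executions. */
    static int simulate(boolean consultLog) {
        reset();
        runTask("drop-idx1", true);
        recover("drop-idx1", consultLog);
        return executions;
    }

    public static void main(String[] args) {
        System.out.println("executions without completion record: " + simulate(false));
        System.out.println("executions with completion record: " + simulate(true));
    }
}
```

Without the completion record the restart executes the already finished task a second time, which corresponds to the second stack trace in the description; with the record, the restart only repairs the status.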

> Potential further bugs with DurableBackgroundTasks.
> ---------------------------------------------------
>
>                 Key: IGNITE-13565
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13565
>             Project: Ignite
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 2.8.1
>            Reporter: Stanilovsky Evgeny
>            Priority: Major
>
> After some code refactoring [1] we ran into a problem with a simple test: org.apache.ignite.internal.processors.cache.index.BasicIndexTest#testInlineSizeChange
> between 
> {noformat}
> execSql(cache, "drop index \"idx1\"");
> {noformat}
> and
> {noformat}
> ig0 = startGrid(0);
> {noformat}
> operations. It seems [2] will fix it, but the problem could potentially happen again (check the attached stacks). In a few words, an already completed durable task has not updated its
> {noformat}
> DurableBackgroundTask#complete
> {noformat}
> status in the metastore, so once the cluster is running again the task can still try to run once more, with undefined behavior. [~Denis Chudov], [~makedonskaya], please take a look.
> [1] https://issues.apache.org/jira/browse/IGNITE-13207
> [2] https://issues.apache.org/jira/browse/IGNITE-13500
> {noformat}
> [2020-10-09 11:42:41,982][INFO ][test-runner-#1%index.BasicIndexTest%][root] >>> Stopping grid [name=index.BasicIndexTest0, id=161e62a2-1a5d-46b0-892d-2e0274e00000]
> [2020-10-09 11:42:41,999][ERROR][db-checkpoint-thread-#61%index.BasicIndexTest0%][root] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteException: Failed to perform cache update: node is stopping.]]
> class org.apache.ignite.IgniteException: Failed to perform cache update: node is stopping.
> 	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:125)
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1297)
> 	at org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.removeDurableBackgroundTask(DurableBackgroundTasksProcessor.java:245)
> 	at org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor.onMarkCheckpointBegin(DurableBackgroundTasksProcessor.java:277)
> 	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:274)
> 	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:387)
> 	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
> 	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> 	at java.lang.Thread.run(Thread.java:748)
> ...
> starting grid and ...
> java.lang.AssertionError: calculatedOffset=49152, allocated=45056, headerSize=4096, cfgFile=/work/repo/apache-ignite/work/db/index_BasicIndexTest0/cache-default/index.bin
> >>> +-------------------------------------------+
> >>> Ignite ver. 2.10.0-SNAPSHOT#20201009-sha1:DEV
> >>> +-------------------------------------------+
> 	at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:492)
> 	at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:554)
> 	at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:538)
> 	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:884)
> 	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:710)
> 	at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:699)
> 	at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:158)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.acquirePage(BPlusTree.java:6037)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getMetaInfo(H2Tree.java:415)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2Tree.<init>(H2Tree.java:241)
> 	at org.apache.ignite.internal.processors.query.h2.DurableBackgroundCleanupIndexTreeTask.execute(DurableBackgroundCleanupIndexTreeTask.java:140)
> 	at org.apache.ignite.internal.processors.localtask.DurableBackgroundTasksProcessor$1.body(DurableBackgroundTasksProcessor.java:99)
> 	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> 	at java.lang.Thread.run(Thread.java:748)
> {noformat}


