Posted to issues@ignite.apache.org by "Sergey Kosarev (JIRA)" <ji...@apache.org> on 2018/08/16 14:21:00 UTC

[jira] [Commented] (IGNITE-9296) Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest

    [ https://issues.apache.org/jira/browse/IGNITE-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582595#comment-16582595 ] 

Sergey Kosarev commented on IGNITE-9296:
----------------------------------------

Suggested fix: change while(true) to while(!isCancelled()) in
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.WALWriter#flushBuffer
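
A minimal sketch of the idea behind this suggestion, not the actual Ignite code. FlushWaitSketch, its fields, and advance()/cancel() are hypothetical stand-ins for WALWriter. The sketch also assumes a timed park so that the flag is rechecked periodically; code that parks without a timeout would additionally need to unpark its waiters on cancellation.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical stand-in for the WAL writer; NOT the Ignite class. */
class FlushWaitSketch {
    /** Set once the writer is stopped, e.g. after a critical failure. */
    private final AtomicBoolean cancelled = new AtomicBoolean();

    /** Position up to which data has been written so far. */
    private final AtomicLong written = new AtomicLong();

    boolean isCancelled() { return cancelled.get(); }

    void cancel() { cancelled.set(true); }

    void advance(long pos) { written.set(pos); }

    /**
     * Waits until data up to expPos has been written.
     *
     * @return true if the data was written, false if the writer was cancelled first.
     */
    boolean flushBuffer(long expPos) {
        // With while (true) this loop never ends once the writer is gone.
        while (!isCancelled()) {
            if (written.get() >= expPos)
                return true;

            // Timed park (assumption of this sketch): recheck the flag every millisecond.
            LockSupport.parkNanos(1_000_000L);
        }

        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        FlushWaitSketch w = new FlushWaitSketch();

        // Plays the role of wal-segment-syncer waiting for a flush that never comes.
        Thread syncer = new Thread(
            () -> System.out.println("flushed=" + w.flushBuffer(100)), "wal-segment-syncer");

        syncer.start();
        w.cancel();          // the writer has died; nobody will ever advance the position
        syncer.join(5_000);  // the waiter notices the flag and returns instead of hanging

        System.out.println("syncer alive=" + syncer.isAlive());
    }
}
```

In this sketch the waiting thread returns false shortly after cancel() is called, so a later join on it can complete. Whether returning or throwing is the right reaction in the real flushBuffer is left open here.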

>  Stopping node by Failure Handler hangs up in IgniteWalFlushBackgroundSelfTest
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-9296
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9296
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Kosarev
>            Priority: Major
>         Attachments: logs.zip
>
>
> Here are the log messages:
> [18:46:27]W:             [org.apache.ignite:ignite-core] [2018-08-15 15:46:27,442][ERROR][main][root] Test has been timed out and will be interrupted (threads dump will be taken before interruption) [test=testFailWhileStart, timeout=60000]
> Later the whole suite also hangs:
> [22:22:49]E:     [Step 3/4] The build Ignite Tests 2.4+ (Java 8)::PDS 2 #2184 {buildId=1662285} has been running for more than 240 minutes. Terminating...
> The main thread is blocked by node-stopper:
> [18:46:27] :     [Step 3/4] Thread [name="test-runner-#7695%wal.IgniteWalFlushBackgroundSelfTest%", id=9150, state=BLOCKED, blockCnt=4, waitCnt=142]
> [18:46:27] :     [Step 3/4]     Lock [object=o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90, ownerName=node-stopper, ownerId=9267]
> [18:46:27] :     [Step 3/4]         at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2565)
> [18:46:27] :     [Step 3/4]         at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:27] :     [Step 3/4]         at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:27] :     [Step 3/4]         at o.a.i.Ignition.stop(Ignition.java:229)
> [18:46:27] :     [Step 3/4]         at o.a.i.testframework.junits.GridAbstractTest.stopGrid(GridAbstractTest.java:1088)
> [18:46:27] :     [Step 3/4]         at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1131)
> [18:46:27] :     [Step 3/4]         at o.a.i.testframework.junits.GridAbstractTest.stopAllGrids(GridAbstractTest.java:1109)
> [18:46:27] :     [Step 3/4]         at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.failWhilePut(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:213)
> [18:46:27] :     [Step 3/4]         at o.a.i.i.processors.cache.persistence.db.wal.IgniteWalFlushMultiNodeFailoverAbstractSelfTest.testFailWhileStart(IgniteWalFlushMultiNodeFailoverAbstractSelfTest.java:147)
> node-stopper waits for wal-segment-syncer to stop:
> [18:46:28]W:             [org.apache.ignite:ignite-core] Thread [name="node-stopper", id=9267, state=WAITING, blockCnt=19, waitCnt=22]
> [18:46:28]W:             [org.apache.ignite:ignite-core]     Lock [object=java.lang.Object@5ba26eb0, ownerName=null, ownerId=-1]
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at java.lang.Object.wait(Native Method)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at java.lang.Object.wait(Object.java:502)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.util.worker.GridWorker.join(GridWorker.java:233)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.util.IgniteUtils.join(IgniteUtils.java:4692)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.shutdown(FileWriteAheadLogManager.java:3562)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WalSegmentSyncer.access$700(FileWriteAheadLogManager.java:3527)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.stop0(FileWriteAheadLogManager.java:578)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.GridCacheSharedManagerAdapter.stop(GridCacheSharedManagerAdapter.java:94)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.GridCacheProcessor.stop(GridCacheProcessor.java:951)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.IgniteKernal.stop0(IgniteKernal.java:2303)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.IgniteKernal.stop(IgniteKernal.java:2181)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2594)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         - locked o.a.i.i.IgnitionEx$IgniteNamedInstance@7c251f90
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2557)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.IgnitionEx.stop(IgnitionEx.java:374)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
> wal-segment-syncer waits until wal-write-worker flushes data:
> [18:46:28]W:             [org.apache.ignite:ignite-core] Thread [name="wal-segment-syncer-#7782%wal.IgniteWalFlushBackgroundSelfTest1%", id=9253, state=RUNNABLE, blockCnt=0, waitCnt=860657904]
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at sun.misc.Unsafe.park(Native Method)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushBuffer(FileWriteAheadLogManager.java:3455)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.flushAll(FileWriteAheadLogManager.java:3419)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flush(FileWriteAheadLogManager.java:2704)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.flushOrWait(FileWriteAheadLogManager.java:2696)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.fsync(FileWriteAheadLogManager.java:2776)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1900(FileWriteAheadLogManager.java:2538)
> [18:46:28]W:             [org.apache.ignite:ignite-core]         at o.a.i.i.processors.cache.persistence.wal.FileWriteAheadLogManager.flush(FileWriteAheadLogManager.java:820)
> And there is no wal-write-worker on the node, as it has already been interrupted:
> [18:45:34]W:             [org.apache.ignite:ignite-core] [2018-08-15 15:45:34,132][ERROR][wal-write-worker%wal.IgniteWalFlushBackgroundSelfTest1-#7783%wal.IgniteWalFlushBackgroundSelfTest1%][IgniteTestResources] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeFailureHandler, failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException: Failed to write buffer.]]
> Caused by: java.io.IOException: No space left on device (This exception is generated intentionally by test logic)
> Since there is no wal-write-worker, wal-segment-syncer will wait forever.


