Posted to issues@ignite.apache.org by "Maxim Muzafarov (JIRA)" <ji...@apache.org> on 2019/03/26 09:05:00 UTC

[jira] [Updated] (IGNITE-9040) StopNodeFailureHandler is not able to stop node correctly on node segmentation

     [ https://issues.apache.org/jira/browse/IGNITE-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Muzafarov updated IGNITE-9040:
------------------------------------
    Labels: iep-14  (was: )

> StopNodeFailureHandler is not able to stop node correctly on node segmentation
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-9040
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9040
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.6
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Major
>              Labels: iep-14
>             Fix For: 2.7
>
>
> When the *IGNITE_WAL_LOG_TX_RECORDS* flag is set, special TxRecords are logged to the WAL, even during node stop.
> With the STOP segmentation policy, *StopNodeFailureHandler* is used to stop the segmented node, and it marks the node's state as invalid. As a result, all write requests to the WAL fail.
> So, as part of the stop-on-segmentation procedure, the node needs to log a TxRecord but cannot, because its state is already marked as invalid. The stop procedure therefore finishes incorrectly, and some threads started by the node are not cleaned up.
> Exception example:
> {noformat}
> [2018-07-20 13:35:36,358][ERROR][node-stopper][ZookeeperDiscoverySpiTest0] Failed to pre-stop processor: GridProcessorAdapter []
> class org.apache.ignite.IgniteException: Failed to log TxRecord: TxRecord [state=PREPARED, nearXidVer=GridCacheVersion [topVer=143562918, order=1532082921780, nodeOrder=3], writeVer=GridCacheVersion [topVer=143562918, order=1532082921781, nodeOrder=1], super=TimeStampRecord [timestamp=1532082936349]]
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1132)
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:968)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onComplete(GridDhtTxPrepareFuture.java:983)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:717)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onDone(GridDhtTxPrepareFuture.java:105)
> 	at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
> 	at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:425)
> 	at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onStop(GridCacheMvccManager.java:410)
> 	at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStop(GridCacheProcessor.java:984)
> 	at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:2134)
> 	at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:2082)
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2595)
> 	at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2558)
> 	at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:374)
> 	at org.apache.ignite.failure.StopNodeFailureHandler$1.run(StopNodeFailureHandler.java:36)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: class org.apache.ignite.internal.pagemem.wal.StorageException: Failed to perform WAL operation (environment was invalidated by a previous error)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.checkNode(FileWriteAheadLogManager.java:1504)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.access$6100(FileWriteAheadLogManager.java:143)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.addRecord(FileWriteAheadLogManager.java:2611)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileWriteHandle.access$1500(FileWriteAheadLogManager.java:2521)
> 	at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.log(FileWriteAheadLogManager.java:758)
> 	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.state(IgniteTxAdapter.java:1127)
> 	... 15 more
> {noformat}
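
The sequence described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Ignite's code: ToyWal, StopResult and stop() are hypothetical names invented for the illustration, and the assumption that the worker cleanup step is skipped once the pre-stop step throws is a simplification of the behaviour reported in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in Ignite; they mimic the reported behaviour.
final class SegmentationStopModel {
    /** Hypothetical stand-in for a WAL that rejects writes once the node is marked invalid. */
    static final class ToyWal {
        private boolean invalidated;
        private final List<String> records = new ArrayList<>();

        void invalidate() {
            invalidated = true;
        }

        void log(String record) {
            if (invalidated)
                throw new IllegalStateException("WAL operation rejected: environment was invalidated");

            records.add(record);
        }
    }

    /** Outcome of one simulated stop procedure. */
    static final class StopResult {
        final boolean preStopFailed;
        final boolean workersCleanedUp;

        StopResult(boolean preStopFailed, boolean workersCleanedUp) {
            this.preStopFailed = preStopFailed;
            this.workersCleanedUp = workersCleanedUp;
        }
    }

    /**
     * Simulates a node stop. When invalidateFirst is true, the node state is marked
     * invalid before the stop begins, as the failure handler does according to the report.
     */
    static StopResult stop(boolean invalidateFirst) {
        ToyWal wal = new ToyWal();

        if (invalidateFirst)
            wal.invalidate();

        boolean preStopFailed = false;
        boolean workersCleanedUp = false;

        try {
            // Pre-stop step: a pending transaction changes state and that change is logged.
            wal.log("TxRecord [state=PREPARED]");

            // Assumption of this model: cleanup is reached only if pre-stop succeeded.
            workersCleanedUp = true;
        }
        catch (IllegalStateException e) {
            preStopFailed = true;
        }

        return new StopResult(preStopFailed, workersCleanedUp);
    }

    public static void main(String[] args) {
        StopResult segmented = stop(true);
        StopResult regular = stop(false);

        System.out.println("segmented: preStopFailed=" + segmented.preStopFailed
            + ", workersCleanedUp=" + segmented.workersCleanedUp);
        System.out.println("regular: preStopFailed=" + regular.preStopFailed
            + ", workersCleanedUp=" + regular.workersCleanedUp);
    }
}
```

In this model the only difference between the two runs is whether the state is invalidated before the stop starts, which is the ordering problem the issue describes.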


