Posted to issues@ignite.apache.org by "Alexey Kukushkin (Jira)" <ji...@apache.org> on 2020/04/02 06:14:00 UTC

[jira] [Comment Edited] (IGNITE-12828) Intermittent [Failed to notify direct custom event listener] exception on node shutdown

    [ https://issues.apache.org/jira/browse/IGNITE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073403#comment-17073403 ] 

Alexey Kukushkin edited comment on IGNITE-12828 at 4/2/20, 6:13 AM:
--------------------------------------------------------------------

The problem is caused by a lack of synchronization between node shutdown and continuous query handler initialization:
 * The exception is an [NPE here|https://github.com/apache/ignite/blob/341b01dfd8abf2d9b01d468ad1bb26dfe84ac4f6/modules/core/src/main/java/org/apache/ignite/internal/processors/continuous/StartRoutineDiscoveryMessage.java#L95], which is [called from continuous query handler initialization|https://github.com/apache/ignite/blob/341b01dfd8abf2d9b01d468ad1bb26dfe84ac4f6/modules/core/src/main/java/org/apache/ignite/internal/processors/continuous/GridContinuousProcessor.java#L219].
 * cntrs is set to null on node shutdown.
 * There is no reliable synchronization with node shutdown. Instead, GridKernalContext#isStopping checks are spread throughout the code to detect whether the node is shutting down.
 * [Here|https://github.com/apache/ignite/blob/341b01dfd8abf2d9b01d468ad1bb26dfe84ac4f6/modules/core/src/main/java/org/apache/ignite/internal/processors/continuous/GridContinuousProcessor.java#L216] is the last node shutdown check in continuous query handler initialization. However, time-consuming work such as [deploying classes|#L1412] happens between that check and the point where the NPE is thrown. The NPE occurs if node shutdown starts in parallel during that window.
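
The race described above (check the stopping flag, do slow work, then touch state that shutdown has already cleared) can be illustrated with a minimal sketch. This is not Ignite code and not the proposed patch: the class ShutdownGuardSketch and its methods are hypothetical, and the JDK's ReentrantReadWriteLock is used only as a stand-in for whatever synchronization the real fix would choose. The idea is that the stopping check and the access to cntrs happen under the same lock that shutdown takes before nulling the field, so no window remains between them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a component whose state is cleared on shutdown. Not Ignite code. */
final class ShutdownGuardSketch {
    /** Workers take the read lock; shutdown takes the write lock. */
    private final ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock();

    /** Guarded by busyLock. */
    private boolean stopping;

    /** Guarded by busyLock; set to null on stop, mirroring the cntrs field in the report. */
    private Map<Integer, Long> cntrs = new ConcurrentHashMap<>();

    /**
     * Records the highest counter seen for a partition.
     * Returns false instead of throwing an NPE if shutdown has already begun.
     */
    boolean addUpdateCounter(int part, long cntr) {
        busyLock.readLock().lock();
        try {
            if (stopping)
                return false; // cntrs may already be null; do not touch it

            cntrs.merge(part, cntr, Math::max);
            return true;
        }
        finally {
            busyLock.readLock().unlock();
        }
    }

    /** Returns the recorded counter, or null if absent or if the component is stopping. */
    Long counter(int part) {
        busyLock.readLock().lock();
        try {
            return stopping ? null : cntrs.get(part);
        }
        finally {
            busyLock.readLock().unlock();
        }
    }

    /** Waits for in-flight workers to leave, then clears the state. */
    void stop() {
        busyLock.writeLock().lock();
        try {
            stopping = true;
            cntrs = null;
        }
        finally {
            busyLock.writeLock().unlock();
        }
    }
}
```

In this sketch a caller that loses the race gets a false return value rather than an exception, and stop() cannot null the field while a worker is between the check and the update. Whether the real fix should guard the whole handler initialization or only the counter update is a separate design decision.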


was (Author: kukushal):
The problem is due to lack of synchronization between the node shutdown and continuous query handler initialization: * The problem occurred due to [NPE here|https://github.com/apache/ignite/blob/341b01dfd8abf2d9b01d468ad1bb26dfe84ac4f6/modules/core/src/main/java/org/apache/ignite/internal/processors/continuous/StartRoutineDiscoveryMessage.java#L95]], which is [called from continuous query handler initialization|https://github.com/apache/ignite/blob/341b01dfd8abf2d9b01d468ad1bb26dfe84ac4f6/modules/core/src/main/java/org/apache/ignite/internal/processors/continuous/GridContinuousProcessor.java#L219]]
 * cntrs is set to null on the node shutdown
 * There is no reliable synchronization on the node shutdown. There are GridKernalContext#isStopping checks spread all over the code to detect whether the node is shutting down.
 * [Here|https://github.com/apache/ignite/blob/341b01dfd8abf2d9b01d468ad1bb26dfe84ac4f6/modules/core/src/main/java/org/apache/ignite/internal/processors/continuous/GridContinuousProcessor.java#L216] is the last node shutdown check on continuous query handler initialization. But there are time-consuming things happening  between that check and the NPE from this problem like [deploying classes|#L1412].] The NPE occurs if the node shutdown started in parallel.

> Intermittent [Failed to notify direct custom event listener] exception on node shutdown
> ---------------------------------------------------------------------------------------
>
>                 Key: IGNITE-12828
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12828
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.8
>            Reporter: Alexey Kukushkin
>            Assignee: PetrovMikhail
>            Priority: Major
>              Labels: sbcf
>         Attachments: ignite-12828-vs-2.8.patch
>
>
> +*Reproducer*+:
> Run a server node
> Run a client node that:
>  * Creates cache "cache1"
>  * Deploys a grid service that starts a continuous query against "cache1" in method init()
>  * Leaves the cluster
> +*Actual result*+
> Intermittent exception in the client node:
> {noformat}
> [16:54:38,758][SEVERE][disco-notifier-worker-#43%CashFlowCluster_16b67e98563f4cfbac95ae055a00e67f%][GridDiscoveryManager] Failed to notify direct custom event listener: StartRoutineDiscoveryMessage [startReqData=StartRequestData [prjPred=sbt.cashflow.grid.services.cachefactory.ignite.NodeAttributeFilter@63ae71a9, clsName=null, depInfo=null, hnd=CacheContinuousQueryHandler [returnValTrans=o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler$1@594bf5b8, cacheName=CALC_REQUESTS, rmtFilter=null, rmtFilterDep=null, internal=false, notifyExisting=false, oldValRequired=true, sync=false, ignoreExpired=true, taskHash=0, skipPrimaryCheck=false, locOnly=false, keepBinary=true, ackBuf=null, cacheId=-1608655250, initTopVer=null, nodeLeft=false, ignoreClsNotFound=false, nodeId=null, routineId=null], bufSize=1, interval=0, autoUnsubscribe=true], keepBinary=true, routineId=021dd2ce-3d8a-41c1-a4d0-b625ea1284f4]
> java.lang.NullPointerException
>  at org.apache.ignite.internal.processors.continuous.StartRoutineDiscoveryMessage.addUpdateCounters(StartRoutineDiscoveryMessage.java:82)
>  at org.apache.ignite.internal.processors.continuous.StartRoutineDiscoveryMessage.addUpdateCounters(StartRoutineDiscoveryMessage.java:96)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1424)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:110)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:202)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:193)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:722)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2683)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2721)
>  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>  at java.lang.Thread.run(Thread.java:745)
> [16:54:39,725][SEVERE][disco-notifier-worker-#43%CashFlowCluster_16b67e98563f4cfbac95ae055a00e67f%][GridDiscoveryManager] Failed to notify direct custom event listener: StartRoutineDiscoveryMessage [startReqData=StartRequestData [prjPred=sbt.cashflow.grid.services.cachefactory.ignite.NodeAttributeFilter@7462c96c, clsName=null, depInfo=null, hnd=CacheContinuousQueryHandler [returnValTrans=o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler$1@6451dd70, cacheName=DISTRIBUTED_REQUESTS, rmtFilter=null, rmtFilterDep=null, internal=false, notifyExisting=false, oldValRequired=true, sync=false, ignoreExpired=true, taskHash=0, skipPrimaryCheck=false, locOnly=false, keepBinary=true, ackBuf=null, cacheId=1419803136, initTopVer=null, nodeLeft=false, ignoreClsNotFound=false, nodeId=null, routineId=null], bufSize=1, interval=0, autoUnsubscribe=true], keepBinary=true, routineId=1fca5f04-d220-49ac-850a-0d4527e22eef]
> java.lang.NullPointerException
>  at org.apache.ignite.internal.processors.continuous.StartRoutineDiscoveryMessage.addUpdateCounters(StartRoutineDiscoveryMessage.java:82)
>  at org.apache.ignite.internal.processors.continuous.StartRoutineDiscoveryMessage.addUpdateCounters(StartRoutineDiscoveryMessage.java:96)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1424)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:110)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:202)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:193)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:722)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2683)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2721)
>  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>  at java.lang.Thread.run(Thread.java:745)
> [16:54:40,809][SEVERE][disco-notifier-worker-#43%CashFlowCluster_16b67e98563f4cfbac95ae055a00e67f%][GridDiscoveryManager] Failed to notify direct custom event listener: StartRoutineDiscoveryMessage [startReqData=StartRequestData [prjPred=sbt.cashflow.grid.services.cachefactory.ignite.NodeAttributeFilter@4a29e4c8, clsName=null, depInfo=null, hnd=CacheContinuousQueryHandler [returnValTrans=o.a.i.i.processors.cache.query.continuous.CacheContinuousQueryHandler$1@28627d48, cacheName=DISTRIBUTED_REQUESTS, rmtFilter=null, rmtFilterDep=null, internal=false, notifyExisting=false, oldValRequired=true, sync=false, ignoreExpired=true, taskHash=0, skipPrimaryCheck=false, locOnly=false, keepBinary=true, ackBuf=null, cacheId=1419803136, initTopVer=null, nodeLeft=false, ignoreClsNotFound=false, nodeId=null, routineId=null], bufSize=1, interval=0, autoUnsubscribe=true], keepBinary=true, routineId=aa0bdf4f-bfdb-4eb3-8d99-6bcb67532704]
> java.lang.NullPointerException
>  at org.apache.ignite.internal.processors.continuous.StartRoutineDiscoveryMessage.addUpdateCounters(StartRoutineDiscoveryMessage.java:82)
>  at org.apache.ignite.internal.processors.continuous.StartRoutineDiscoveryMessage.addUpdateCounters(StartRoutineDiscoveryMessage.java:96)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1424)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:110)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:202)
>  at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:193)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:722)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:601)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2683)
>  at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2721)
>  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}


