You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Alexey Goncharuk (Jira)" <ji...@apache.org> on 2020/04/02 11:29:00 UTC
[jira] [Commented] (IGNITE-12760) Prevent AssertionError on message unmarshalling, when classLoaderId contains id of node that already left

    [ https://issues.apache.org/jira/browse/IGNITE-12760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073641#comment-17073641 ] 

Alexey Goncharuk commented on IGNITE-12760:
-------------------------------------------

[~Denis Chudov], you need to correct the exception and console messages. All messages (log and exception) in Ignite should adhere to the format "Message (optional verbose description) [arg1=val1, arg2=val2, arg3=val3]". Also, it's better to have error messages to start with "Failed ..." (you used "Couldn't").

> Prevent AssertionError on message unmarshalling, when classLoaderId contains id of node that already left
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-12760
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12760
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Assignee: Denis Chudov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following assertion error triggers failure handler and crashes the node. Can possibly crash the whole cluster.
> {code:java}
> 2020-02-18 14:34:09.775\[ERROR]\[query-#146129%DPL_GRID%DplGridNodeName%]\[o.a.i.i.p.cache.GridCacheIoManager] Failed to process message \[senderId=727757ed-4ad4-4779-bda9-081525725cce, msg=GridCacheQueryRequest \[id=178, cacheName=com.sbt.tokenization.data.entity.KEKEntity_DPL_union-module, type=SCAN, fields=false, clause=null, clsName=null, keyValFilter=null, rdc=null, trans=null, pageSize=1024, incBackups=false, cancel=false, incMeta=false, all=false, keepBinary=true, subjId=727757ed-4ad4-4779-bda9-081525725cce, taskHash=0, part=-1, topVer=AffinityTopologyVersion \[topVer=97, minorTopVer=0], sendTimestamp=-1, receiveTimestamp=-1, super=GridCacheIdMessage \[cacheId=-1129073400, super=GridCacheMessage \[msgId=179, depInfo=GridDeploymentInfoBean \[clsLdrId=c32670e3071-d30ee64b-0833-45d4-abbe-fb6282669caa, depMode=SHARED, userVer=0, locDepOwner=false, participants=null], lastAffChangedTopVer=AffinityTopologyVersion \[topVer=8, minorTopVer=6], err=null, skipPrepare=false]]]]
> java.lang.AssertionError: null
> at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:918)
> at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CachedDeploymentInfo.<init>(GridCacheDeploymentManager.java:889)
> at org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.p2pContext(GridCacheDeploymentManager.java:422)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1576)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:584)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102)
> at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301)
> at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1565)
> at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1189)
> at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:130)
> at org.apache.ignite.internal.managers.communication.GridIoManager$8.run(GridIoManager.java:1092)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
> There is no fair reproducer for now, but it seems that we should prevent such situation in general like following:
> 1) check the correctness of the message before it will be sent - inside of GridCacheDeploymentManager#prepare. If we have the corresponding class loader on local node, we can try to fix message and replace wrong class loader with local one.
> 2) log suspicious deployments which we receive from GridDeploymentManager#deploy - maybe we have obsolete deployments in caches. 
> 3) possibly we can remove this assertion, we should have this class on sender node and use it as class loader id, and if we don't, we will receive exception on finishUnmarshall (Failed to peer load class) and try to process this situation with GridCacheIoManager#processFailedMessage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)