Posted to issues@ignite.apache.org by "Vyacheslav Koptilin (Jira)" <ji...@apache.org> on 2022/08/17 09:39:00 UTC

[jira] [Updated] (IGNITE-17542) Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky after IGNITE-17507

     [ https://issues.apache.org/jira/browse/IGNITE-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17542:
-----------------------------------------
    Description: 
The test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky due to IGNITE-17507.
The root cause of the issue is that the _GridDhtPartitionsFullMessage_ carried by _CacheAffinityChangeMessage_ is mutated outside the _disco-notifier_ thread while the discovery message worker is serializing it, which may lead to the following exception:

{noformat}
[2022-08-16T21:10:32,133][ERROR][tcp-disco-msg-worker-[0448095b 127.0.0.1:47502]-#5308%distributed.CacheLateAffinityAssignmentTest3%-#98199%distributed.CacheLateAffinityAssignmentTest3%][TestTcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
  org.apache.ignite.IgniteException: Failed to marshal mutable discovery message: CacheAffinityChangeMessage [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], partsMsg=GridDhtPartitionsFullMessage [parts=HashMap {-2100569601=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, size=100], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, size=1024], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], partsToReload=IgniteDhtPartitionsToReloadMap [], topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, lastAffChangedTopVer=null, err=null, skipPrepare=false]]], exchangeNeeded=false, stopProc=false]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6423) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:6243) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3260) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2918) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8058) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3089) [classes/:?]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7989) [classes/:?]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58) [classes/:?]
  Caused by: org.apache.ignite.IgniteCheckedException: Failed to serialize object: CacheAffinityChangeMessage [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], partsMsg=GridDhtPartitionsFullMessage [parts=HashMap {-2100569601=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, size=100], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, size=1024], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], partsToReload=IgniteDhtPartitionsToReloadMap [], topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, lastAffChangedTopVer=null, err=null, skipPrepare=false]]], exchangeNeeded=false, stopProc=false]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:102) ~[classes/:?]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:109) ~[classes/:?]
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:56) ~[classes/:?]
    at org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10827) [classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6420) ~[classes/:?]
    ... 8 more
  Caused by: java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
    at org.apache.ignite.internal.util.IgniteUtils.writeMap(IgniteUtils.java:5706) ~[classes/:?]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap.writeExternal(GridDhtPartitionFullMap.java:188) ~[classes/:?]
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) ~[?:?]
    at java.util.HashMap.internalWriteEntries(HashMap.java:1840) ~[?:?]
    at java.util.HashMap.writeObject(HashMap.java:1411) ~[?:?]
    at jdk.internal.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1145) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) ~[?:?]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:97) ~[classes/:?]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:109) ~[classes/:?]
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:56) ~[classes/:?]
    at org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10827) [classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6420) ~[classes/:?]
    ... 8 more
{noformat}
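The innermost cause in the trace is the fail-fast iterator of _HashMap_: _IgniteUtils.writeMap_ walks a map while something else structurally modifies it. The minimal Java sketch below (the class and method names are illustrative, not Ignite code) reproduces the same exception deterministically on a single thread. In the reported failure the iteration and the mutation happen on different threads, and the _HashMap_ Javadoc notes that fail-fast detection is only best-effort in that case, which is consistent with the test failing intermittently rather than on every run.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastIterationDemo {
    /**
     * Iterates over a HashMap and structurally modifies it mid-iteration.
     * Returns true if the fail-fast iterator threw ConcurrentModificationException.
     */
    public static boolean mutateDuringIteration() {
        Map<Integer, String> parts = new HashMap<>();
        for (int i = 0; i < 4; i++)
            parts.put(i, "part-" + i);

        try {
            for (Map.Entry<Integer, String> e : parts.entrySet()) {
                // Adding a new key while an iterator is open changes HashMap's modCount;
                // the iterator detects the mismatch on its next call to next().
                parts.put(100 + e.getKey(), "new");
            }
        }
        catch (ConcurrentModificationException ex) {
            return true;
        }

        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + mutateDuringIteration());
    }
}
```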

In my understanding, a discovery message should not be modified in any way outside the _disco-notifier_ thread. At a minimum, the _CacheAffinityChangeMessage.partitionsMessage()_ method should always return a deep copy of _GridDhtPartitionsFullMessage_ to avoid this issue.
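A minimal sketch of the suggested deep-copy approach, assuming simplified stand-ins: _PartitionsFullMessage_ and _AffinityChangeMessage_ below are hypothetical classes modelling _GridDhtPartitionsFullMessage_ and _CacheAffinityChangeMessage_, not the real Ignite types. The getter hands out a deep copy, so a caller that mutates the returned maps cannot race with the discovery worker serializing the original.

```java
import java.util.HashMap;
import java.util.Map;

public class DeepCopyMessageSketch {
    /** Hypothetical stand-in for GridDhtPartitionsFullMessage: group id -> (node id -> partition count). */
    static final class PartitionsFullMessage {
        final Map<Integer, Map<String, Integer>> parts = new HashMap<>();

        /** Deep copy: a new outer map and a new inner map per group, so no mutable map is shared. */
        PartitionsFullMessage copy() {
            PartitionsFullMessage cp = new PartitionsFullMessage();
            for (Map.Entry<Integer, Map<String, Integer>> e : parts.entrySet())
                cp.parts.put(e.getKey(), new HashMap<>(e.getValue()));
            return cp;
        }
    }

    /** Hypothetical stand-in for CacheAffinityChangeMessage. */
    static final class AffinityChangeMessage {
        private final PartitionsFullMessage partsMsg;

        AffinityChangeMessage(PartitionsFullMessage partsMsg) {
            this.partsMsg = partsMsg;
        }

        /** Returns a deep copy, so callers outside the discovery thread never touch the original. */
        PartitionsFullMessage partitionsMessage() {
            return partsMsg.copy();
        }

        /** Inspects the original message; used only to check the result in this demo. */
        int originalSize(int grpId) {
            return partsMsg.parts.get(grpId).size();
        }
    }

    /** Lets a "caller" mutate the returned copy, then returns the size of the original group 1 map. */
    public static int originalSizeAfterCallerMutation() {
        PartitionsFullMessage full = new PartitionsFullMessage();
        Map<String, Integer> grp = new HashMap<>();
        grp.put("node-1", 100);
        full.parts.put(1, grp);

        AffinityChangeMessage msg = new AffinityChangeMessage(full);

        // The caller mutates what it received...
        msg.partitionsMessage().parts.get(1).put("node-2", 0);

        // ...but the message that the discovery worker would serialize is unchanged.
        return msg.originalSize(1);
    }

    public static void main(String[] args) {
        System.out.println(originalSizeAfterCallerMutation()); // prints 1
    }
}
```

In this sketch the innermost values are immutable _Integer_ objects, so copying each inner map is enough; mutable values would have to be copied as well. Copying in the getter costs an allocation per call; this issue does not say whether the actual fix copies there or elsewhere.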

  was:
The test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky due to IGNITE-17507.
The root cause of the issue is that the _GridDhtPartitionsFullMessage_ carried by _CacheAffinityChangeMessage_ is mutated outside the _disco-notifier_ thread while the discovery message worker is serializing it, which may lead to the following exception:

{noformat}
[2022-08-16T21:10:32,133][ERROR][tcp-disco-msg-worker-[0448095b 127.0.0.1:47502]-#5308%distributed.CacheLateAffinityAssignmentTest3%-#98199%distributed.CacheLateAffinityAssignmentTest3%][TestTcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
  org.apache.ignite.IgniteException: Failed to marshal mutable discovery message: CacheAffinityChangeMessage [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], partsMsg=GridDhtPartitionsFullMessage [parts=HashMap {-2100569601=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, size=100], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, size=1024], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], partsToReload=IgniteDhtPartitionsToReloadMap [], topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, lastAffChangedTopVer=null, err=null, skipPrepare=false]]], exchangeNeeded=false, stopProc=false]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6423) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:6243) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:3260) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2918) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:8058) ~[classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:3089) [classes/:?]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) [classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7989) [classes/:?]
    at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58) [classes/:?]
  Caused by: org.apache.ignite.IgniteCheckedException: Failed to serialize object: CacheAffinityChangeMessage [id=ea31ffaa281-0286b465-6baf-4ad8-9e3b-3f8cb755d1dd, topVer=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], partsMsg=GridDhtPartitionsFullMessage [parts=HashMap {-2100569601=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=111, size=100], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=116, size=100]}, 1251687457=GridDhtPartitionFullMap {f57cbb85-44ba-40d1-814e-937f96c00003=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=6, minorTopVer=0], updateSeq=1035, size=1024], 0448095b-02d8-470c-ab90-6a5bcf800002=GridDhtPartitionMap [moving=0, top=AffinityTopologyVersion [topVer=4, minorTopVer=0], updateSeq=3, size=0]}}, partCntrs=IgniteDhtPartitionCountersMap [], partCntrs2=null, partHistSuppliers=IgniteDhtPartitionHistorySuppliersMap [], partsToReload=IgniteDhtPartitionsToReloadMap [], topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], errs=null, resTopVer=null, flags=0, partCnt=2, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=null, nodeId=f9f9faf0, evt=NODE_LEFT], lastVer=GridCacheVersion [topVer=0, order=1660673425660, nodeOrder=0, dataCenterId=0], super=GridCacheMessage [msgId=-1, depInfo=null, lastAffChangedTopVer=null, err=null, skipPrepare=false]]], exchangeNeeded=false, stopProc=false]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:102) ~[classes/:?]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:109) ~[classes/:?]
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:56) ~[classes/:?]
    at org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10827) [classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6420) ~[classes/:?]
    ... 8 more
  Caused by: java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1493) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1526) ~[?:?]
    at java.util.HashMap$EntryIterator.next(HashMap.java:1524) ~[?:?]
    at org.apache.ignite.internal.util.IgniteUtils.writeMap(IgniteUtils.java:5706) ~[classes/:?]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap.writeExternal(GridDhtPartitionFullMap.java:188) ~[classes/:?]
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1460) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) ~[?:?]
    at java.util.HashMap.internalWriteEntries(HashMap.java:1840) ~[?:?]
    at java.util.HashMap.writeObject(HashMap.java:1411) ~[?:?]
    at jdk.internal.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1145) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553) ~[?:?]
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510) ~[?:?]
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433) ~[?:?]
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179) ~[?:?]
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349) ~[?:?]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:97) ~[classes/:?]
    at org.apache.ignite.marshaller.jdk.JdkMarshaller.marshal0(JdkMarshaller.java:109) ~[classes/:?]
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.marshal(AbstractNodeNameAwareMarshaller.java:56) ~[classes/:?]
    at org.apache.ignite.internal.util.IgniteUtils.marshal(IgniteUtils.java:10827) [classes/:?]
    at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:6420) ~[classes/:?]
    ... 8 more
{noformat}

In my understanding, a discovery message should not be modified in any way outside the _disco-notifier_ thread. At a minimum, the _CacheAffinityChangeMessage.partitionsMessage()_ method should always return a deep copy of _GridDhtPartitionsFullMessage_.


> Test CacheLateAffinityAssignmentTest.testAffinitySimpleNoCacheOnCoordinator2 became flaky after IGNITE-17507
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17542
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17542
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.14
>            Reporter: Vyacheslav Koptilin
>            Assignee: Vyacheslav Koptilin
>            Priority: Major
>             Fix For: 2.14
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)