Posted to issues@ignite.apache.org by "Alexey Goncharuk (JIRA)" <ji...@apache.org> on 2018/10/04 09:21:00 UTC

[jira] [Commented] (IGNITE-9790) Assertion error on full messages merge after coordinator failover

    [ https://issues.apache.org/jira/browse/IGNITE-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16637971#comment-16637971 ] 

Alexey Goncharuk commented on IGNITE-9790:
------------------------------------------

I observed the following assertion error on TC in the Hadoop suite:
{code}
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] java.lang.AssertionError
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap.compareTo(GridDhtPartitionFullMap.java:258)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsFullMessage.merge(GridDhtPartitionsFullMessage.java:817)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.addOrMergeDelayedFullMessage(GridDhtPartitionsExchangeFuture.java:4357)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:384)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:379)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3094)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3073)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[14:10:36]W:		 [org.apache.ignite:ignite-hadoop] 	at java.lang.Thread.run(Thread.java:748)
{code}

The issue was introduced by the IGNITE-9492 ticket. The code assumes that the full message will always be sent by the same node. However, this is not true in the case of coordinator failover.

We should check whether the message was received from a new coordinator and, if so, replace the delayed message with it (and ignore any messages from the old coordinator).
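
A rough sketch of the proposed check, to make the intent concrete. This is not the actual Ignite code: DelayedFullMessageHolder, FullMessage, and the way the current coordinator ID is passed in are all hypothetical simplifications, and the real merge logic in GridDhtPartitionsFullMessage is replaced by a placeholder.

```java
import java.util.UUID;

/** Hypothetical holder for a delayed full message; not an Ignite class. */
class DelayedFullMessageHolder {
    /** Hypothetical, simplified full message: sender node ID plus an opaque payload. */
    static final class FullMessage {
        public final UUID senderId;
        public final String payload;

        public FullMessage(UUID senderId, String payload) {
            this.senderId = senderId;
            this.payload = payload;
        }
    }

    /** Currently stored delayed message, or null if none was received yet. */
    private FullMessage delayed;

    /**
     * @param msg Received full message.
     * @param currentCrdId ID of the node the caller currently considers the coordinator
     *      (assumption: the caller can determine this).
     */
    public synchronized void addOrMerge(FullMessage msg, UUID currentCrdId) {
        // Message from an old coordinator: ignore it.
        if (!msg.senderId.equals(currentCrdId))
            return;

        if (delayed == null || !delayed.senderId.equals(msg.senderId))
            // First message, or the sender is a new coordinator: replace instead of merging.
            delayed = msg;
        else
            // Same sender as before: merging is safe.
            delayed = merge(delayed, msg);
    }

    /** Placeholder merge; the real code compares and merges partition maps. */
    private static FullMessage merge(FullMessage a, FullMessage b) {
        return new FullMessage(a.senderId, a.payload + "+" + b.payload);
    }

    public synchronized FullMessage delayed() {
        return delayed;
    }
}
```

With this shape, the merge (and therefore the comparison that currently trips the assertion) is only reached for messages from the same sender as the stored one.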

> Assertion error on full messages merge after coordinator failover
> -----------------------------------------------------------------
>
>                 Key: IGNITE-9790
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9790
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)