Posted to issues@ignite.apache.org by "Semen Boikov (JIRA)" <ji...@apache.org> on 2016/06/02 09:23:59 UTC

[jira] [Commented] (IGNITE-3212) Servers get stuck with the warning "Failed to wait for initial partition map exchange" during failover test

    [ https://issues.apache.org/jira/browse/IGNITE-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312014#comment-15312014 ] 

Semen Boikov commented on IGNITE-3212:
--------------------------------------

Observed a StackOverflowError in the logs:
{noformat}
	at java.lang.StringCoding.deref(StringCoding.java:63)
	at java.lang.StringCoding.decode(StringCoding.java:179)
	at java.lang.String.<init>(String.java:416)
	at java.lang.String.<init>(String.java:481)
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
	at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:117)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:2360)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2137)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2031)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:1967)
	at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:1933)
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1285)
	at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1354)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:707)
	at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:856)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture.prepare(GridCacheTxRecoveryFuture.java:170)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:524)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.access$200(GridCacheTxRecoveryFuture.java:475)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture.prepare(GridCacheTxRecoveryFuture.java:173)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:524)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.access$200(GridCacheTxRecoveryFuture.java:475)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture.prepare(GridCacheTxRecoveryFuture.java:173)
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
...
...
	at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.commitIfPrepared(IgniteTxManager.java:1892)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft(GridCacheTxRecoveryFuture.java:524)
	at org.apache.ignite.internal.processors.cache.distributed.GridCacheTxRecoveryFuture$MiniFuture.access$200(GridCacheTxRecoveryFuture.java:475)
{noformat}
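The trace repeats the cycle GridCacheTxRecoveryFuture.prepare -> IgniteTxManager.commitIfPrepared -> GridCacheTxRecoveryFuture$MiniFuture.onNodeLeft -> prepare until the thread's stack is exhausted. The Java sketch below is a simplified, hypothetical model of that pattern, not Ignite's code and not the fix made for 1.7: the class RecoverySketch and its methods are invented for illustration. It shows how a node-left callback that synchronously re-enters the preparation step adds stack frames for every departed node, while the same decisions made in a loop keep the stack depth constant.

```java
// Hypothetical, simplified model of a synchronous "node left -> prepare again"
// callback cycle. Not Ignite code; all names are invented for illustration.
final class RecoverySketch {

    // Recursive variant: if the target node has left, the node-left callback
    // immediately re-runs prepare() on the same thread for the next node.
    // Every departed node adds two stack frames, so a long enough run of
    // departures exhausts the thread stack with a StackOverflowError.
    static int prepareRecursive(boolean[] nodeAlive, int idx) {
        if (idx >= nodeAlive.length || nodeAlive[idx]) {
            return idx; // no candidates left, or a live node accepted the request
        }
        return onNodeLeftRecursive(nodeAlive, idx);
    }

    private static int onNodeLeftRecursive(boolean[] nodeAlive, int idx) {
        return prepareRecursive(nodeAlive, idx + 1); // re-enters prepare()
    }

    // Iterative variant: the same decisions made in a loop, so the stack
    // depth stays constant no matter how many nodes have left.
    static int prepareIterative(boolean[] nodeAlive, int idx) {
        while (idx < nodeAlive.length && !nodeAlive[idx]) {
            idx++; // node gone: move on to the next candidate
        }
        return idx;
    }

    public static void main(String[] args) {
        boolean[] allLeft = new boolean[1_000_000]; // false = node has left
        System.out.println(prepareIterative(allLeft, 0)); // prints 1000000
        try {
            prepareRecursive(allLeft, 0);
        } catch (StackOverflowError e) {
            System.out.println("recursive variant overflowed the stack");
        }
    }
}
```

Both variants return the index of the first live node (or the array length if none is left); only the recursive one depends on stack size. Turning such a callback into a loop, or handing it off to another thread, are common ways to bound the depth; the trace alone does not show which approach, if any, was used in Ignite.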


> Servers get stuck with the warning "Failed to wait for initial partition map exchange" during failover test
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-3212
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3212
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 1.6
>            Reporter: Ksenia Rybakova
>            Assignee: Semen Boikov
>             Fix For: 1.7
>
>
> Servers being restarted during the failover test get stuck after some time with the warning "Failed to wait for initial partition map exchange".
> {noformat}
> [08:44:41,303][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=db557f04-43b7-4e28-ae0d-d4dcf4139c89, addrs=[10.20.0.222, 127.0.0.1], sockAddrs=[fosters-222/10.20.0.222:47503, /10.20.0.222:47503, /127.0.0.1:47503], discPort=47503, order=44, intOrder=32, lastExchangeTime=1464363880917, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:44:41,304][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Topology snapshot [ver=44, servers=19, clients=1, CPUs=64, heap=160.0GB]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=6fae61a7-c1c1-40e5-8ad0-8bf5d6c86eb7, addrs=[10.20.0.223, 127.0.0.1], sockAddrs=[fosters-223/10.20.0.223:47503, /10.20.0.223:47503, /127.0.0.1:47503], discPort=47503, order=45, intOrder=33, lastExchangeTime=1464363910999, loc=false, ver=1.6.0#20160525-sha1:48321a40, isClient=false]
> [08:45:11,455][INFO ][disco-event-worker-#80%null%][GridDiscoveryManager] Topology snapshot [ver=45, servers=20, clients=1, CPUs=64, heap=170.0GB]
> [08:45:19,942][INFO ][ignite-update-notifier-timer][GridUpdateNotifier] Update status is not available.
> [08:46:20,370][WARN ][main][GridCachePartitionExchangeManager] Failed to wait for initial partition map exchange. Possible reasons are:
>   ^-- Transactions in deadlock.
>   ^-- Long running transactions (ignore if this is the case).
>   ^-- Unreleased explicit locks.
> [08:48:30,375][WARN ][main][GridCachePartitionExchangeManager] Still waiting for initial partition map exchange ...
> {noformat}
> Other nodes log "Failed to wait for partition release future" warnings:
> {noformat}
> [08:09:45,822][WARN ][exchange-worker-#82%null%][GridDhtPartitionsExchangeFuture] Failed to wait for partition release future [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], node=cab5d0e0-7365-4774-8f99-d9f131c5d896]. Dumping pending objects that might be the cause:
> [08:09:45,822][WARN ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Ready affinity version: AffinityTopologyVersion [topVer=28, minorTopVer=1]
> [08:09:45,826][WARN ][exchange-worker-#82%null%][GridCachePartitionExchangeManager] Last exchange future: GridDhtPartitionsExchangeFuture ...
> {noformat}
> Load config:
> - 1 client, 20 servers (5 servers per host)
> - warmup 60
> - duration 66h
> - preload 5M
> - key range 10M
> - operations: PUT PUT_ALL GET GET_ALL INVOKE INVOKE_ALL REMOVE REMOVE_ALL PUT_IF_ABSENT REPLACE
> - backups count 3
> - 3 servers are restarted every 15 min with a 30 sec step; the pause between stop and start is 5 min
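The restart settings in the load config can be checked with simple arithmetic. The Java sketch below is a hypothetical reading of them, not part of any benchmark tool; the class RestartSchedule is invented for illustration. It assumes that "30 sec step" means the three stops (and the three starts) of one round are staggered by 30 seconds, that each server comes back 5 minutes after its own stop, and it ignores the warmup period because its unit is not given. Under those assumptions, 66 h holds 264 rounds of 15 min (792 restarts), each round's last server is back 6 min after the round starts, and all three servers of a round are down together for 4 min. With 3 backups each partition has 4 copies, so 3 servers down at once should still leave at least one copy, assuming no other node fails and earlier rounds have finished rebalancing.

```java
import java.time.Duration;

// Hypothetical reading of the IGNITE-3212 load config; not a benchmark tool.
// Assumes "30 sec step" staggers the 3 stops (and the 3 starts) of one round
// by 30 s, and ignores the warmup period because its unit is not given.
final class RestartSchedule {
    static final Duration TEST_DURATION = Duration.ofHours(66);
    static final Duration ROUND_PERIOD = Duration.ofMinutes(15);
    static final Duration STEP = Duration.ofSeconds(30);
    static final Duration PAUSE = Duration.ofMinutes(5);
    static final int SERVERS_PER_ROUND = 3;

    // Complete restart rounds that fit into the test: 3960 min / 15 min.
    static long rounds() {
        return TEST_DURATION.toMinutes() / ROUND_PERIOD.toMinutes();
    }

    // Offset of the i-th stop within a round (i = 0 .. SERVERS_PER_ROUND - 1).
    static Duration stopOffset(int i) {
        return STEP.multipliedBy(i);
    }

    // Offset of the i-th start: that server's stop plus the pause.
    static Duration startOffset(int i) {
        return stopOffset(i).plus(PAUSE);
    }

    // Window in which every server of the round is down at the same time:
    // from the last stop until the first start.
    static Duration allDownWindow() {
        return startOffset(0).minus(stopOffset(SERVERS_PER_ROUND - 1));
    }

    public static void main(String[] args) {
        System.out.println("rounds: " + rounds());                                    // rounds: 264
        System.out.println("restarts: " + rounds() * SERVERS_PER_ROUND);              // restarts: 792
        System.out.println("round done after: " + startOffset(SERVERS_PER_ROUND - 1)); // round done after: PT6M
        System.out.println("all down for: " + allDownWindow());                       // all down for: PT4M
    }
}
```

Because every round completes well inside its 15 min period under this reading, consecutive rounds do not overlap; a different interpretation of the "step" would change the window sizes but not the round count.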



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)