You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by Stanislav Lukyanov <st...@gmail.com> on 2018/06/11 16:27:01 UTC

RE: Cache operations hanging for a minute when one of server nodesgoesdown

When a node joins the cluster needs to perform partition map exchange process. If this process takes too long, the cluster may become unresponsive.
Looks like this is what happened in your case. You can check how long an exchange took by looking for “Started exchange” and “Finished exchange” in the logs – I assume it’s around 20 seconds.

Debugging hanged partition map exchange issues may be pretty tricky.
My best guess so far is that the reduced timeouts you’ve set resulted in failed network IO (e.g. instead of waiting for a message for 10s and getting it on the first try, you retry every 3s until a fast enough delivery happens – which might be the 10th or 20th attempt).
Try changing timeouts back and see how long your exchanges take on a node join. Perhaps some value will be low enough to detect node failures and high enough to allow regular operations to pass. 
If that doesn’t help, please share the full logs from all nodes.

Thanks,
Stan 

From: Olexandr K
Sent: 11 июня 2018 г. 14:24
To: user@ignite.apache.org
Subject: Re: Cache operations hanging for a minute when one of server nodesgoesdown

Hi Stan,

I tried to decrease network/failure timeouts and it worked fine when node stopped.
Unfortunately I got lot's of hanged calls when it started again.
At that time all cache calls got stuck for 25-30 seconds.
Is it expected? I thought rebalancing should occur in background and node should join the cluster when it is 100% ready, no?
See some log extracts below.

        <property name="networkSendRetryCount" value="1" /> <!-- default: 3 -->
        <property name="networkSendRetryDelay" value="500" /> <!-- default: 1000 -->
        <property name="networkTimeout" value="1500" /> <!-- default: 5000 -->
        <property name="failureDetectionTimeout" value="3000" /> <!-- default: 10000 -->
        <property name="clientFailureDetectionTimeout" value="10000" /> <!-- default: 30000 -->


-- CACHE CALL STARTED: cache.put()

83837380032 2018-06-11 13:51:54.777 [https-jsse-nio-8080-exec-4] INFO  com.xxx.lk.backend.cache.impl.RefreshTokenCache - Store: d289a0a3-bca6-49a4-ae9d-9568517d656e

-- MANY WARNINGS BEFORE CALL COMPLETED

2018-06-11 13:51:55.058 [grid-timeout-worker-#4] DEBUG org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor - Timeout has occurred [obj=CancelableTask [id=fe9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8, endTime=1528714315053, period=3000, cancel=false, task=org.apache.ignite.internal.processors.query.GridQueryProcessor$2@2649758c], process=true]
2018-06-11 13:51:55.355 [grid-timeout-worker-#4] DEBUG org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor - Timeout has occurred [obj=CancelableTask [id=ee9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8, endTime=1528714315351, period=2000, cancel=false, task=org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$13@110ac52b], process=true]
2018-06-11 13:51:56.339 [nio-acceptor-#5] DEBUG org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]

2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], node=ba482197-64cc-4d84-81f7-2b58f0c66a0c]. Dumping pending objects that might be the cause: 
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Ready affinity version: AffinityTopologyVersion [topVer=5, minorTopVer=0]
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Last exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, nodeId8=ba482197, msg=Node joined: TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1528714314058], crd=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, nodeId8=ba482197, msg=Node joined: TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1528714314058], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=538678914], init=true, lastVer=null, partReleaseFut=null, exchActions=null, affChangeMsg=null, initTs=1528714314058, centralizedAff=false, changeGlobalStateE=null, done=false, state=CLIENT, evtLatch=0, remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, 34dc1ff0-45ec-4fee-995b-84dde8a451cb], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1716673894]]
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager - First 10 pending exchange futures [total=0]
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Last 10 exchange futures (total: 3):
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=false]
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_LEFT, evtNode=TcpDiscoveryNode [id=c7e39052-0882-42f3-91aa-6a438c2ab745, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=true]
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c, addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[v-hp-lk-jbn02.9801.1678.ecs.hp.com/30.251.106.11:0, /10.2.0.250:0, /127.0.0.1:0], discPort=0, order=4, intOrder=0, lastExchangeTime=1528714098847, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=true]
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Pending transactions:
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Pending explicit locks:
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Pending cache futures:
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Pending atomic cache futures:
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Pending data streamer futures:
 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN  org.apache.ignite.internal.diagnostic - Pending transaction deadlock detection futures:
 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Encrypted data [status=OK, handshakeStaus=NOT_HANDSHAKING, ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=11210, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false, hashCode=1835849342, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=853 lim=853 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, createTime=1528714107495, closeTime=0, bytesSent=95956, bytesRcvd=11210, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1528714107495, lastSndTime=1528714317074, lastRcvTime=1528714313367, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false]]]
 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapping received data: GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=17050, bytesSent=0, bytesRcvd0=5840, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false, hashCode=1835849342, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5840 lim=5840 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=17050, bytesSent0=933, bytesRcvd0=5840, sndSchedTime=1528714107495, lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false]]
 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped raw data [status=BUFFER_UNDERFLOW, handshakeStatus=NOT_HANDSHAKING, ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=17050, bytesSent=0, bytesRcvd0=5840, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false, hashCode=1835849342, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5840 lim=5840 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=17050, bytesSent0=933, bytesRcvd0=5840, sndSchedTime=1528714107495, lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false]]]
 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapping received data: GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false, hashCode=1835849342, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false]]
 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped raw data [status=OK, handshakeStatus=NOT_HANDSHAKING, ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false, hashCode=1835849342, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false]]]
 2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped raw data [status=BUFFER_UNDERFLOW, handshakeStatus=NOT_HANDSHAKING, ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false, hashCode=1835849342, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false]]]
 2018-06-11 13:51:57.089 [sys-#25] INFO  org.apache.ignite.internal.diagnostic - Exchange future waiting for coordinator response [crd=34dc1ff0-45ec-4fee-995b-84dde8a451cb, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0]]
Remote node information:
General node info [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, client=false, discoTopVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], time=13:51:56.992]
Partitions exchange info [readyVer=AffinityTopologyVersion [topVer=5, minorTopVer=0]]
Last initialized exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1528714313966], crd=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, v-hp-lk-dcn02.9801.1678.ecs.hp.com/30.251.106.90:47500, /10.2.0.252:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714316726, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1528714313966], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=1994546166], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1528714313966, centralizedAff=false, changeGlobalStateE=null, done=false, state=CRD, evtLatch=0, remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=715729116]]
Communication SPI statistics [rmtNode=ba482197-64cc-4d84-81f7-2b58f0c66a0c]
Communication SPI recovery descriptors: 
    [key=ConnectionKey [nodeId=ba482197-64cc-4d84-81f7-2b58f0c66a0c, idx=0, connCnt=0], msgsSent=4, msgsAckedByRmt=0, msgsRcvd=453, lastAcked=448, reserveCnt=1, descIdHash=1249269712]
Communication SPI clients: 
    [node=ba482197-64cc-4d84-81f7-2b58f0c66a0c, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=2, bytesRcvd=98649, bytesSent=0, bytesRcvd0=933, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null, finished=false, hashCode=989360520, interrupted=false, runner=grid-nio-worker-tcp-comm-2-#27]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=453, sentCnt=4, reserved=true, lastAck=448, nodeLeft=false, node=TcpDiscoveryNode [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c, addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[/10.2.0.250:0, /30.251.106.11:0, /127.0.0.1:0], discPort=0, order=4, intOrder=4, lastExchangeTime=1528714105712, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=453, sentCnt=4, reserved=true, lastAck=448, nodeLeft=false, node=TcpDiscoveryNode [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c, addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[/10.2.0.250:0, /30.251.106.11:0, /127.0.0.1:0], discPort=0, order=4, intOrder=4, lastExchangeTime=1528714105712, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.252:47100, rmtAddr=/10.2.0.250:49581, createTime=1528714107367, closeTime=0, bytesSent=12857, bytesRcvd=98649, bytesSent0=0, bytesRcvd0=933, sndSchedTime=1528714107367, lastSndTime=1528714313279, lastRcvTime=1528714316992, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@5f146306, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=true]], super=GridAbstractCommunicationClient [lastUsed=1528714107415, closed=false, connIdx=0]]]
NIO sessions statistics:
>> Selector info [idx=2, keysCnt=1, bytesRcvd=98649, bytesRcvd0=933, bytesSent=0, bytesSent0=0]
    Connection info [in=true, rmtAddr=/10.2.0.250:49581, locAddr=/10.2.0.252:47100, msgsSent=4, msgsAckedByRmt=0, descIdHash=1249269712, unackedMsgs=[GridNearSingleGetResponse, GridNearSingleGetResponse, GridDhtPartitionsFullMessage, GridDhtPartitionsFullMessage], msgsRcvd=453, lastAcked=448, descIdHash=1249269712, bytesRcvd=98649, bytesRcvd0=933, bytesSent=12857, bytesSent0=0, opQueueSize=0]
Exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1528714313966], crd=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, v-hp-lk-dcn02.9801.1678.ecs.hp.com/30.251.106.90:47500, /10.2.0.252:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714316726, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6, nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313497, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1528714313966], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=1994546166], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1528714313966, centralizedAff=false, changeGlobalStateE=null, done=false, state=CRD, evtLatch=0, remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=715729116]]
Local communication statistics:
Communication SPI statistics [rmtNode=34dc1ff0-45ec-4fee-995b-84dde8a451cb]
Communication SPI recovery descriptors: 
    [key=ConnectionKey [nodeId=34dc1ff0-45ec-4fee-995b-84dde8a451cb, idx=0, connCnt=-1], msgsSent=453, msgsAckedByRmt=448, msgsRcvd=5, lastAcked=0, reserveCnt=1, descIdHash=105307846]
Communication SPI clients: 
    [node=34dc1ff0-45ec-4fee-995b-84dde8a451cb, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0, bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null, finished=false, hashCode=1835849342, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=5, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=5, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500, /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714105823, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.2.0.250:49581, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719, bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495, lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5, directMode=true], GridConnectionBytesVerifyFilter, SSL filter], accepted=false]], super=GridAbstractCommunicationClient [lastUsed=1528714107495, closed=false, connIdx=0]]]
NIO sessions statistics:
>> Selector info [idx=1, keysCnt=1, bytesRcvd=20719, bytesRcvd0=9509, bytesSent=0, bytesSent0=0]
    Connection info [in=false, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100, locAddr=/10.2.0.250:49581, msgsSent=453, msgsAckedByRmt=448, descIdHash=105307846, unackedMsgs=[GridNearAtomicFullUpdateRequest, GridNearAtomicFullUpdateRequest, GridNearAtomicFullUpdateRequest, GridDhtPartitionsSingleMessage, IgniteDiagnosticMessage], msgsRcvd=5, lastAcked=0, descIdHash=105307846, bytesRcvd=20719, bytesRcvd0=9509, bytesSent=96889, bytesSent0=933, opQueueSize=0]

2018-06-11 13:51:57.370 [grid-timeout-worker-#4] DEBUG org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor - Timeout has occurred [obj=CancelableTask [id=ee9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8, endTime=1528714317355, period=2000, cancel=false, task=org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$13@110ac52b], process=true]
-- CALL COMPLETED

83837380032 2018-06-11 13:52:21.903 [https-jsse-nio-8080-exec-4] DEBUG com.xxx.lk.backend.cache.impl.RefreshTokenCache - Cache operation completed

On Mon, Jun 11, 2018 at 12:27 PM, Stanislav Lukyanov <st...@gmail.com> wrote:
Hi,
 
Try to configure the existing timeouts:
- failureDetectionTimeout
- clientFailureDetectionTimeout
- networkTimeout
 
Not sure if the Future you mention really needs timeout. If there was a timeout, it would be either failureDetectionTimeout or networkTimeout anyway.
Currently the calling thread has to wait for the failed node to be kicked out of the cluster. The time that the cluster needs to kick out a node depends (primarily) on the failureDetectionTimeout, although isn’t strictly equal to it.
 
The problem I can see here is that the putAsync is not actually async since it performs substantial work in the calling thread. I’ll give it another look and file a bug.
For now, as a workaround (aside from configuring the timeouts) I’d suggest to use an ExecutorService to call put()/putAsync(), getting a cancelable Future from the start.
 
Thanks,
Stan
 
From: Olexandr K
Sent: 11 июня 2018 г. 2:44
To: user@ignite.apache.org
Subject: Cache operations hanging for a minute when one of server nodes goesdown
 
Hi Igniters,
 
I'm testing our system for availability.
It uses Ignite as key/value persistent cache.
 
Here is my test:
1) start 2 server and 2 client nodes
2) run heavy load on client nodes (some application logic which cause cache calls)
3) stop 1 server node
 
Here I expect all in-progress cache operations targeted to server 1 node to fail fast.
What I don't want is to hang all my processing threads for significant time. 
Unfortunately it works exactly that way: I'm constantly getting my threads blocked for 20-80 seconds. 
Finally putAsync() completes successfully but I'd prefer cache operation to fail fast. I don't want to hang all processing threads for a minute because of cache.
 
It works the same for put() and putAsync() calls.
 
As I see in the code, it can be fixed by calling future.get(timeout) instead of future.get() in TcpCommunicationSpi.
Timeout should be configurable.
 
TcpCommunicationSpi (line: 2799)
  private GridCommunicationClient reserveClient(ClusterNode node, int connIdx) {
    ...
    client = fut.get();
 
Does it make sense from your point of view?
 
Here is my thread  dump:
 
threads=[
  {
    threadName=https-jsse-nio-8080-exec-20,
    threadId=102,
    blockedTime=-1,
    blockedCount=0,
    waitedTime=-1,
    waitedCount=5,
    lockName=null,
    lockOwnerId=-1,
    lockOwnerName=null,
    inNative=false,
    suspended=false,
    threadState=WAITING,
    stackTrace=[
      {
        methodName=park,
        fileName=Unsafe.java,
        lineNumber=-2,
        className=sun.misc.Unsafe,
        nativeMethod=true
      },
      {
        methodName=park,
        fileName=LockSupport.java,
        lineNumber=304,
        className=java.util.concurrent.locks.LockSupport,
        nativeMethod=false
      },
      {
        methodName=get0,
        fileName=GridFutureAdapter.java,
        lineNumber=177,
        className=org.apache.ignite.internal.util.future.GridFutureAdapter,
        nativeMethod=false
      },
      {
        methodName=get,
        fileName=GridFutureAdapter.java,
        lineNumber=140,
        className=org.apache.ignite.internal.util.future.GridFutureAdapter,
        nativeMethod=false
      },
      {
        methodName=reserveClient,
        fileName=TcpCommunicationSpi.java,
        lineNumber=2799,
        className=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi,
        nativeMethod=false
      },
....
      {
        methodName=putAsync,
        fileName=IgniteCacheProxyImpl.java,
        lineNumber=1035,
        className=org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl,
        nativeMethod=false
      },
      {
        methodName=putAsync,
        fileName=GatewayProtectedCacheProxy.java,
        lineNumber=900,
        className=org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy,
        nativeMethod=false
      },
 
Sample cache config:
 
<bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="RefreshToken"/>
                    <property name="dataRegionName" value="auth_durable_region"/>
                    <property name="atomicityMode" value="ATOMIC"/>
                    <property name="writeSynchronizationMode" value="FULL_ASYNC"/>
                    <property name="cacheMode" value="PARTITIONED"/>
                    <property name="backups" value="1"/>
                    <property name="eagerTtl" value="true"/> 
                </bean>
 
Ignite version: 2.4.0
OS: Windows Server 2012 R2
 
BR, Oleksandr
 



Re: Cache operations hanging for a minute when one of server nodesgoesdown

Posted by Olexandr K <ol...@gmail.com>.
Hi Stan,

I spent half of a day but was not able to find such balanced configuration.

I followed your second advice and everything looks good now

"I’d suggest to use an ExecutorService to call put()/putAsync(), getting a
cancelable Future from the start. "

I'm doing all ignite calls via dedicated thread pool and controlling max
call time via future.
Calls still hangs for 5-60 seconds after server nodes up/down but this
happens in cache-pool and is not affecting whole system.

I'm just handling this as cache misses on application side

Thanks


On Mon, Jun 11, 2018 at 7:27 PM, Stanislav Lukyanov <st...@gmail.com>
wrote:

> When a node joins the cluster needs to perform partition map exchange
> process. If this process takes too long, the cluster may become
> unresponsive.
>
> Looks like this is what happened in your case. You can check how long an
> exchange took by looking for “Started exchange” and “Finished exchange” in
> the logs – I assume it’s around 20 seconds.
>
>
>
> Debugging hanged partition map exchange issues may be pretty tricky.
>
> My best guess so far is that the reduced timeouts you’ve set resulted in
> failed network IO (e.g. instead of waiting for a message for 10s and
> getting it on the first try, you retry every 3s until a fast enough
> delivery happens – which might be the 10th or 20th attempt).
>
> Try changing timeouts back and see how long your exchanges take on a node
> join. Perhaps some value will be low enough to detect node failures and
> high enough to allow regular operations to pass.
>
> If that doesn’t help, please share the full logs from all nodes.
>
>
>
> Thanks,
>
> Stan
>
>
>
> *From: *Olexandr K <ol...@gmail.com>
> *Sent: *11 июня 2018 г. 14:24
> *To: *user@ignite.apache.org
> *Subject: *Re: Cache operations hanging for a minute when one of server
> nodesgoesdown
>
>
>
> Hi Stan,
>
>
>
> I tried to decrease network/failure timeouts and it worked fine when node
> stopped.
>
> Unfortunately I got lot's of hanged calls when it started again.
>
> At that time all cache calls got stuck for 25-30 seconds.
>
> Is it expected? I thought rebalancing should occur in background and node
> should join the cluster when it is 100% ready, no?
>
> See some log extracts below.
>
>
>
>         <property name="networkSendRetryCount" value="1" /> <!-- default:
> 3 -->
>         <property name="networkSendRetryDelay" value="500" /> <!--
> default: 1000 -->
>         <property name="networkTimeout" value="1500" /> <!-- default: 5000
> -->
>         <property name="failureDetectionTimeout" value="3000" /> <!--
> default: 10000 -->
>         <property name="clientFailureDetectionTimeout" value="10000" />
> <!-- default: 30000 -->
>
>
>
>
>
> -- CACHE CALL STARTED: cache.put()
>
>
>
> 83837380032 2018-06-11 13:51:54.777 [https-jsse-nio-8080-exec-4] INFO
> com.xxx.lk.backend.cache.impl.RefreshTokenCache - Store:
> d289a0a3-bca6-49a4-ae9d-9568517d656e
>
>
>
> -- MANY WARNINGS BEFORE CALL COMPLETED
>
>
> 2018-06-11 13:51:55.058 [grid-timeout-worker-#4] DEBUG
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor -
> Timeout has occurred [obj=CancelableTask [id=fe9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8,
> endTime=1528714315053, period=3000, cancel=false, task=org.apache.ignite.
> internal.processors.query.GridQueryProcessor$2@2649758c], process=true]
> 2018-06-11 13:51:55.355 [grid-timeout-worker-#4] DEBUG
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor -
> Timeout has occurred [obj=CancelableTask [id=ee9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8,
> endTime=1528714315351, period=2000, cancel=false, task=org.apache.ignite.
> internal.processors.query.h2.IgniteH2Indexing$13@110ac52b], process=true]
> 2018-06-11 13:51:56.339 [nio-acceptor-#5] DEBUG org.apache.ignite.spi.
> communication.tcp.TcpCommunicationSpi - Balancing data [min0=0, minIdx=0,
> max0=-1, maxIdx=-1]
>
> 2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Failed to wait for partition map
> exchange [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> node=ba482197-64cc-4d84-81f7-2b58f0c66a0c]. Dumping pending objects that
> might be the cause:
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Ready affinity version:
> AffinityTopologyVersion [topVer=5, minorTopVer=0]
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Last exchange future:
> GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent
> [evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f,
> addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.
> xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500],
> discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636,
> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6,
> nodeId8=ba482197, msg=Node joined: TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313636, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED,
> tstamp=1528714314058], crd=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313636, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6,
> nodeId8=ba482197, msg=Node joined: TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313636, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED,
> tstamp=1528714314058], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true,
> initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true,
> hash=538678914], init=true, lastVer=null, partReleaseFut=null,
> exchActions=null, affChangeMsg=null, initTs=1528714314058,
> centralizedAff=false, changeGlobalStateE=null, done=false, state=CLIENT,
> evtLatch=0, remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f,
> 34dc1ff0-45ec-4fee-995b-84dde8a451cb], super=GridFutureAdapter
> [ignoreInterrupts=false, state=INIT, res=null, hash=1716673894]]
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager
> - First 10 pending exchange futures [total=0]
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Last 10 exchange futures (total:
> 3):
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f,
> addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.
> xxxgroup.tek.loc/10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500],
> discPort=47500, order=6, intOrder=5, lastExchangeTime=1528714313636,
> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=false]
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture
> [topVer=AffinityTopologyVersion [topVer=5, minorTopVer=0], evt=NODE_LEFT,
> evtNode=TcpDiscoveryNode [id=c7e39052-0882-42f3-91aa-6a438c2ab745,
> addrs=[10.2.0.163, 127.0.0.1, 30.251.106.199], sockAddrs=[/
> 10.2.0.163:47500, /127.0.0.1:47500, /30.251.106.199:47500],
> discPort=47500, order=1, intOrder=1, lastExchangeTime=1528714105823,
> loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=true]
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture
> [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0],
> evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c,
> addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[
> v-hp-lk-jbn02.9801.1678.ecs.hp.com/30.251.106.11:0, /10.2.0.250:0, /
> 127.0.0.1:0], discPort=0, order=4, intOrder=0, lastExchangeTime=1528714098847,
> loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=true]
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Pending transactions:
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Pending explicit locks:
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Pending cache futures:
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Pending atomic cache futures:
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Pending data streamer futures:
>  2018-06-11 13:51:57.074 [exchange-worker-#18] WARN
> org.apache.ignite.internal.diagnostic - Pending transaction deadlock
> detection futures:
>  2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Encrypted
> data [status=OK, handshakeStaus=NOT_HANDSHAKING,
> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=11210, bytesSent=0,
> bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null,
> finished=false, hashCode=1835849342, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=853
> lim=853 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0,
> rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4,
> sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> super=GridNioSessionImpl [locAddr=/10.2.0.250:49581,
> rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100,
> createTime=1528714107495, closeTime=0, bytesSent=95956, bytesRcvd=11210,
> bytesSent0=0, bytesRcvd0=0, sndSchedTime=1528714107495,
> lastSndTime=1528714317074, lastRcvTime=1528714313367, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=false]]]
>  2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapping
> received data: GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=17050, bytesSent=0,
> bytesRcvd0=5840, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null,
> finished=false, hashCode=1835849342, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0
> lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5840 lim=5840
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0,
> rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4,
> sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> super=GridNioSessionImpl [locAddr=/10.2.0.250:49581,
> rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100,
> createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=17050,
> bytesSent0=933, bytesRcvd0=5840, sndSchedTime=1528714107495,
> lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=false]]
>  2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped
> raw data [status=BUFFER_UNDERFLOW, handshakeStatus=NOT_HANDSHAKING,
> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=17050, bytesSent=0,
> bytesRcvd0=5840, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null,
> finished=false, hashCode=1835849342, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0
> lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=5840 lim=5840
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0,
> rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4,
> sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> super=GridNioSessionImpl [locAddr=/10.2.0.250:49581,
> rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100,
> createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=17050,
> bytesSent0=933, bytesRcvd0=5840, sndSchedTime=1528714107495,
> lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=false]]]
>  2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapping
> received data: GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0,
> bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null,
> finished=false, hashCode=1835849342, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0
> lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0,
> rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4,
> sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> super=GridNioSessionImpl [locAddr=/10.2.0.250:49581,
> rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100,
> createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719,
> bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495,
> lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=false]]
>  2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped
> raw data [status=OK, handshakeStatus=NOT_HANDSHAKING,
> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0,
> bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null,
> finished=false, hashCode=1835849342, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0
> lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0,
> rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4,
> sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> super=GridNioSessionImpl [locAddr=/10.2.0.250:49581,
> rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100,
> createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719,
> bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495,
> lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=false]]]
>  2018-06-11 13:51:57.074 [grid-nio-worker-tcp-comm-1-#7] DEBUG
> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Unwrapped
> raw data [status=BUFFER_UNDERFLOW, handshakeStatus=NOT_HANDSHAKING,
> ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0,
> bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null,
> finished=false, hashCode=1835849342, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0
> lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=3669 lim=3669
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0,
> rcvCnt=4, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=4,
> sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> super=GridNioSessionImpl [locAddr=/10.2.0.250:49581,
> rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100,
> createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719,
> bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495,
> lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=false]]]
>  2018-06-11 13:51:57.089 [sys-#25] INFO  org.apache.ignite.internal.diagnostic
> - Exchange future waiting for coordinator response
> [crd=34dc1ff0-45ec-4fee-995b-84dde8a451cb, topVer=AffinityTopologyVersion
> [topVer=6, minorTopVer=0]]
> Remote node information:
> General node info [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb, client=false,
> discoTopVer=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> time=13:51:56.992]
> Partitions exchange info [readyVer=AffinityTopologyVersion [topVer=5,
> minorTopVer=0]]
> Last initialized exchange future: GridDhtPartitionsExchangeFuture
> [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6,
> nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED,
> tstamp=1528714313966], crd=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> v-hp-lk-dcn02.9801.1678.ecs.hp.com/30.251.106.90:47500, /10.2.0.252:47500],
> discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714316726,
> loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6,
> nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED,
> tstamp=1528714313966], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true,
> initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true,
> hash=1994546166], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[ExplicitLockReleaseFuture
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]],
> TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=6,
> minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion
> [topVer=6, minorTopVer=0], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion
> [topVer=6, minorTopVer=0], futures=[]]]], exchActions=null,
> affChangeMsg=null, initTs=1528714313966, centralizedAff=false,
> changeGlobalStateE=null, done=false, state=CRD, evtLatch=0,
> remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f], super=GridFutureAdapter
> [ignoreInterrupts=false, state=INIT, res=null, hash=715729116]]
> Communication SPI statistics [rmtNode=ba482197-64cc-4d84-
> 81f7-2b58f0c66a0c]
> Communication SPI recovery descriptors:
>     [key=ConnectionKey [nodeId=ba482197-64cc-4d84-81f7-2b58f0c66a0c,
> idx=0, connCnt=0], msgsSent=4, msgsAckedByRmt=0, msgsRcvd=453,
> lastAcked=448, reserveCnt=1, descIdHash=1249269712]
> Communication SPI clients:
>     [node=ba482197-64cc-4d84-81f7-2b58f0c66a0c, client=GridTcpNioCommunicationClient
> [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=2, bytesRcvd=98649, bytesSent=0,
> bytesRcvd0=933, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-2, igniteInstanceName=null,
> finished=false, hashCode=989360520, interrupted=false,
> runner=grid-nio-worker-tcp-comm-2-#27]]], writeBuf=java.nio.DirectByteBuffer[pos=0
> lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0,
> rcvCnt=453, sentCnt=4, reserved=true, lastAck=448, nodeLeft=false,
> node=TcpDiscoveryNode [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c,
> addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[/10.2.0.250:0, /
> 30.251.106.11:0, /127.0.0.1:0], discPort=0, order=4, intOrder=4,
> lastExchangeTime=1528714105712, loc=false, ver=2.4.0#20180305-sha1:aa342270,
> isClient=true], connected=true, connectCnt=0, queueLimit=4096,
> reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor
> [acked=0, resendCnt=0, rcvCnt=453, sentCnt=4, reserved=true, lastAck=448,
> nodeLeft=false, node=TcpDiscoveryNode [id=ba482197-64cc-4d84-81f7-2b58f0c66a0c,
> addrs=[10.2.0.250, 127.0.0.1, 30.251.106.11], sockAddrs=[/10.2.0.250:0, /
> 30.251.106.11:0, /127.0.0.1:0], discPort=0, order=4, intOrder=4,
> lastExchangeTime=1528714105712, loc=false, ver=2.4.0#20180305-sha1:aa342270,
> isClient=true], connected=true, connectCnt=0, queueLimit=4096,
> reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/
> 10.2.0.252:47100, rmtAddr=/10.2.0.250:49581, createTime=1528714107367,
> closeTime=0, bytesSent=12857, bytesRcvd=98649, bytesSent0=0,
> bytesRcvd0=933, sndSchedTime=1528714107367, lastSndTime=1528714313279,
> lastRcvTime=1528714316992, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@5f146306,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=true]], super=GridAbstractCommunicationClient
> [lastUsed=1528714107415, closed=false, connIdx=0]]]
> NIO sessions statistics:
> >> Selector info [idx=2, keysCnt=1, bytesRcvd=98649, bytesRcvd0=933,
> bytesSent=0, bytesSent0=0]
>     Connection info [in=true, rmtAddr=/10.2.0.250:49581, locAddr=/
> 10.2.0.252:47100, msgsSent=4, msgsAckedByRmt=0, descIdHash=1249269712,
> unackedMsgs=[GridNearSingleGetResponse, GridNearSingleGetResponse,
> GridDhtPartitionsFullMessage, GridDhtPartitionsFullMessage], msgsRcvd=453,
> lastAcked=448, descIdHash=1249269712, bytesRcvd=98649, bytesRcvd0=933,
> bytesSent=12857, bytesSent0=0, opQueueSize=0]
> Exchange future: GridDhtPartitionsExchangeFuture
> [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6,
> nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED,
> tstamp=1528714313966], crd=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> v-hp-lk-dcn02.9801.1678.ecs.hp.com/30.251.106.90:47500, /10.2.0.252:47500],
> discPort=47500, order=2, intOrder=2, lastExchangeTime=1528714316726,
> loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0],
> discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=6,
> nodeId8=34dc1ff0, msg=Node joined: TcpDiscoveryNode
> [id=f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f, addrs=[10.2.0.163, 127.0.0.1,
> 30.251.106.199], sockAddrs=[v-hp-lk-dcn01.xxxgroup.tek.loc/10.2.0.163:
> 47500, /127.0.0.1:47500, /30.251.106.199:47500], discPort=47500, order=6,
> intOrder=5, lastExchangeTime=1528714313497, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED,
> tstamp=1528714313966], nodeId=f5fbdfd8, evt=NODE_JOINED], added=true,
> initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true,
> hash=1994546166], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[ExplicitLockReleaseFuture
> [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], futures=[]],
> TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=6,
> minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion
> [topVer=6, minorTopVer=0], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion
> [topVer=6, minorTopVer=0], futures=[]]]], exchActions=null,
> affChangeMsg=null, initTs=1528714313966, centralizedAff=false,
> changeGlobalStateE=null, done=false, state=CRD, evtLatch=0,
> remaining=[f5fbdfd8-8df1-4222-b4e5-2d12f42dd95f], super=GridFutureAdapter
> [ignoreInterrupts=false, state=INIT, res=null, hash=715729116]]
> Local communication statistics:
> Communication SPI statistics [rmtNode=34dc1ff0-45ec-4fee-
> 995b-84dde8a451cb]
> Communication SPI recovery descriptors:
>     [key=ConnectionKey [nodeId=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> idx=0, connCnt=-1], msgsSent=453, msgsAckedByRmt=448, msgsRcvd=5,
> lastAcked=0, reserveCnt=1, descIdHash=105307846]
> Communication SPI clients:
>     [node=34dc1ff0-45ec-4fee-995b-84dde8a451cb, client=GridTcpNioCommunicationClient
> [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker
> [super=AbstractNioClientWorker [idx=1, bytesRcvd=20719, bytesSent=0,
> bytesRcvd0=9509, bytesSent0=0, select=true, super=GridWorker
> [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=null,
> finished=false, hashCode=1835849342, interrupted=false,
> runner=grid-nio-worker-tcp-comm-1-#7]]], writeBuf=java.nio.DirectByteBuffer[pos=0
> lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768
> cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0,
> rcvCnt=5, sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> outRecovery=GridNioRecoveryDescriptor [acked=448, resendCnt=0, rcvCnt=5,
> sentCnt=453, reserved=true, lastAck=0, nodeLeft=false,
> node=TcpDiscoveryNode [id=34dc1ff0-45ec-4fee-995b-84dde8a451cb,
> addrs=[10.2.0.252, 127.0.0.1, 30.251.106.90], sockAddrs=[/127.0.0.1:47500,
> /10.2.0.252:47500, /30.251.106.90:47500], discPort=47500, order=2,
> intOrder=2, lastExchangeTime=1528714105823, loc=false,
> ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false,
> connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false],
> super=GridNioSessionImpl [locAddr=/10.2.0.250:49581,
> rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/10.2.0.252:47100,
> createTime=1528714107495, closeTime=0, bytesSent=96889, bytesRcvd=20719,
> bytesSent0=933, bytesRcvd0=9509, sndSchedTime=1528714107495,
> lastSndTime=1528714317074, lastRcvTime=1528714317074, readsPaused=false,
> filterChain=FilterChain[filters=[GridNioCodecFilter
> [parser=org.apache.ignite.internal.util.nio.GridDirectParser@51afeee5,
> directMode=true], GridConnectionBytesVerifyFilter, SSL filter],
> accepted=false]], super=GridAbstractCommunicationClient
> [lastUsed=1528714107495, closed=false, connIdx=0]]]
> NIO sessions statistics:
> >> Selector info [idx=1, keysCnt=1, bytesRcvd=20719, bytesRcvd0=9509,
> bytesSent=0, bytesSent0=0]
>     Connection info [in=false, rmtAddr=v-hp-lk-dcn02.xxxgroup.tek.loc/
> 10.2.0.252:47100, locAddr=/10.2.0.250:49581, msgsSent=453,
> msgsAckedByRmt=448, descIdHash=105307846, unackedMsgs=[
> GridNearAtomicFullUpdateRequest, GridNearAtomicFullUpdateRequest,
> GridNearAtomicFullUpdateRequest, GridDhtPartitionsSingleMessage,
> IgniteDiagnosticMessage], msgsRcvd=5, lastAcked=0, descIdHash=105307846,
> bytesRcvd=20719, bytesRcvd0=9509, bytesSent=96889, bytesSent0=933,
> opQueueSize=0]
>
> 2018-06-11 13:51:57.370 [grid-timeout-worker-#4] DEBUG
> org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor -
> Timeout has occurred [obj=CancelableTask [id=ee9e57ee361-ba3899f9-a10b-4498-98e6-ed3b65dfc3f8,
> endTime=1528714317355, period=2000, cancel=false, task=org.apache.ignite.
> internal.processors.query.h2.IgniteH2Indexing$13@110ac52b], process=true]
>
> -- CALL COMPLETED
>
>
> 83837380032 2018-06-11 13:52:21.903 [https-jsse-nio-8080-exec-4] DEBUG
> com.xxx.lk.backend.cache.impl.RefreshTokenCache - Cache operation
> completed
>
>
>
> On Mon, Jun 11, 2018 at 12:27 PM, Stanislav Lukyanov <
> stanlukyanov@gmail.com> wrote:
>
> Hi,
>
>
>
> Try to configure the existing timeouts:
>
> - failureDetectionTimeout
>
> - clientFailureDetectionTimeout
>
> - networkTimeout
>
>
>
> Not sure if the Future you mention really needs timeout. If there was a
> timeout, it would be either failureDetectionTimeout or networkTimeout
> anyway.
>
> Currently the calling thread has to wait for the failed node to be kicked
> out of the cluster. The time that the cluster needs to kick out a node
> depends (primarily) on the failureDetectionTimeout, although isn’t strictly
> equal to it.
>
>
>
> The problem I can see here is that the putAsync is not actually async
> since it performs substantial work in the calling thread. I’ll give it
> another look and file a bug.
>
> For now, as a workaround (aside from configuring the timeouts) I’d suggest
> to use an ExecutorService to call put()/putAsync(), getting a cancelable
> Future from the start.
>
>
>
> Thanks,
>
> Stan
>
>
>
> *From: *Olexandr K <ol...@gmail.com>
> *Sent: *11 июня 2018 г. 2:44
> *To: *user@ignite.apache.org
> *Subject: *Cache operations hanging for a minute when one of server nodes
> goesdown
>
>
>
> Hi Igniters,
>
>
>
> I'm testing our system for availability.
>
> It uses Ignite as key/value persistent cache.
>
>
>
> Here is my test:
>
> 1) start 2 server and 2 client nodes
>
> 2) run heavy load on client nodes (some application logic which cause
> cache calls)
>
> 3) stop 1 server node
>
>
>
> Here I expect all in-progress cache operations targeted to server 1 node
> to fail fast.
>
> What I don't want is to hang all my processing threads for significant
> time.
>
> Unfortunately it works exactly that way: I'm constantly getting my threads
> blocked for 20-80 seconds.
>
> Finally putAsync() completes successfully but I'd prefer cache operation
> to fail fast. I don't want to hang all processing threads for a minute
> because of cache.
>
>
>
> It works the same for put() and putAsync() calls.
>
>
>
> As I see in the code, it can be fixed by calling future.get(timeout)
> instead of future.get() in TcpCommunicationSpi.
>
> Timeout should be configurable.
>
>
>
> TcpCommunicationSpi (line: 2799)
>
>   private GridCommunicationClient reserveClient(ClusterNode node, int
> connIdx) {
>
>     ...
>
>     client = fut.get();
>
>
>
> Does it make sense from your point of view?
>
>
>
> Here is my thread  dump:
>
>
>
> threads=[
>   {
>     threadName=https-jsse-nio-8080-exec-20,
>     threadId=102,
>     blockedTime=-1,
>     blockedCount=0,
>     waitedTime=-1,
>     waitedCount=5,
>     lockName=null,
>     lockOwnerId=-1,
>     lockOwnerName=null,
>     inNative=false,
>     suspended=false,
>     threadState=WAITING,
>     stackTrace=[
>       {
>         methodName=park,
>         fileName=Unsafe.java,
>         lineNumber=-2,
>         className=sun.misc.Unsafe,
>         nativeMethod=true
>       },
>       {
>         methodName=park,
>         fileName=LockSupport.java,
>         lineNumber=304,
>         className=java.util.concurrent.locks.LockSupport,
>         nativeMethod=false
>       },
>       {
>         methodName=get0,
>         fileName=GridFutureAdapter.java,
>         lineNumber=177,
>         className=org.apache.ignite.internal.util.future.
> GridFutureAdapter,
>         nativeMethod=false
>       },
>       {
>         methodName=get,
>         fileName=GridFutureAdapter.java,
>         lineNumber=140,
>         className=org.apache.ignite.internal.util.future.
> GridFutureAdapter,
>         nativeMethod=false
>       },
>       {
>         methodName=reserveClient,
>         fileName=TcpCommunicationSpi.java,
>         lineNumber=2799,
>         className=org.apache.ignite.spi.communication.tcp.
> TcpCommunicationSpi,
>         nativeMethod=false
>       },
>
> ....
>
>       {
>         methodName=putAsync,
>         fileName=IgniteCacheProxyImpl.java,
>         lineNumber=1035,
>         className=org.apache.ignite.internal.processors.cache.
> IgniteCacheProxyImpl,
>         nativeMethod=false
>       },
>       {
>         methodName=putAsync,
>         fileName=GatewayProtectedCacheProxy.java,
>         lineNumber=900,
>         className=org.apache.ignite.internal.processors.cache.
> GatewayProtectedCacheProxy,
>         nativeMethod=false
>       },
>
>
>
> Sample cache config:
>
>
>
> <bean class="org.apache.ignite.configuration.CacheConfiguration">
>                     <property name="name" value="RefreshToken"/>
>                     <property name="dataRegionName"
> value="auth_durable_region"/>
>                     <property name="atomicityMode" value="ATOMIC"/>
>                     <property name="writeSynchronizationMode"
> value="FULL_ASYNC"/>
>                     <property name="cacheMode" value="PARTITIONED"/>
>                     <property name="backups" value="1"/>
>                     <property name="eagerTtl" value="true"/>
>                 </bean>
>
>
>
> Ignite version: 2.4.0
>
> OS: Windows Server 2012 R2
>
>
>
> BR, Oleksandr
>
>
>
>
>
>
>