You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by John Smith <ja...@gmail.com> on 2020/11/17 17:44:14 UTC

What is considered high IO wait and partition exchange failure?

So if I understand correctly the logs below...

The node that shut off was timing out trying to get partition exchange from
the indicated nodes and it shut itself off correct? Does this mean this
node was also the master?

1- The time indicated in the log is that UTC?
2- I'm trying to see if it was high IO, but the node that stopped had no
high IO.
3- the other nodes had an average peek of about 0.3%-0.5% And that was
around 16:30 EST So not sure if those times match with the log.
4- On super high load days we get up to 7%-9% IO Wait but we never lost a
node on those occasions.
5- The nodes are persistent nodes so I'm guessing on write there a high IO
can we reduce the IO, but even then I'm not sure it's related.

[04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=132, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=e5645874-77e9-4455-8db5-07fa63984276, addrs=[127.0.0.1, 172.17.0.5],
sockAddrs=[/172.17.0.5:0, /127.0.0.1:0], discPort=0, order=108,
intOrder=60, lastExchangeTime=1600798045116, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=133,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xxxxxx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=133, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=13f473cb-5441-460b-9b9f-a23c1bcf3b0b, addrs=[127.0.0.1, 172.17.0.3],
sockAddrs=[/127.0.0.1:0, /172.17.0.3:0], discPort=0, order=110,
intOrder=61, lastExchangeTime=1600798045127, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=134,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xxxxxx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=134, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=ef9f3750-edfa-474e-8e68-6f0abee5095a, addrs=[127.0.0.1, 172.17.0.10],
sockAddrs=[/127.0.0.1:0, /172.17.0.10:0], discPort=0, order=112,
intOrder=62, lastExchangeTime=1600798045137, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=135,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xxxxxx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=135, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=e14e5bea-2784-4a6f-af2f-e98a4bf0ab52, addrs=[127.0.0.1,
xxx.xxx.xxx.63], sockAddrs=[xxxxxx-0001/xxx.xxx.xxx.63:47500, /
127.0.0.1:47500], discPort=47500, order=118, intOrder=65,
lastExchangeTime=1600798763474, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=136,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xxxxxx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=136, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=8facbe9c-6d9f-4c2d-8d9c-5c7f38dbb7da, addrs=[127.0.0.1,
xxx.xxx.xxx.69], sockAddrs=[/127.0.0.1:47500,
xxxxxx-0003/xxx.xxx.xxx.69:47500], discPort=47500, order=120, intOrder=66,
lastExchangeTime=1600799256509, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,796][INFO][db-checkpoint-thread-#56%xxxxxx%][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms,
checkpointLockHoldTime=14ms, reason='timeout']
[04:36:07,832][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=carrier-ids-for-phones]
[04:36:07,839][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=twofa-otp]
[04:36:07,840][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=SQL_PUBLIC_CARRIER_CODE]
[04:36:07,841][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=carrier-lookups]
[04:36:07,842][INFO][node-stopper][GridCacheProcessor] Stopped cache
[cacheName=ignite-sys-cache]
[04:36:08,497][INFO][node-stopper][IgniteKernal%xxxxxx]

>>>
+---------------------------------------------------------------------------------+
>>> Ignite ver.
2.7.0#20181130-sha1:256ae4012cb143b4855b598b740a6f3499ead4db stopped OK
>>>
+---------------------------------------------------------------------------------+
>>> Ignite instance name: xxxxxx
>>> Grid uptime: 52 days, 10:25:51.029